Two-sample t-test with Unequal Sample Sizes in Python

2024-05-08

The two-sample t-test with unequal sample sizes can be performed using the ttest_ind() function from the SciPy package in Python.

When sample sizes are unequal, Student’s t-test is sensitive to differences in variance between the samples, so you should first check the assumption of equality of variances (homoscedasticity).

If the variances are not equal, you should perform Welch’s t-test, which does not assume equal variances between the two samples.

You should use Student’s two-sample t-test with unequal sample sizes when the variances are equal:

from scipy import stats

stats.ttest_ind(sample1, sample2)

You should use Welch’s t-test with unequal sample sizes when the variances are not equal:

# Welch’s t-test
from scipy import stats

stats.ttest_ind(sample1, sample2, equal_var=False)

Welch’s t-test is appropriate for unequal sample sizes when variances are not equal because it uses each sample’s own variance instead of a pooled estimate and adjusts the degrees of freedom accordingly.
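To make that adjustment concrete, here is a minimal sketch that computes Welch’s t statistic and the Welch-Satterthwaite degrees of freedom by hand and compares them with ttest_ind(). The welch_t() helper and the group_a and group_b values are hypothetical and used only for illustration.

```python
import math
from statistics import mean, variance

from scipy import stats


def welch_t(a, b):
    """Hypothetical helper: Welch's t statistic and Welch-Satterthwaite df."""
    na, nb = len(a), len(b)
    # Squared standard error of each sample mean (sample variance / n)
    se2_a = variance(a) / na
    se2_b = variance(b) / nb
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    # Degrees of freedom shrink when variances and sample sizes differ
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df


# Illustrative values only, not taken from any real study
group_a = [12.1, 11.8, 12.6, 12.0, 11.9, 12.3, 12.2]
group_b = [10.2, 14.9, 8.7, 13.5]

t_manual, df_manual = welch_t(group_a, group_b)
p_manual = 2 * stats.t.sf(abs(t_manual), df_manual)  # two-sided p value

result = stats.ttest_ind(group_a, group_b, equal_var=False)
print(t_manual, df_manual, p_manual)
print(result.statistic, result.pvalue)
```

Both print statements should report the same statistic and p value, which shows that equal_var=False switches ttest_ind() to the unpooled standard error and the adjusted degrees of freedom.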

The following examples demonstrate how to perform a two-sample t-test with unequal sample sizes in Python.

Create dataset

Create two samples of unequal size:

sample1 = [24, 28, 32, 29, 35, 36, 30, 32, 25, 31]  
sample2 = [5, 10, 25, 15, 16, 20]

sample1 has 10 observations, whereas sample2 has 6 observations.
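As a quick sanity check before any testing, the size, mean, and sample variance of each sample can be printed. This is a small sketch using only the standard library; the loop and the formatting are illustrative choices.

```python
from statistics import mean, variance

sample1 = [24, 28, 32, 29, 35, 36, 30, 32, 25, 31]
sample2 = [5, 10, 25, 15, 16, 20]

# variance() returns the sample variance (n - 1 in the denominator)
for name, data in (("sample1", sample1), ("sample2", sample2)):
    print(f"{name}: n={len(data)}, mean={mean(data):.2f}, variance={variance(data):.2f}")

# sample1: n=10, mean=30.20, variance=15.07
# sample2: n=6, mean=15.17, variance=50.17
```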

Check assumption of equality of variances

Before performing the two-sample t-test with unequal sample sizes, you should check the assumption of the equality of variances.

You can use Levene’s test to assess whether the variances of the two samples are equal:

# import package
from scipy.stats import levene

# compare the two different samples
levene(sample1, sample2)

# output (rounded)
LeveneResult(statistic=1.79, pvalue=0.20)

As the p value (about 0.20) from Levene’s test is greater than the significance level (0.05), you fail to reject the null hypothesis of equal variances, i.e. there is no evidence that the variances differ between the two samples.
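The decision rule described in this article (Levene’s test first, then Student’s or Welch’s t-test) can be sketched as a small helper. The function name choose_and_run_ttest() and its alpha parameter are hypothetical and not part of SciPy; only levene() and ttest_ind() are SciPy functions.

```python
from scipy import stats


def choose_and_run_ttest(a, b, alpha=0.05):
    """Hypothetical helper: pick Student's or Welch's t-test from Levene's p value."""
    levene_p = stats.levene(a, b).pvalue
    # Treat the variances as equal when Levene's test is not significant
    equal_var = bool(levene_p > alpha)
    result = stats.ttest_ind(a, b, equal_var=equal_var)
    return equal_var, result.statistic, result.pvalue


sample1 = [24, 28, 32, 29, 35, 36, 30, 32, 25, 31]
sample2 = [5, 10, 25, 15, 16, 20]

equal_var, t_stat, p_value = choose_and_run_ttest(sample1, sample2)
print("Student's t-test" if equal_var else "Welch's t-test", t_stat, p_value)
```

For these two samples the helper selects Student’s t-test, because Levene’s test is not significant at the 0.05 level.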

Perform two-sample t-test

As there is no evidence of unequal variances between the two samples of unequal size, you can perform Student’s two-sample t-test using the ttest_ind() function, which assumes equal variances by default.

# import package
from scipy import stats

stats.ttest_ind(sample1, sample2)

# output
Ttest_indResult(statistic=5.5411211843802795, pvalue=7.269211172631758e-05)

As the p value (7.27e-05) from the two-sample t-test is less than the significance level (0.05), you reject the null hypothesis and conclude that there is a statistically significant difference between the means of the two samples.
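To show how unequal sample sizes enter the calculation, the statistic reported by ttest_ind() can be reproduced by hand with the pooled variance. This is a minimal sketch; the variable names are illustrative.

```python
import math
from statistics import mean, variance

from scipy import stats

sample1 = [24, 28, 32, 29, 35, 36, 30, 32, 25, 31]
sample2 = [5, 10, 25, 15, 16, 20]

n1, n2 = len(sample1), len(sample2)
# Pooled variance weights each sample variance by its degrees of freedom
pooled_var = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / (n1 + n2 - 2)
# The standard error uses 1/n1 + 1/n2, so the smaller sample contributes more uncertainty
t_manual = (mean(sample1) - mean(sample2)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
# Two-sided p value from the t distribution with n1 + n2 - 2 degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), df=n1 + n2 - 2)

print(t_manual, p_manual)
```

The printed values should match the statistic and p value returned by stats.ttest_ind(sample1, sample2) up to floating-point rounding.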