Two-sample t-test with Unequal Sample Sizes in R

2024-05-08

The two-sample t-test with unequal sample sizes can be performed using the built-in t.test() function in R.

In case of unequal sample sizes, you should check the assumption of the equality of variances (homoscedasticity).

If the variances are not equal, you should perform Welch’s t-test, which does not assume equal variances between the two samples.

Use Student’s two-sample t-test with unequal sample sizes when the variances are equal:

t.test(sample1, sample2, var.equal = TRUE)

Tip

By default, var.equal is FALSE. If you do not specify the var.equal parameter, t.test() performs Welch’s t-test, which does not assume equal variances between the two samples.
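The default behaviour can be checked directly. The sketch below uses two small hypothetical vectors, x and y (they are not part of this article's example), and inspects the method label stored in the object returned by t.test().

```r
# Hypothetical example vectors (illustration only, not from the article)
x <- c(5.1, 4.9, 6.2, 5.8, 6.0, 5.5, 5.3)
y <- c(4.2, 4.8, 3.9, 4.5)

default_test <- t.test(x, y)                     # var.equal not specified
welch_test   <- t.test(x, y, var.equal = FALSE)  # explicit Welch's t-test
student_test <- t.test(x, y, var.equal = TRUE)   # pooled-variance Student's t-test

# The method label records which test was actually run
trimws(default_test$method)
trimws(student_test$method)

# The default call and the explicit Welch call give the same p-value
all.equal(default_test$p.value, welch_test$p.value)
```

Only the call with var.equal = TRUE uses the pooled variance, so its degrees of freedom equal the total number of observations minus 2.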

Use Welch’s t-test with unequal sample sizes when the variances are not equal:

# Welch t-test
t.test(sample1, sample2, var.equal = FALSE)

Welch’s t-test is appropriate for unequal sample sizes when variances are not equal because it estimates the standard error from each sample’s own variance and sample size, and adjusts the degrees of freedom accordingly.
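To make the adjustment concrete, the following sketch computes the Welch statistic and its degrees of freedom by hand (using the Welch-Satterthwaite approximation) and compares them with t.test(). It uses the sample1 and sample2 vectors from this article's example. The names v1, v2, t_welch and df_welch are introduced here for illustration only.

```r
# Example vectors from this article
sample1 <- c(28, 35, 45, 65, 44, 56, 40, 42, 35, 34, 44)
sample2 <- c(15, 10, 40, 25, 26, 21)

n1 <- length(sample1)
n2 <- length(sample2)

# Squared standard error of each sample mean (variance / sample size)
v1 <- var(sample1) / n1
v2 <- var(sample2) / n2

# Welch t statistic: no pooling of the two variances
t_welch <- (mean(sample1) - mean(sample2)) / sqrt(v1 + v2)

# Welch-Satterthwaite approximation of the degrees of freedom
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

# Compare with the built-in implementation
welch <- t.test(sample1, sample2, var.equal = FALSE)
all.equal(t_welch, unname(welch$statistic))
all.equal(df_welch, unname(welch$parameter))
```

The approximate degrees of freedom are generally not an integer and never exceed n1 + n2 - 2, the value used by Student’s t-test.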

The following example demonstrates how to perform a two-sample t-test with unequal sample sizes using the built-in t.test() function in R.

Create dataset

Create two samples of unequal size:

sample1 <- c(28, 35, 45, 65, 44, 56, 40, 42, 35, 34, 44)
sample2 <- c(15, 10, 40, 25, 26, 21)

sample1 has 11 observations, whereas sample2 has 6.
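Before running any test, it can help to look at the size, mean, and standard deviation of each sample. The snippet below is a minimal sketch. The names samples and summary_stats are introduced here for illustration.

```r
sample1 <- c(28, 35, 45, 65, 44, 56, 40, 42, 35, 34, 44)
sample2 <- c(15, 10, 40, 25, 26, 21)

# Collect both samples in a named list so they can be summarised together
samples <- list(sample1 = sample1, sample2 = sample2)

# One column per sample; rows are sample size, mean, and standard deviation
summary_stats <- sapply(samples, function(x) {
  c(n = length(x), mean = mean(x), sd = sd(x))
})

round(summary_stats, 2)
```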

Check assumption of equality of variances (homoscedasticity)

Before performing the two-sample t-test with unequal sample sizes, you should check the assumption of the equality of variances.

You can use Levene’s test to assess whether the variances of the two samples are equal:

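# load the car package, which provides leveneTest()
library(car)

# stack both samples into one vector and label each observation with its group
leveneTest(c(sample1, sample2),
	group = factor(rep(c("sample1", "sample2"), c(length(sample1), length(sample2)))))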

# output
Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  1   2e-04 0.9897
      15

As the p-value (0.9897) from Levene’s test is greater than the significance level (0.05), you fail to reject the null hypothesis of equal variances. In other words, there is no evidence that the variances differ between the two samples, so the equal-variance assumption is reasonable.
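With center = median, Levene’s test amounts to a one-way ANOVA on the absolute deviations of each observation from its group median. The sketch below reproduces the test with base R only, which may be useful if the car package is not installed. The names values, group, abs_dev and levene_manual are introduced here for illustration.

```r
sample1 <- c(28, 35, 45, 65, 44, 56, 40, 42, 35, 34, 44)
sample2 <- c(15, 10, 40, 25, 26, 21)

# Stack the observations and record which sample each one came from
values <- c(sample1, sample2)
group  <- factor(rep(c("sample1", "sample2"),
                     c(length(sample1), length(sample2))))

# Absolute deviation of each observation from its own group median
abs_dev <- abs(values - ave(values, group, FUN = median))

# One-way ANOVA on the absolute deviations
levene_manual <- anova(lm(abs_dev ~ group))
levene_manual
```

The F value and Pr(>F) in the group row should agree with the leveneTest() output shown above.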

Perform two-sample t-test

As there is no evidence of unequal variances between the two samples of unequal size, you can perform Student’s two-sample t-test using the t.test() function with var.equal = TRUE.

t.test(sample1, sample2, var.equal = TRUE)

# output
	Two Sample t-test

data:  sample1 and sample2
t = 3.715, df = 15, p-value = 0.002074
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
  8.402563 31.021680
sample estimates:
mean of x mean of y 
 42.54545  22.83333

As the p-value (0.002) from the two-sample t-test is less than the significance level (0.05), you reject the null hypothesis and conclude that there is a statistically significant difference between the means of the two samples.
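The reported statistic, degrees of freedom, p-value, and confidence interval can be verified by hand from the pooled variance. This is a minimal sketch of the standard pooled-variance formulas. The names pooled_var, se, t_stat, df_pooled, p_value and ci are introduced here for illustration.

```r
sample1 <- c(28, 35, 45, 65, 44, 56, 40, 42, 35, 34, 44)
sample2 <- c(15, 10, 40, 25, 26, 21)

n1 <- length(sample1)
n2 <- length(sample2)
df_pooled <- n1 + n2 - 2

# Pooled variance: each sample variance weighted by its degrees of freedom
pooled_var <- ((n1 - 1) * var(sample1) + (n2 - 1) * var(sample2)) / df_pooled

# Standard error of the difference in means
se <- sqrt(pooled_var * (1 / n1 + 1 / n2))

mean_diff <- mean(sample1) - mean(sample2)
t_stat    <- mean_diff / se

# Two-sided p-value and 95% confidence interval for the difference in means
p_value <- 2 * pt(-abs(t_stat), df_pooled)
ci      <- mean_diff + c(-1, 1) * qt(0.975, df_pooled) * se

round(c(t = t_stat, df = df_pooled, p = p_value), 4)
ci
```

The weights n1 - 1 and n2 - 1 mean that the larger sample contributes more to the pooled variance, which is how Student’s t-test accommodates unequal sample sizes.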