Calculate t-test with NumPy

2024-09-26


A two-sample t-test compares the means of two groups to determine whether they differ significantly from one another.

NumPy does not have a built-in function for the two-sample t-test, but you can calculate the test statistic manually with NumPy (and use the t distribution from SciPy for the p value).

Two-sample t-test formula (unequal-variance form):

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$

where x̄1 and x̄2 are the sample means, s1 and s2 are the sample standard deviations, and n1 and n2 are the sample sizes.
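As a minimal sketch of the formula on its own, the hypothetical helper `t_from_summary` below computes t from summary statistics rather than raw data. The numbers in the usage line are made up so the arithmetic is easy to follow: SE = sqrt(4/8 + 4/8) = 1, so t = (5 - 3) / 1 = 2.

```python
import numpy as np

def t_from_summary(mean1, mean2, sd1, sd2, n1, n2):
    """Hypothetical helper: t = (x̄1 - x̄2) / sqrt(s1²/n1 + s2²/n2)."""
    # standard error of the difference between the two sample means
    se = np.sqrt(sd1**2 / n1 + sd2**2 / n2)
    return (mean1 - mean2) / se

# made-up summary statistics: x̄1 = 5, x̄2 = 3, s1 = s2 = 2, n1 = n2 = 8
print(t_from_summary(5, 3, 2, 2, 8, 8))  # 2.0
```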

The following example explains how to perform a two-sample t-test using NumPy in Python.

Sample dataset

Create sample datasets for the two groups:

# import packages
import numpy as np

# create sample datasets
group1 = [10, 12, 9, 12, 14, 11, 16, 14, 13, 10]
group2 = [20, 22, 29, 24, 24, 21, 18, 26, 29, 30]

Calculate the sample means

Calculate the mean values for group1 and group2:

# import packages
import numpy as np

# calculate mean
group1_mean = np.mean(group1)
group2_mean = np.mean(group2)

Calculate sample variances

Calculate the variances for group1 and group2:

# import packages
import numpy as np

# calculate variances
group1_var = np.var(group1, ddof=1)  
group2_var = np.var(group2, ddof=1)

We calculated the sample variance with n - 1 in the denominator. The ddof argument ("delta degrees of freedom") makes NumPy divide by N - ddof, where N is the number of elements; its default is 0, which divides by N.

Because each group is a sample from a larger population, ddof=1 gives an unbiased estimate of the population variance.
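To see what ddof changes, the snippet below compares NumPy's default (ddof=0, divide by n) with ddof=1 (divide by n - 1) on group1. The two estimates differ only by the factor n / (n - 1); the variable names `ss`, `var_pop`, and `var_sample` are just illustrative.

```python
import numpy as np

group1 = [10, 12, 9, 12, 14, 11, 16, 14, 13, 10]
n = len(group1)

# sum of squared deviations from the sample mean (42.9 for group1)
ss = np.sum((np.asarray(group1) - np.mean(group1)) ** 2)

var_pop = np.var(group1)             # ddof=0 (default): ss / n
var_sample = np.var(group1, ddof=1)  # ddof=1: ss / (n - 1)

print(round(float(var_pop), 4), round(float(var_sample), 4))  # 4.29 4.7667
print(np.isclose(var_sample, ss / (n - 1)))                    # True
print(np.isclose(var_sample, var_pop * n / (n - 1)))           # True
```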

Calculate the t-value and p value

Plug the sample means, variances, and sizes into the two-sample t-test formula:

# calculate the t-value
tval = (group1_mean - group2_mean) / np.sqrt(group1_var/len(group1) + group2_var/len(group2))

tval

# output
-8.246088

The t-value for the two-sample t-test is -8.246088. We can use this t-value to calculate the p value.
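As a cross-check (assuming SciPy is installed, as it is for the p value step), `scipy.stats.ttest_ind` computes the statistic directly from the raw data. With `equal_var=False` it uses the unequal-variance (Welch) form from the formula above; with `equal_var=True` it pools the two variances. Because both groups have n = 10, the two forms give the same t here.

```python
import numpy as np
from scipy import stats

group1 = [10, 12, 9, 12, 14, 11, 16, 14, 13, 10]
group2 = [20, 22, 29, 24, 24, 21, 18, 26, 29, 30]

# manual t-value, same calculation as above
tval = (np.mean(group1) - np.mean(group2)) / np.sqrt(
    np.var(group1, ddof=1) / len(group1) + np.var(group2, ddof=1) / len(group2)
)

# SciPy's two-sample t-tests: Welch (unequal variances) and Student (pooled variance)
welch = stats.ttest_ind(group1, group2, equal_var=False)
student = stats.ttest_ind(group1, group2, equal_var=True)

# with equal group sizes both statistics coincide with the manual value
print(np.isclose(tval, welch.statistic), np.isclose(tval, student.statistic))  # True True
```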

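We use the survival function sf() of the t distribution (scipy.stats.t) to calculate the two-sided and one-sided p values. sf(x) returns P(T > x), so doubling it at |t| gives the two-sided p value, while the undoubled value is the one-sided p value in the direction of the observed difference (here, that group1 has the smaller mean). Both calls use df = n1 - 1 = 9 degrees of freedom. Because the formula does not pool the two variances, this is a conservative choice: the smaller of n1 - 1 and n2 - 1 (the groups are the same size here). The Welch-Satterthwaite approximation would give more degrees of freedom and a somewhat smaller p value.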

# import packages
from scipy.stats import t
import numpy as np

# two sided p value
t.sf(x=np.abs(tval), df=len(group1)-1)*2
# output
1.7362138464589428e-05

# one-sided p value
t.sf(x=np.abs(tval), df=len(group1)-1)
# output
8.681069232294714e-06
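As a sketch of the less conservative alternative, the snippet below computes the Welch-Satterthwaite degrees of freedom (about 13.65 for these data), uses them for the two-sided p value, and compares the result with SciPy's Welch test. The names `a`, `b`, and `df_welch` are illustrative only. The `alternative="less"` argument of `ttest_ind` (available in SciPy 1.6 and later) gives the one-sided p value for the hypothesis that group1 has the smaller mean.

```python
import numpy as np
from scipy import stats

group1 = [10, 12, 9, 12, 14, 11, 16, 14, 13, 10]
group2 = [20, 22, 29, 24, 24, 21, 18, 26, 29, 30]
n1, n2 = len(group1), len(group2)

# squared standard error of each sample mean, s^2 / n
a = np.var(group1, ddof=1) / n1
b = np.var(group2, ddof=1) / n2
tval = (np.mean(group1) - np.mean(group2)) / np.sqrt(a + b)

# Welch-Satterthwaite approximation of the degrees of freedom
df_welch = (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))

# two-sided p value using the Welch degrees of freedom
p_two_sided = 2 * stats.t.sf(np.abs(tval), df=df_welch)

# SciPy's Welch test: two-sided and one-sided (H1: mean of group1 < mean of group2)
res = stats.ttest_ind(group1, group2, equal_var=False)
res_less = stats.ttest_ind(group1, group2, equal_var=False, alternative="less")

print(round(float(df_welch), 2))                     # 13.65
print(np.isclose(p_two_sided, res.pvalue))           # True
print(np.isclose(res_less.pvalue, p_two_sided / 2))  # True
```

Since df_welch is larger than 9, this p value is smaller than the one obtained with df = n1 - 1.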

Both p values are far below 0.05, so the difference is statistically significant. We reject the null hypothesis of equal means and conclude that the means of the two groups differ significantly.