Calculate t-test with NumPy
A two-sample t-test compares the means of two groups to determine whether they are significantly different from one another.
In Python, the NumPy library does not have a built-in function to perform the two-sample t-test, but you can calculate it manually using NumPy.
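Two-sample t-test formula:
$$ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} $$
where x̄1 and x̄2 are the sample means, s1 and s2 are the sample standard deviations, and n1 and n2 are the sample sizes. Because the two sample variances are not pooled, this is the statistic used in Welch's (unequal-variance) t-test.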
The following example explains how to perform a two-sample t-test using NumPy in Python.
Sample dataset
Create sample datasets for two groups,
# import packages
import numpy as np
# create sample datasets
group1 = [10, 12, 9, 12, 14, 11, 16, 14, 13, 10]
group2 = [20, 22, 29, 24, 24, 21, 18, 26, 29, 30]
Calculate mean for samples
Calculate the mean values for group1 and group2,
# import packages
import numpy as np
# calculate mean
group1_mean = np.mean(group1)
group2_mean = np.mean(group2)
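As a quick check of what np.mean computes, the sketch below compares it with a plain sum divided by the count. The name manual_mean1 is only for illustration.

```python
import numpy as np

group1 = [10, 12, 9, 12, 14, 11, 16, 14, 13, 10]
group2 = [20, 22, 29, 24, 24, 21, 18, 26, 29, 30]

# np.mean adds up the values and divides by the number of values
group1_mean = np.mean(group1)
group2_mean = np.mean(group2)

# The same mean written out by hand: sum / count
manual_mean1 = sum(group1) / len(group1)

print(group1_mean, group2_mean)  # 12.1 24.3
```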
Calculate sample variances
Calculate the variances for group1 and group2,
# import packages
import numpy as np
# calculate variances
group1_var = np.var(group1, ddof=1)
group2_var = np.var(group2, ddof=1)
We calculated the sample variance with n - 1 in the denominator. The ddof parameter (delta degrees of freedom) makes np.var divide by n - ddof. It is 0 by default in NumPy, which gives the population variance.
Because each group is a sample from a larger population, ddof=1 gives the unbiased estimator of the population variance.
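You can check the relationship between the two ddof settings directly. With ddof=0, np.var divides the sum of squared deviations by n. With ddof=1, it divides by n - 1, so the two results differ by the factor n / (n - 1). Here is a minimal sketch on group1 (the names pop_var and sample_var are illustrative):

```python
import numpy as np

group1 = [10, 12, 9, 12, 14, 11, 16, 14, 13, 10]
n = len(group1)

# ddof=0 (NumPy's default): sum of squared deviations divided by n
pop_var = np.var(group1)
# ddof=1: sum of squared deviations divided by n - 1 (sample variance)
sample_var = np.var(group1, ddof=1)

# The two estimates differ only by the factor n / (n - 1)
print(pop_var, sample_var)
print(np.isclose(sample_var, pop_var * n / (n - 1)))  # True
```

For group1 the sum of squared deviations is 42.9, so the two calls return about 4.29 and 4.767.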
Calculate t-value and p value
Calculate the t-value using the two-sample t-test formula,
# calculate t-value
tval = (group1_mean - group2_mean) / np.sqrt(group1_var/len(group1) + group2_var/len(group2))
tval
# output
-8.246088
The t-value for the two-sample t-test is -8.246088. We can use this t-value to calculate the p value.
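Plugging the sample statistics into the formula reproduces this value by hand. From the data, the sums of squared deviations are 42.9 for group1 and 154.1 for group2. That gives s1² = 42.9/9 ≈ 4.767 and s2² = 154.1/9 ≈ 17.122, with means 12.1 and 24.3:

```latex
t = \frac{12.1 - 24.3}{\sqrt{\frac{4.767}{10} + \frac{17.122}{10}}}
  = \frac{-12.2}{\sqrt{2.1889}}
  \approx \frac{-12.2}{1.4795}
  \approx -8.246
```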
We will use the sf (survival function, 1 - CDF) method of scipy.stats.t to calculate the two-sided and one-sided p values.
# import packages
from scipy.stats import t
import numpy as np
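# degrees of freedom: the conservative choice min(n1, n2) - 1 (9 here)
dof = min(len(group1), len(group2)) - 1
# two-sided p value: area in both tails beyond |t|
t.sf(x=np.abs(tval), df=dof) * 2
# output
1.7362138464589428e-05
# one-sided p value (alternative: group1 mean < group2 mean, the direction of the observed t)
t.sf(x=np.abs(tval), df=dof)
# output
8.681069232294714e-06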
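The sf method returns the upper-tail area P(T > x), which equals 1 - cdf(x). The t distribution is symmetric, so the two-sided p value is the sum of the equal areas below -|t| and above +|t|. The small sketch below checks these identities with the 9 degrees of freedom used in this example. The names upper_tail and lower_tail are illustrative.

```python
import numpy as np
from scipy.stats import t

group1 = [10, 12, 9, 12, 14, 11, 16, 14, 13, 10]
group2 = [20, 22, 29, 24, 24, 21, 18, 26, 29, 30]

# Recompute the t-value from the steps above
tval = (np.mean(group1) - np.mean(group2)) / np.sqrt(
    np.var(group1, ddof=1) / len(group1) + np.var(group2, ddof=1) / len(group2)
)
dof = 9  # degrees of freedom used in this example (10 observations - 1)

# sf(x) is the survival function P(T > x), i.e. 1 - cdf(x)
upper_tail = t.sf(np.abs(tval), dof)
print(np.isclose(upper_tail, 1 - t.cdf(np.abs(tval), dof)))  # True

# By symmetry, the lower tail below -|t| has the same area
lower_tail = t.cdf(-np.abs(tval), dof)
print(np.isclose(upper_tail, lower_tail))  # True

# Two-sided p value: add both tails
two_sided = upper_tail + lower_tail
print(two_sided)
```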
The p value for the two-sample t-test is below 0.05, so the result is statistically significant at the 5% level. We reject the null hypothesis of equal means and conclude that the means of the two groups are significantly different from one another.
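As a cross-check, SciPy provides scipy.stats.ttest_ind. With equal_var=False it performs Welch's t-test on the same statistic. It estimates the degrees of freedom with the Welch-Satterthwaite approximation rather than n - 1. The sketch below implements that approximation with NumPy in a hypothetical helper, welch_t_test, and compares the result with SciPy. It is one way to check the manual calculation, not the method used above.

```python
import numpy as np
from scipy import stats

group1 = np.array([10, 12, 9, 12, 14, 11, 16, 14, 13, 10])
group2 = np.array([20, 22, 29, 24, 24, 21, 18, 26, 29, 30])

def welch_t_test(a, b):
    """Hypothetical helper: Welch's t-test with NumPy; returns (t, df, two-sided p)."""
    n1, n2 = len(a), len(b)
    v1 = np.var(a, ddof=1) / n1  # s1^2 / n1
    v2 = np.var(b, ddof=1) / n2  # s2^2 / n2
    tval = (np.mean(a) - np.mean(b)) / np.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    p_two_sided = 2 * stats.t.sf(np.abs(tval), df)
    return tval, df, p_two_sided

tval, df, p = welch_t_test(group1, group2)

# SciPy's Welch test (equal_var=False) should agree with the manual version
res = stats.ttest_ind(group1, group2, equal_var=False)
print(tval, df, p)
print(np.isclose(tval, res.statistic), np.isclose(p, res.pvalue))  # True True
```

For these data the Welch-Satterthwaite degrees of freedom come out at about 13.65, which is larger than the 9 used above. The resulting p value is therefore smaller, and the conclusion that the means differ does not change.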