Calculate Confidence Interval for t-test in Python

2023-12-28

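A confidence interval is a range of values, estimated from a sample, that is likely to contain an unknown population parameter (such as the mean). "Likely" refers to repeated sampling: if you drew samples from the population many times, most of the resulting intervals would contain the parameter.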

For example, if we take 100 random samples and calculate the 95% confidence interval from each of them, then about 95 of the 100 intervals are expected to contain the population mean.

A 95% confidence level therefore describes the long-run success rate of the procedure: about 95% of the intervals built this way capture the true population parameter. It is common shorthand to say that we are 95% confident that the parameter lies within a given interval, but any single interval either contains the parameter or it does not.

Suppose the true mean of the population is 5 and you repeated the study many times with samples drawn from the population. In that case, about 95% of the calculated 95% confidence intervals would contain the true mean (5).
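
The repeated-sampling interpretation can be checked with a small simulation. The sketch below is illustrative only: the population standard deviation (2), the sample size (30), the number of repeated studies (2000), and the function name coverage are assumptions chosen for the demonstration, not values from this article.

```python
import numpy as np
from scipy import stats

def coverage(true_mean=5.0, sd=2.0, n=30, n_studies=2000, level=0.95, seed=0):
    """Fraction of t-based confidence intervals that contain true_mean.

    sd, n, n_studies and seed are assumed values for illustration.
    """
    rng = np.random.default_rng(seed)
    # two-sided critical value of the t-distribution with n - 1 degrees of freedom
    t_crit = stats.t.ppf(1 - (1 - level) / 2, df=n - 1)
    # each row is one simulated study of n observations
    samples = rng.normal(true_mean, sd, size=(n_studies, n))
    means = samples.mean(axis=1)
    se = samples.std(axis=1, ddof=1) / np.sqrt(n)
    lower = means - t_crit * se
    upper = means + t_crit * se
    # proportion of intervals that capture the true mean
    return np.mean((lower <= true_mean) & (true_mean <= upper))

cov = coverage()
print(f"Fraction of intervals containing the true mean: {cov:.3f}")
```

The printed fraction should land close to 0.95, and it moves closer as n_studies grows.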

In particular, confidence intervals are useful for interpreting results when the p value from the statistical test is very close to the significance threshold (e.g. 0.05).

The following examples demonstrate how to calculate the 95% confidence interval for the t-test in Python.

1 Confidence interval for a one-sample t-test

We will use the ttest function from the pingouin package to test whether the sample x is consistent with a hypothesized population mean of 10.

# import package
from pingouin import ttest

# sample data
x = [6, 5, 10, 7, 8, 9, 10, 11, 15]

# perform one sample t-test
ttest(x, 10)

# output
          T  dof alternative     p-val          CI95%   cohen-d   BF10     power
T-test -1.0    8   two-sided  0.346594  [6.69, 11.31]  0.333333  0.482  0.143256

As the p value is non-significant (> 0.05), we fail to reject the null hypothesis: there is no evidence that the population mean differs from the hypothesized value of 10.

The 95% confidence interval for the one-sample t-test is (6.69, 11.31). This indicates that we are 95% confident that the true mean of the population lies between 6.69 and 11.31. Because the interval contains the hypothesized mean of 10, it agrees with the non-significant p value.
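
To see where these bounds come from, the interval can be recomputed by hand as the sample mean plus or minus the critical t value times the standard error. This is a minimal sketch of the standard formula using numpy and scipy, not a description of pingouin's internals; the variable names are my own.

```python
import numpy as np
from scipy import stats

x = np.array([6, 5, 10, 7, 8, 9, 10, 11, 15])

n = x.size
mean = x.mean()                          # sample mean
se = x.std(ddof=1) / np.sqrt(n)          # standard error of the mean
t_crit = stats.t.ppf(0.975, df=n - 1)    # two-sided 95% critical value, 8 dof

ci_low = mean - t_crit * se
ci_high = mean + t_crit * se
print(f"[{ci_low:.2f}, {ci_high:.2f}]")  # [6.69, 11.31]
```

Note that the interval is centred on the sample mean (9), not on the hypothesized value (10).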

2 Confidence interval for a two-sample t-test

We will perform the independent two-sample t-test to calculate the 95% confidence interval for the difference between the two group means.

# import package
from pingouin import ttest
import numpy as np

# sample data (drawn at random without a fixed seed, so your output will differ)
group1 = np.random.normal(loc=8, size=20)
group2 = np.random.normal(loc=5, size=20)

# perform independent two-sample t-test
ttest(group1, group2)

# output
               T  dof alternative         p-val         CI95%   cohen-d      BF10  power
T-test  7.751392   38   two-sided  2.423817e-09  [1.82, 3.11]  2.451205  3.14e+06    1.0

As the p value is significant (< 0.05), we reject the null hypothesis and conclude that the means of the two groups are significantly different.

The 95% confidence interval for the difference between the means of the two groups (group1 minus group2) is (1.82, 3.11). This indicates that we are 95% confident that the difference between the two population means lies between 1.82 and 3.11. Because the interval excludes 0, it agrees with the significant p value.
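
The interval for the difference can also be reproduced by hand. The sketch below assumes the pooled-variance (Student's) form of the test, which is consistent with the 38 degrees of freedom reported above for two groups of 20. The helper name pooled_ci and the seed 42 are hypothetical choices for illustration, so the numbers will not match the output shown earlier.

```python
import numpy as np
from scipy import stats

def pooled_ci(a, b, level=0.95):
    """CI for mean(a) - mean(b), assuming equal population variances."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n1, n2 = a.size, b.size
    df = n1 + n2 - 2
    # pooled variance: weighted average of the two sample variances
    sp2 = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / df
    se = np.sqrt(sp2 * (1 / n1 + 1 / n2))   # standard error of the difference
    diff = a.mean() - b.mean()
    t_crit = stats.t.ppf(1 - (1 - level) / 2, df)
    return diff - t_crit * se, diff + t_crit * se

# seeded data so the example is reproducible (seed is an arbitrary assumption)
rng = np.random.default_rng(42)
group1 = rng.normal(loc=8, size=20)
group2 = rng.normal(loc=5, size=20)

low, high = pooled_ci(group1, group2)
print(f"95% CI for the difference in means: [{low:.2f}, {high:.2f}]")
```

If the two groups have clearly unequal variances or sizes, a Welch-type interval with adjusted degrees of freedom would be the more appropriate choice.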

You can also use the scipy library to calculate the confidence interval based on the t-distribution.
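
As one possible way to do this, scipy.stats.t.interval returns the interval directly when given the degrees of freedom, the sample mean as loc, and the standard error as scale. The wrapper mean_ci below is a hypothetical convenience function, and the confidence level is passed positionally because the name of that argument has changed between scipy versions.

```python
import numpy as np
from scipy import stats

def mean_ci(data, level=0.95):
    """t-based confidence interval for the mean of a single sample."""
    data = np.asarray(data, dtype=float)
    # stats.sem uses ddof=1 by default, i.e. the sample standard deviation
    return stats.t.interval(level, data.size - 1,
                            loc=data.mean(), scale=stats.sem(data))

x = [6, 5, 10, 7, 8, 9, 10, 11, 15]

# higher confidence levels give wider intervals
for level in (0.90, 0.95, 0.99):
    low, high = mean_ci(x, level)
    print(f"{level:.0%} CI: [{low:.2f}, {high:.2f}]")
```

For the data from the one-sample example, the 95% line reproduces the interval reported by pingouin, [6.69, 11.31].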