Calculate Confidence Interval with SciPy
A confidence interval is an estimated range that is likely to contain an unknown population parameter (such as the mean). If you draw samples from the population many times, a fixed proportion of the intervals computed this way (for example, 95%) would contain the true parameter.
In Python, you can use the interval function from SciPy to calculate confidence intervals based on Student’s t-distribution and the Z-distribution (standard normal distribution).
The following examples explain how to calculate confidence intervals using the SciPy library.
Calculate 95% confidence interval based on t-distribution
Create a sample dataset,
# import package
import numpy as np
from scipy import stats
x = np.array([10, 14, 9, 10, 20, 16, 19, 21, 22, 13, 15, 22, 20, 11, 12])
Calculate the mean, standard error, and degrees of freedom,
# calculate mean
mean = np.mean(x)
# standard error
std_err = stats.sem(x)
# degrees of freedom
degree_f = len(x)-1
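To make the standard error step less of a black box, here is a minimal sketch that computes it by hand as the sample standard deviation divided by the square root of the sample size. The names n, sample_sd, and manual_sem are illustrative and not part of the original example.

```python
import numpy as np
from scipy import stats

x = np.array([10, 14, 9, 10, 20, 16, 19, 21, 22, 13, 15, 22, 20, 11, 12])

n = len(x)
# sample standard deviation (ddof=1 divides by n-1 instead of n)
sample_sd = np.std(x, ddof=1)
# standard error of the mean = s / sqrt(n)
manual_sem = sample_sd / np.sqrt(n)

# stats.sem also uses ddof=1 by default, so the two values should agree
print(manual_sem, stats.sem(x))
```

Both values should match, which confirms that stats.sem(x) uses the n-1 denominator for the standard deviation.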
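Calculate the 95% confidence interval using the interval function from scipy,
# calculate 95% confidence interval
# the confidence level is passed positionally because the keyword
# was renamed from alpha to confidence in SciPy 1.9
stats.t.interval(0.95, df=degree_f, loc=mean, scale=std_err)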
# output
(12.987398435544499, 18.2126015644555)
The 95% confidence interval for the data is 12.99 to 18.21.
The interval function from scipy.stats.t calculates the confidence interval based on the t-distribution. It returns a two-tailed interval with equal areas in both tails.
The t-distribution should be used when the sample size is small (n<30) and the population standard deviation is unknown.
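The two-tailed interval can also be reproduced by hand as mean ± t* × standard error, where t* is the critical value of the t-distribution. The following is a minimal sketch under that formula; the names confidence, t_crit, margin, lower, and upper are illustrative.

```python
import numpy as np
from scipy import stats

x = np.array([10, 14, 9, 10, 20, 16, 19, 21, 22, 13, 15, 22, 20, 11, 12])

mean = np.mean(x)
std_err = stats.sem(x)
degree_f = len(x) - 1

confidence = 0.95
# two-tailed critical value: the (1 + confidence) / 2 quantile of the t-distribution
t_crit = stats.t.ppf((1 + confidence) / 2, degree_f)
# margin of error around the sample mean
margin = t_crit * std_err
lower, upper = mean - margin, mean + margin

print(lower, upper)
```

The result should agree with the interval returned by stats.t.interval for the same data and confidence level.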
Similarly, you can calculate a confidence interval at any other level. For example, calculate the 90% confidence interval,
# calculate 90% confidence interval
stats.t.interval(0.90, df=degree_f, loc=mean, scale=std_err)
# output
(13.454517821217028, 17.74548217878297)
The 90% confidence interval for the data is 13.45 to 17.75.
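To see how the interval changes with the confidence level, the sketch below computes several levels in one loop. The intervals dictionary and the chosen levels (0.90, 0.95, 0.99) are illustrative; the confidence level is passed positionally so the call does not depend on the keyword name used by a particular SciPy version.

```python
import numpy as np
from scipy import stats

x = np.array([10, 14, 9, 10, 20, 16, 19, 21, 22, 13, 15, 22, 20, 11, 12])

mean = np.mean(x)
std_err = stats.sem(x)
degree_f = len(x) - 1

# map each confidence level to its (lower, upper) interval
intervals = {}
for level in (0.90, 0.95, 0.99):
    intervals[level] = stats.t.interval(level, df=degree_f, loc=mean, scale=std_err)

for level, (lower, upper) in intervals.items():
    print(f"{level:.0%}: {lower:.2f} to {upper:.2f} (width {upper - lower:.2f})")
```

A higher confidence level gives a wider interval, because a wider range is needed to capture the population mean more often.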
Calculate 95% confidence interval based on Z-distribution
Create a sample dataset,
# import package
import numpy as np
from scipy import stats
x = np.array([10, 14, 9, 10, 20, 16, 19, 21, 22, 13, 15, 22, 20, 11, 12])
Calculate the mean and standard error,
# calculate mean
mean = np.mean(x)
# standard error
std_err = stats.sem(x)
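Calculate the 95% confidence interval using the norm.interval function,
# calculate 95% confidence interval
# the confidence level is passed positionally because the keyword
# was renamed from alpha to confidence in SciPy 1.9
stats.norm.interval(0.95, loc=mean, scale=std_err)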
# output
(13.212534150303288, 17.98746584969671)
The 95% confidence interval based on the Z-distribution (standard normal distribution) is 13.21 to 17.99.
The interval function from scipy.stats.norm calculates the confidence interval based on the Z-distribution. It returns a two-tailed interval with equal areas in both tails.
The Z-distribution is typically used when the sample size is large (n>=30).
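As a rough illustration of why the sample size matters, the sketch below compares the Z and t critical values for the same 15-observation dataset and builds both intervals by hand. The names q, z_crit, t_crit, z_interval, and t_interval are illustrative. Since this sample has n=15, the Z-based interval here is a demonstration rather than the recommended choice.

```python
import numpy as np
from scipy import stats

x = np.array([10, 14, 9, 10, 20, 16, 19, 21, 22, 13, 15, 22, 20, 11, 12])

mean = np.mean(x)
std_err = stats.sem(x)

confidence = 0.95
# upper quantile used for a two-tailed interval
q = (1 + confidence) / 2

# critical values from the standard normal and the t-distribution (n-1 df)
z_crit = stats.norm.ppf(q)
t_crit = stats.t.ppf(q, len(x) - 1)

z_interval = (mean - z_crit * std_err, mean + z_crit * std_err)
t_interval = (mean - t_crit * std_err, mean + t_crit * std_err)

print("z critical:", z_crit, "interval:", z_interval)
print("t critical:", t_crit, "interval:", t_interval)
```

For a small sample the t critical value is larger than the Z critical value, so the t-based interval is wider. As the sample size grows, the two critical values converge and the intervals become nearly identical.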