Calculate Confidence Interval with SciPy

2024-08-21

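A confidence interval is an estimated range of values that is likely to contain an unknown population parameter (such as the mean). For a 95% confidence interval, if you draw samples from the population many times and compute an interval from each sample, about 95% of those intervals are expected to contain the true parameter.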
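The repeated-sampling interpretation can be checked with a small simulation. The sketch below is illustrative only: the population mean and standard deviation, the number of repetitions, and the sample size are all hypothetical values chosen for the demonstration, and the population is assumed to be normal.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)     # fixed seed so the run is repeatable
true_mean, true_sd = 50.0, 10.0    # hypothetical population parameters
n_repeats, sample_size = 2000, 15  # hypothetical simulation settings

covered = 0
for _ in range(n_repeats):
    # draw one sample and compute its 95% t-based interval
    sample = rng.normal(true_mean, true_sd, size=sample_size)
    low, high = stats.t.interval(0.95, df=sample_size - 1,
                                 loc=np.mean(sample), scale=stats.sem(sample))
    # count the interval if it contains the true population mean
    if low <= true_mean <= high:
        covered += 1

coverage = covered / n_repeats
print(f"Fraction of intervals containing the true mean: {coverage:.3f}")
```

The printed fraction should land close to 0.95, which is what the 95% confidence level refers to.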

In Python, you can use the interval function from SciPy to calculate confidence intervals based on Student’s t-distribution and the Z-distribution (standard normal distribution).

The following examples explain how to calculate confidence intervals using the SciPy library.

Calculate 95% confidence interval based on t-distribution

Create a sample dataset,

# import package
import numpy as np
from scipy import stats

x = np.array([10, 14, 9, 10, 20, 16, 19, 21, 22, 13, 15, 22, 20, 11, 12])

Calculate the mean, standard error, and degrees of freedom,

# calculate mean 
mean = np.mean(x)

# standard error
std_err = stats.sem(x)

# degrees of freedom
degree_f = len(x)-1

Calculate the 95% confidence interval using the interval function from scipy,

# calculate 95% confidence interval
# (SciPy >= 1.9 names this argument confidence; older releases call it alpha)
stats.t.interval(confidence=0.95, df=degree_f, loc=mean, scale=std_err)

# output

(12.987398435544499, 18.2126015644555)

The 95% confidence interval for the data is 12.99 to 18.21.

The interval function from scipy.stats.t calculates the confidence interval based on the t-distribution. It returns a two-tailed interval with equal area in each tail.
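To see what the interval function does, the same result can be reproduced by hand as mean ± t* × standard error, where t* is the critical value of the t-distribution. The following is a minimal sketch using the same dataset; the variable names (t_crit, margin, manual_ci, scipy_ci) are illustrative, and the confidence level is passed positionally so the call works regardless of the keyword name used by the installed SciPy version.

```python
import numpy as np
from scipy import stats

x = np.array([10, 14, 9, 10, 20, 16, 19, 21, 22, 13, 15, 22, 20, 11, 12])

mean = np.mean(x)
std_err = stats.sem(x)
confidence = 0.95

# two-tailed critical value: leaves (1 - confidence) / 2 in each tail
t_crit = stats.t.ppf((1 + confidence) / 2, df=len(x) - 1)

# margin of error and the manually computed interval
margin = t_crit * std_err
manual_ci = (mean - margin, mean + margin)

# the same interval from SciPy, for comparison
scipy_ci = stats.t.interval(confidence, df=len(x) - 1, loc=mean, scale=std_err)

print(manual_ci)
print(scipy_ci)
```

Both tuples should agree up to floating-point rounding.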

The t-distribution should be used when the sample size is small (n<30) and the population standard deviation is unknown.
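One way to see why sample size matters is to compare the two-tailed 95% critical values of the t-distribution with the corresponding Z value. The sketch below uses a few arbitrary sample sizes chosen for illustration; the names z_crit and t_crits are hypothetical.

```python
from scipy import stats

# two-tailed 95% critical value of the standard normal distribution
z_crit = stats.norm.ppf(0.975)

# t critical values for a few illustrative sample sizes (df = n - 1)
t_crits = {n: stats.t.ppf(0.975, df=n - 1) for n in (5, 15, 30, 100, 1000)}

for n, t_crit in t_crits.items():
    print(f"n={n:5d}  t={t_crit:.3f}  z={z_crit:.3f}")
```

The t critical value is larger than the Z value for small samples, which makes the t-based interval wider, and it approaches the Z value as the sample size grows.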

Similarly, you can calculate confidence intervals at other confidence levels. For example, to calculate the 90% confidence interval,

# calculate 90% confidence interval
# (SciPy >= 1.9 names this argument confidence; older releases call it alpha)
stats.t.interval(confidence=0.90, df=degree_f, loc=mean, scale=std_err)

# output

(13.454517821217028, 17.74548217878297)

The 90% confidence interval for the data is 13.45 to 17.75.
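If you need intervals at several confidence levels, the steps can be wrapped in a small helper. The function name t_confidence_interval below is hypothetical (it is not part of SciPy), and this is only a sketch for a one-dimensional sample.

```python
import numpy as np
from scipy import stats

def t_confidence_interval(data, confidence=0.95):
    """Two-tailed t-based confidence interval for the mean of a 1-D sample."""
    data = np.asarray(data, dtype=float)
    return stats.t.interval(confidence, df=data.size - 1,
                            loc=data.mean(), scale=stats.sem(data))

x = np.array([10, 14, 9, 10, 20, 16, 19, 21, 22, 13, 15, 22, 20, 11, 12])

# compute the interval at a few common confidence levels
intervals = {level: t_confidence_interval(x, level) for level in (0.90, 0.95, 0.99)}

for level, (low, high) in intervals.items():
    print(f"{level:.0%} confidence interval: {low:.2f} to {high:.2f}")
```

A higher confidence level gives a wider interval, because a wider range is needed to capture the population mean more often.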

Calculate 95% confidence interval based on Z-distribution

Create a sample dataset,

# import package
import numpy as np
from scipy import stats

x = np.array([10, 14, 9, 10, 20, 16, 19, 21, 22, 13, 15, 22, 20, 11, 12])

Calculate mean and standard error,

# calculate mean 
mean = np.mean(x)

# standard error
std_err = stats.sem(x)

Calculate the 95% confidence interval using the norm.interval function,

# calculate 95% confidence interval
# (SciPy >= 1.9 names this argument confidence; older releases call it alpha)
stats.norm.interval(confidence=0.95, loc=mean, scale=std_err)

# output

(13.212534150303288, 17.98746584969671)

The 95% confidence interval based on the Z-distribution (standard normal distribution) is 13.21 to 17.99.

The interval function from scipy.stats.norm calculates the confidence interval based on the Z-distribution. It returns a two-tailed interval with equal area in each tail.
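As with the t-based interval, the Z-based interval can be reproduced by hand as mean ± z* × standard error, where z* is the critical value of the standard normal distribution (about 1.96 for a 95% level). The sketch below uses the same dataset; the names z_crit, margin, manual_ci, and scipy_ci are illustrative.

```python
import numpy as np
from scipy import stats

x = np.array([10, 14, 9, 10, 20, 16, 19, 21, 22, 13, 15, 22, 20, 11, 12])

mean = np.mean(x)
std_err = stats.sem(x)
confidence = 0.95

# two-tailed critical value of the standard normal distribution
z_crit = stats.norm.ppf((1 + confidence) / 2)

# margin of error and the manually computed interval
margin = z_crit * std_err
manual_ci = (mean - margin, mean + margin)

# the same interval from SciPy, for comparison
scipy_ci = stats.norm.interval(confidence, loc=mean, scale=std_err)

print(manual_ci)
print(scipy_ci)
```

Both tuples should agree up to floating-point rounding.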

The Z-distribution is typically used when the sample size is large (n>=30) or when the population standard deviation is known.