Calculate Binomial Confidence Intervals in Python

2024-07-14

The binomial distribution is commonly used in statistics for modeling binary outcomes, such as success/failure, yes/no, etc.

The binomial distribution is a discrete probability distribution defined by two parameters: the number of trials (n) and the probability of success on each trial (p).

A binomial confidence interval gives a range of plausible values for the true population proportion of successes (for example, a success rate), based on the number of successes observed in a sample.

In Python, binomial confidence intervals can be calculated using the proportion_confint() function from the statsmodels package.

The basic syntax for the proportion_confint() function for calculating the binomial confidence interval is:

# import packages
import statsmodels.api as sm

# method='normal' selects the normal approximation (Wald) interval
sm.stats.proportion_confint(count=500, nobs=800, method='normal')

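The function returns a tuple with the lower and upper limits of the confidence interval, here for 500 successes in 800 trials. If alpha is not specified, it defaults to 0.05, which gives a 95% confidence interval.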

The following example explains in detail how to calculate the binomial confidence interval.

Example

Suppose a company surveyed 500 people in a city and found that 300 of them had purchased the product.

In this example, you can calculate the 95% binomial confidence interval using the normal approximation method (Wald confidence interval).

# import packages
import statsmodels.api as sm

# number of successes (people who purchased the product)
count = 300
# number of observations (people surveyed)
nobs = 500
# significance level for a 95% confidence interval
alpha = 0.05

# calculate 95% binomial confidence interval
sm.stats.proportion_confint(count=count, nobs=nobs, alpha=alpha,
    method='normal')

# output
(0.5570593405507882, 0.6429406594492117)

The 95% binomial confidence interval is approximately 0.557 to 0.643. This means that if the survey were repeated many times and an interval computed each time, about 95% of those intervals would contain the true population probability of success.
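To see where these numbers come from, the normal approximation interval can be reproduced by hand. The sketch below is only an illustration, not the statsmodels implementation: the helper name wald_interval is hypothetical, and it assumes the textbook formula p ± z * sqrt(p * (1 - p) / n) using only the Python standard library.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(count, nobs, alpha=0.05):
    """Hypothetical helper: textbook normal approximation (Wald) interval."""
    p_hat = count / nobs                        # sample proportion
    z = NormalDist().inv_cdf(1 - alpha / 2)     # about 1.96 for alpha = 0.05
    half_width = z * sqrt(p_hat * (1 - p_hat) / nobs)
    return p_hat - half_width, p_hat + half_width


lower, upper = wald_interval(300, 500)
print(f"{lower:.4f}, {upper:.4f}")  # 0.5571, 0.6429
```

The result agrees with the proportion_confint() output above to the displayed precision.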

Note

The normal approximation (Wald) method can be inaccurate for small samples or for extreme proportions (close to 0 or 1). It is suitable when the sample size is large and the proportion is not extreme.
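A small made-up case shows the problem. The sketch below applies the textbook, unclipped normal approximation formula to a hypothetical sample of 1 success in 20 trials. These numbers are invented for illustration and are not from the survey example. The lower limit comes out negative, which is impossible for a probability.

```python
from math import sqrt
from statistics import NormalDist

# hypothetical small sample with an extreme proportion: 1 success in 20 trials
count, nobs, alpha = 1, 20, 0.05

p_hat = count / nobs                         # 0.05
z = NormalDist().inv_cdf(1 - alpha / 2)      # about 1.96
half_width = z * sqrt(p_hat * (1 - p_hat) / nobs)

wald_lower = p_hat - half_width
wald_upper = p_hat + half_width
print(f"{wald_lower:.4f}, {wald_upper:.4f}")  # lower limit is below 0
```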

If you have small sample sizes, you can calculate the binomial confidence interval using Wilson’s score interval.

In this example code, we will calculate the 95% binomial confidence interval using Wilson’s score interval method.

# import packages
import statsmodels.api as sm

# number of successes (people who purchased the product)
count = 300
# number of observations (people surveyed)
nobs = 500
# significance level for a 95% confidence interval
alpha = 0.05

# calculate 95% binomial confidence interval
sm.stats.proportion_confint(count=count, nobs=nobs, alpha=alpha,
    method='wilson')

# output
(0.5564541227024271, 0.6420210092052625)

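The 95% binomial confidence interval using Wilson’s score interval method is approximately 0.556 to 0.642. This means that if the survey were repeated many times and an interval computed each time, about 95% of those intervals would contain the true population probability of success. With a sample this large, the Wilson and normal approximation intervals are nearly identical.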
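For reference, the Wilson score interval can also be computed by hand. The sketch below is an illustration under stated assumptions rather than the statsmodels source: wilson_interval is a hypothetical helper name, and it uses the standard Wilson formula without continuity correction, relying only on the Python standard library.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(count, nobs, alpha=0.05):
    """Hypothetical helper: Wilson score interval without continuity correction."""
    p_hat = count / nobs
    z = NormalDist().inv_cdf(1 - alpha / 2)
    denom = 1 + z**2 / nobs
    # the center is pulled slightly toward 0.5 compared with p_hat
    center = (p_hat + z**2 / (2 * nobs)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / nobs + z**2 / (4 * nobs**2)) / denom
    return center - half_width, center + half_width


lower, upper = wilson_interval(300, 500)
print(f"{lower:.4f}, {upper:.4f}")  # 0.5565, 0.6420

# hypothetical small sample: both limits stay inside [0, 1]
small_lower, small_upper = wilson_interval(1, 20)
```

Because the interval is centered on an adjusted proportion rather than on the raw sample proportion, its limits stay between 0 and 1 even for small samples with few successes.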