Calculate Confidence Interval for Two-sample Proportions in Python
In Python, you can use the chi2_contingency function from the scipy.stats module to perform a proportion test on two groups. The chi2_contingency function performs a chi-square test on a contingency table to compare the proportions between the groups.
However, the chi2_contingency function does not report a confidence interval for the difference between the two proportions. Instead, you can calculate the Wald confidence interval for the difference in two proportions from a 2x2 contingency table.
The Wald confidence interval can be calculated using the confint_proportions_2indep function from the statsmodels package.
The following example shows how to calculate the Wald confidence interval for the difference between two proportions.
Sample dataset
Suppose a marketing survey asks 500 individuals in each of two cities (A and B) whether they purchased a product.
In city A, 300 individuals purchased the product, and in city B, 400 individuals purchased it.
The numbers of successes in cities A and B are therefore 300 and 400, and the numbers of non-purchasers are 200 and 100, respectively.
# import package
import numpy as np
# Create the contingency table
# (rows: city A, city B; columns: purchased, did not purchase)
contingency_table = np.array([[300, 200],
                              [400, 100]])
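As a quick sanity check on the table layout (assuming the rows are cities and the first column counts purchasers, as in the comment above), the group sizes and sample proportions can be read directly off the array. The variable names nobs and proportions are illustrative.

```python
import numpy as np

# Rows: city A, city B; columns: purchased, did not purchase
contingency_table = np.array([[300, 200],
                              [400, 100]])

# Number of individuals surveyed in each city (row totals)
nobs = contingency_table.sum(axis=1)

# Sample proportion of purchasers in each city (purchasers / row total)
proportions = contingency_table[:, 0] / nobs

print(nobs)         # [500 500]
print(proportions)  # [0.6 0.8]
```

These are the two proportions (0.6 and 0.8) that the test and the confidence interval below compare.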
Proportion test using chi2_contingency
Now, perform the proportion test using the chi2_contingency function from scipy.stats. For a 2x2 table, this function is similar to the prop.test function in R: both apply Yates' continuity correction by default.
# import package
from scipy.stats import chi2_contingency
# perform the chi-square test for proportions
# (Yates' continuity correction is applied by default for a 2x2 table)
chi2, p, dof, expected = chi2_contingency(contingency_table)
print(chi2, p, dof, expected)
# output
46.67142857142857 8.394401757688147e-12 1 [[350. 150.]
 [350. 150.]]
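The statistic of 46.67 includes Yates' continuity correction. A sketch of how it relates to the uncorrected test: passing correction=False to chi2_contingency gives the plain Pearson chi-square, which for a 2x2 table equals the square of the pooled two-proportion z statistic from statsmodels' proportions_ztest. This comparison is added for illustration and is not part of the original walkthrough.

```python
import numpy as np
from scipy.stats import chi2_contingency
from statsmodels.stats.proportion import proportions_ztest

contingency_table = np.array([[300, 200],
                              [400, 100]])

# Default behaviour: Yates' continuity correction for a 2x2 table
chi2_yates, p_yates, _, _ = chi2_contingency(contingency_table)

# Same test without the continuity correction
chi2_plain, p_plain, _, _ = chi2_contingency(contingency_table, correction=False)

# Two-proportion z-test on the same counts (pooled variance by default)
z_stat, z_pvalue = proportions_ztest(count=[300, 400], nobs=[500, 500])

print(f"{chi2_yates:.2f}")   # 46.67
print(f"{chi2_plain:.2f}")   # 47.62
print(f"{z_stat ** 2:.2f}")  # 47.62, equal to the uncorrected chi-square
```

Both versions lead to the same conclusion here; the correction only matters noticeably for small counts.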
The chi-square statistic is 46.67, and the p value is about 8.39e-12 (< 0.05).
As the p value is less than the significance level (alpha = 0.05), we reject the null hypothesis that the purchase proportions are equal in the two cities.
We conclude that there is a significant difference in the proportion of individuals who purchased the product between the two cities.
As you can see from the output, the chi2_contingency function does not report a confidence interval for the difference in proportions.
Calculate confidence interval
We will calculate the Wald confidence interval for the difference between the two proportions in the 2x2 contingency table. It relies on the normal approximation to the sampling distribution of each sample proportion.
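For reference, the standard (unpooled) Wald interval for the difference between two proportions, with $\hat p_i = x_i / n_i$ for $x_i$ successes out of $n_i$ trials, is shown below, followed by the arithmetic for the survey data (300 of 500 in city A, 400 of 500 in city B) at the 95% level.

```latex
(\hat p_1 - \hat p_2) \pm z_{1-\alpha/2}
  \sqrt{\frac{\hat p_1 (1 - \hat p_1)}{n_1} + \frac{\hat p_2 (1 - \hat p_2)}{n_2}}

\hat p_1 = 0.6,\quad \hat p_2 = 0.8,\quad
\mathrm{SE} = \sqrt{\frac{0.6 \cdot 0.4}{500} + \frac{0.8 \cdot 0.2}{500}}
            = \sqrt{0.0008} \approx 0.02828

-0.2 \pm 1.96 \times 0.02828 \approx -0.2 \pm 0.0554
\;\Rightarrow\; (-0.2554,\ -0.1446)
```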
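We will use the confint_proportions_2indep function from statsmodels to calculate the Wald confidence interval. The prop.test function in R also reports a Wald-type confidence interval for the difference in proportions, although by default it adds a continuity correction, so its limits are slightly wider.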
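R's prop.test applies a continuity correction to its two-sample interval by default, so its limits will not exactly match a plain Wald interval. The following is a minimal sketch of that corrected interval, based on our reading of the prop.test source: the half-width gains 0.5 * (1/n1 + 1/n2), capped at |p1 - p2|, and the limits are clipped to [-1, 1]. The helper name wald_ci_continuity_corrected is hypothetical; compare against R before relying on it.

```python
import math
from scipy.stats import norm

def wald_ci_continuity_corrected(count1, nobs1, count2, nobs2, alpha=0.05):
    """Hypothetical helper: Wald interval for p1 - p2 with a continuity
    correction modelled on R's prop.test default (an assumption)."""
    p1, p2 = count1 / nobs1, count2 / nobs2
    diff = p1 - p2
    inv_n = 1 / nobs1 + 1 / nobs2
    # Correction factor 0.5, capped so the correction term never exceeds |diff|
    yates = min(0.5, abs(diff) / inv_n)
    z = norm.ppf(1 - alpha / 2)
    se = math.sqrt(p1 * (1 - p1) / nobs1 + p2 * (1 - p2) / nobs2)
    width = z * se + yates * inv_n
    # Clip the limits to the valid range of a difference in proportions
    return max(diff - width, -1.0), min(diff + width, 1.0)

lower, upper = wald_ci_continuity_corrected(300, 500, 400, 500)
print(f"{lower:.4f} {upper:.4f}")  # -0.2574 -0.1426
```

For these sample sizes the correction widens each side of the interval by only 0.002.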
# import package
import statsmodels.api as sm
# calculate Wald confidence interval
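# count1/nobs1: city A purchasers/surveyed; count2/nobs2: city B purchasers/surveyed
# method='wald' explicitly requests the Wald interval
sm.stats.confint_proportions_2indep(count1=300, nobs1=500,
                                    count2=400, nobs2=500, method='wald')
# output (rounded)
(-0.2554, -0.1446)
The 95% Wald confidence interval for the difference in proportions (city A minus city B) is approximately -0.26 to -0.14. Because the interval does not include 0, it agrees with the chi-square test: the purchase proportion in city B is about 14 to 26 percentage points higher than in city A.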
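To see what method='wald' computes, the interval can be reproduced by hand from the survey counts (300 of 500 purchasers in city A, 400 of 500 in city B) using the unpooled standard error and the normal critical value, and then compared with statsmodels. This is an illustrative check, not a replacement for the library call.

```python
import math
from scipy.stats import norm
from statsmodels.stats.proportion import confint_proportions_2indep

count1, nobs1 = 300, 500  # city A: purchasers, surveyed
count2, nobs2 = 400, 500  # city B: purchasers, surveyed

p1, p2 = count1 / nobs1, count2 / nobs2
diff = p1 - p2  # city A minus city B

# Unpooled (Wald) standard error of the difference
se = math.sqrt(p1 * (1 - p1) / nobs1 + p2 * (1 - p2) / nobs2)
z = norm.ppf(0.975)  # two-sided 95% critical value, about 1.96

manual_ci = (diff - z * se, diff + z * se)
statsmodels_ci = confint_proportions_2indep(count1=count1, nobs1=nobs1,
                                            count2=count2, nobs2=nobs2,
                                            method='wald')

print(f"manual:      {manual_ci[0]:.4f} to {manual_ci[1]:.4f}")
print(f"statsmodels: {statsmodels_ci[0]:.4f} to {statsmodels_ci[1]:.4f}")
# both lines show -0.2554 to -0.1446
```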
If you want to calculate the confidence interval for a one-sample proportion, please read our article on the confidence interval for one-sample proportion.