Calculate Confidence Interval for Two-sample Proportions in Python

In Python, you can use the chi2_contingency function from the scipy package to perform a proportion test on two groups.

The chi2_contingency function performs a chi-square test of independence on a contingency table. For a 2x2 table, this is equivalent to comparing the proportions between the two groups.

However, chi2_contingency does not report a confidence interval for the difference between the two proportions.

Instead, you can calculate the Wald confidence interval for the difference in two proportions from the counts in the 2x2 contingency table.

The Wald confidence interval can be calculated using the confint_proportions_2indep function from the statsmodels package.

The following example explains calculating the Wald confidence interval for two proportions.

Sample dataset

Suppose a marketing survey about the purchase of a product is completed in two cities (A and B), with 500 individuals surveyed in each city.

In city A, 300 individuals purchased the product, and in city B, 400 individuals purchased the product.

The number of successes in cities A and B is therefore 300 and 400, and the number of failures is 200 and 100, respectively.

# import package
import numpy as np

# Create the contingency table
# rows: city A, city B; columns: purchased, not purchased
contingency_table = np.array([[300, 200],
                              [400, 100]])
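
As a quick check on the table, the sample proportions can be computed from the row totals. This is a minimal sketch, assuming the first column holds purchases and the second holds non-purchases; the names n_per_city and prop_purchased are illustrative and not part of any library.

```python
import numpy as np

# Rows: city A, city B; columns: purchased, not purchased
contingency_table = np.array([[300, 200],
                              [400, 100]])

# Row totals give the number of individuals surveyed in each city
n_per_city = contingency_table.sum(axis=1)

# Sample proportion of purchasers in each city
prop_purchased = contingency_table[:, 0] / n_per_city

print(prop_purchased)  # [0.6 0.8]
```

The observed proportions are 0.6 for city A and 0.8 for city B, so the observed difference (A minus B) is -0.2.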

Proportion test using chi2_contingency

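Now, perform the proportion test using the chi2_contingency function from the scipy package. This function is similar to the prop.test function in R. For a 2x2 table, chi2_contingency applies Yates' continuity correction by default (correction=True).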

# import package
from scipy.stats import chi2_contingency

# perform the chi-square test for proportions
# (returns the statistic, p value, degrees of freedom, and expected frequencies)
chi2, p, dof, expected = chi2_contingency(contingency_table)

print(chi2, p, dof, expected)

# output
46.67142857142857 8.394401757688147e-12 1 [[350. 150.]
 [350. 150.]]

The chi-square statistic is 46.67 and the p value is 8.39e-12.

As the p value is far below the significance level alpha (0.05), we reject the null hypothesis that the two proportions are equal.

We conclude that there is a significant difference between the two cities in the proportion of individuals who purchased the product.

As you can see from the output, the chi2_contingency function does not report the confidence interval.
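
To see where the statistic of 46.67 comes from, the test can be recomputed by hand. The sketch below uses only plain Python and applies Yates' continuity correction in its simple form, assuming every |observed - expected| is at least 0.5 (which holds for this table). The variable names are illustrative.

```python
# Observed counts: rows are cities A and B; columns are purchased / not purchased
observed = [[300, 200],
            [400, 100]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under independence: row total * column total / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Yates' continuity correction subtracts 0.5 from each |observed - expected|
chi2_yates = sum(
    (abs(o - e) - 0.5) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

print(expected)              # [[350.0, 150.0], [350.0, 150.0]]
print(round(chi2_yates, 2))  # 46.67
```

Both the expected frequencies and the statistic match the chi2_contingency output shown earlier.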

Calculate confidence interval

We will calculate the Wald confidence interval for the difference in proportions for a 2x2 contingency table. It assumes a normal approximation for proportions.
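
The normal approximation behind the Wald interval can be sketched by hand: the interval is the observed difference plus or minus a standard normal critical value times the unpooled standard error. The helper wald_ci_diff below is hypothetical (it is not a statsmodels function) and uses only the Python standard library.

```python
from math import sqrt
from statistics import NormalDist


def wald_ci_diff(count1, nobs1, count2, nobs2, alpha=0.05):
    """Wald confidence interval for p1 - p2 (illustrative helper, not a library function)."""
    p1 = count1 / nobs1
    p2 = count2 / nobs2
    diff = p1 - p2
    # Unpooled standard error of the difference in proportions
    se = sqrt(p1 * (1 - p1) / nobs1 + p2 * (1 - p2) / nobs2)
    # Two-sided critical value from the standard normal distribution
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return diff - z * se, diff + z * se


# Purchases in city A (300 of 500) versus city B (400 of 500)
low, upp = wald_ci_diff(300, 500, 400, 500)
print(round(low, 4), round(upp, 4))
```

Note that the helper takes the number of successes and the number of observations in each group, not the failure counts from the second column of the contingency table.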

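We will use the confint_proportions_2indep function from statsmodels to calculate the Wald confidence interval. The function takes the number of successes (count1, count2) and the number of observations (nobs1, nobs2) in each group. The prop.test function in R also reports a Wald-type confidence interval for the two-sample proportion test (with a continuity correction by default).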

# import package
import statsmodels.api as sm

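# calculate Wald confidence interval for p(A) - p(B)
# count1 and count2 are the numbers of successes in cities A and B
ci_low, ci_upp = sm.stats.confint_proportions_2indep(count1=300, nobs1=500,
    count2=400, nobs2=500, method='wald')
print(round(ci_low, 4), round(ci_upp, 4))

# output
-0.2554 -0.1446

The 95% confidence interval for the difference in proportions (city A minus city B) is -0.255 to -0.145. As the interval excludes zero, it agrees with the chi-square test: the proportion of purchasers in city A is lower than in city B by roughly 14 to 26 percentage points.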

If you want to calculate the confidence interval for one-sample proportion, please read our article on the confidence interval for one-sample proportion.