Calculate Confidence Interval for Two-sample Proportions in Python

2024-06-12

In Python, you can use the chi2_contingency function from the scipy package to test whether two groups differ in their proportions.

The chi2_contingency function performs a chi-square test of independence on a contingency table. For a 2x2 table, this is equivalent to comparing the proportions of the two groups.

However, the chi2_contingency function does not report a confidence interval for the difference between the two proportions.

Instead, you can calculate the Wald confidence interval for the difference between two proportions from the 2x2 contingency table.
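
To make the idea concrete, here is a minimal sketch of the textbook Wald interval computed by hand: the difference in sample proportions plus or minus a normal quantile times the unpooled standard error. The helper name wald_ci_diff is hypothetical (it is not part of any library), and the counts are the survey figures used in this article's example (300 of 500 and 400 of 500).

```python
from math import sqrt
from statistics import NormalDist


def wald_ci_diff(count1, nobs1, count2, nobs2, alpha=0.05):
    """Wald confidence interval for p1 - p2 (hypothetical helper)."""
    p1 = count1 / nobs1
    p2 = count2 / nobs2
    diff = p1 - p2
    # Unpooled standard error of the difference in proportions
    se = sqrt(p1 * (1 - p1) / nobs1 + p2 * (1 - p2) / nobs2)
    # Two-sided critical value from the standard normal distribution
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return diff - z * se, diff + z * se


lower, upper = wald_ci_diff(300, 500, 400, 500)
print(f"{lower:.4f}, {upper:.4f}")
# -0.2554, -0.1446
```

Because the interval excludes zero, it points to the same conclusion as a significant test result, while also showing the size of the difference.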

The Wald confidence interval can be calculated using the confint_proportions_2indep function from the statsmodels package.

The following example shows how to calculate the Wald confidence interval for two proportions.

Sample dataset

Suppose a marketing survey about the purchase of a product is conducted in two cities (A and B), with 500 individuals surveyed in each city.

In city A, 300 individuals purchased the product, and in city B, 400 individuals purchased the product.

The numbers of successes in cities A and B are therefore 300 and 400, and the numbers of failures are 200 and 100, respectively.

# import package
import numpy as np

# Create the contingency table
# rows: city A, city B; columns: purchased, did not purchase
contingency_table = np.array([[300, 200],
                              [400, 100]])
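
As a quick check on the layout, the following sketch recovers the success counts, group sizes, and sample proportions from the table. It assumes rows are cities (A, B) and columns are purchased / did not purchase; the variable names successes, totals, and proportions are illustrative only.

```python
import numpy as np

# Same table as above: rows = city A, city B; columns = purchased, not purchased
contingency_table = np.array([[300, 200],
                              [400, 100]])

successes = contingency_table[:, 0]      # purchases per city
totals = contingency_table.sum(axis=1)   # individuals surveyed per city
proportions = successes / totals         # sample proportion per city

print(successes, totals, proportions)
# [300 400] [500 500] [0.6 0.8]
```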

Proportion test using `chi2_contingency`

Now, perform the proportion test using the chi2_contingency function from the scipy package. This function is similar to the prop.test function in R. For a 2x2 table, it applies Yates' continuity correction by default, as prop.test does.

# import package
from scipy.stats import chi2_contingency

# perform the chi-square test; the function returns the test statistic,
# the p value, the degrees of freedom, and the expected frequencies
chi2, p, dof, expected = chi2_contingency(contingency_table)

print(chi2, p, dof, expected)

# output
46.67142857142857 8.394401757688147e-12 1 [[350. 150.]
 [350. 150.]]

The chi-square statistic is 46.67 and the p value is 8.39e-12.

As the p value is less than the significance level alpha (0.05), we reject the null hypothesis that the two proportions are equal.

We conclude that there is a significant difference between the two cities in the proportion of individuals who purchased the product.
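
The statistic reported above includes Yates' continuity correction, which chi2_contingency applies by default when the table has one degree of freedom. The sketch below compares the corrected and uncorrected statistics using the function's correction parameter; the variable names are illustrative.

```python
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[300, 200],
                  [400, 100]])

# Default behaviour: Yates' continuity correction is applied (dof == 1)
chi2_corrected, p_corrected, _, _ = chi2_contingency(table)

# Plain Pearson chi-square statistic, without the continuity correction
chi2_plain, p_plain, _, _ = chi2_contingency(table, correction=False)

print(round(chi2_corrected, 4), round(chi2_plain, 4))
# 46.6714 47.619
```

With samples this large the correction changes the statistic only slightly, and the conclusion is the same either way.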

As you can see from the output, the chi2_contingency function does not report the confidence interval.
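
One way to obtain the interval is to pass the counts to confint_proportions_2indep from statsmodels. This is a minimal sketch, assuming an installed statsmodels version that provides this function; the counts (300 and 400) and group sizes (500 each) are taken from the survey example, and method="wald" with compare="diff" requests the Wald interval for the difference in proportions.

```python
from statsmodels.stats.proportion import confint_proportions_2indep

# count = number of purchases, nobs = number of individuals surveyed
low, upp = confint_proportions_2indep(
    count1=300, nobs1=500,   # city A
    count2=400, nobs2=500,   # city B
    method="wald",           # Wald interval
    compare="diff",          # interval for p_A - p_B
    alpha=0.05,              # 95% confidence level
)

print(low, upp)
```

The interval is for the proportion in city A minus the proportion in city B, so negative bounds indicate a higher purchase proportion in city B.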