Two-sample Proportion Test (Z test) in Python

2024-06-18

The two-sample proportion test compares the proportions (i.e. the fraction of successes among all observations) in two independent groups to determine whether they differ significantly.

In Python, you can use the proportions_ztest function from the statsmodels package to perform a two-sample proportion Z test.

The basic syntax of proportions_ztest for a two-proportion Z test is:

# import package
import statsmodels.api as sm
stat, pval = sm.stats.proportions_ztest(count, nobs)

Here, count is an array with the number of successes in each group and nobs is an array with the total number of observations in each group. The function returns the Z statistic and the p value.

The following example demonstrates how to use the two-sample proportion Z test to compare the proportions in two different groups.

Sample dataset

Suppose a marketing survey is conducted in two cities (A and B), with 500 individuals surveyed in each city about whether they purchased a product. In city A, 300 individuals purchased the product, and in city B, 400 individuals purchased it.

The numbers of successes in cities A and B are therefore 300 and 400, respectively, out of 500 observations each.

# import package
import numpy as np

# number of successes in each city
count = np.array([300, 400])

# total number of observations in each city
nobs = np.array([500, 500])
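Before testing, it can help to look at the sample proportions themselves. A short sketch follows. The name props is an illustrative choice and not part of the statsmodels API.

```python
import numpy as np

# number of successes and observations in cities A and B
count = np.array([300, 400])
nobs = np.array([500, 500])

# sample proportion of purchasers in each city
props = count / nobs
print(props)  # [0.6 0.8]
```

The test then asks whether the gap between 0.6 and 0.8 is larger than sampling variation alone would explain.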

Hypothesis

We will test the following null and alternative hypotheses using a two-tailed test.

Null Hypothesis (H0): There is no difference between the two cities in the proportion of individuals who purchased the product.

Alternative Hypothesis (Ha): There is a difference between the two cities in the proportion of individuals who purchased the product.

Two-sample proportion Z test

Now, perform the two-proportion Z test using the proportions_ztest function from the statsmodels package.

# import package
import statsmodels.api as sm

# perform the two-proportion Z test
stat, pval = sm.stats.proportions_ztest(count, nobs)

print(stat, pval)

# output
-6.900655593423544 5.176309056990089e-12

The Z statistic is -6.90 and the p value is 5.18e-12. The negative sign reflects that the proportion in city A (the first group) is lower than in city B.

As the p value is far below the significance level (alpha = 0.05), we reject the null hypothesis.

We conclude that the proportion of individuals who purchased the product differs significantly between the two cities.
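To see where these numbers come from, the statistic can be reproduced by hand. The sketch below assumes the pooled-proportion form of the standard error, which is what proportions_ztest uses by default, and relies on scipy.stats.norm for the normal tail probability. Variable names are illustrative.

```python
import numpy as np
from scipy.stats import norm

count = np.array([300, 400])
nobs = np.array([500, 500])

# sample proportions and the pooled proportion under H0
p_hat = count / nobs
p_pooled = count.sum() / nobs.sum()

# standard error of the difference, using the pooled proportion
se = np.sqrt(p_pooled * (1 - p_pooled) * (1 / nobs).sum())

# Z statistic and two-sided p value
z = (p_hat[0] - p_hat[1]) / se
pval_manual = 2 * norm.sf(abs(z))

print(f"z = {z:.2f}, p = {pval_manual:.3g}")
```

The values agree with the proportions_ztest output shown above.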

In addition to the Z test, you can also perform the two-sample proportion test using the chi-squared test, which is closer to what prop.test does in R.
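A minimal sketch of the chi-squared version is given below, assuming scipy is available. It uses scipy.stats.chi2_contingency on a 2x2 table of purchasers and non-purchasers. R's prop.test applies a continuity correction by default, so both the uncorrected and the corrected variants are computed.

```python
import numpy as np
from scipy.stats import chi2_contingency

count = np.array([300, 400])
nobs = np.array([500, 500])

# 2x2 table: rows are cities, columns are purchased / not purchased
table = np.column_stack([count, nobs - count])

# without continuity correction: the statistic equals the squared Z statistic
chi2, pval_chi2, dof, expected = chi2_contingency(table, correction=False)

# with Yates' continuity correction (the default behaviour of R's prop.test)
chi2_corr, pval_corr, _, _ = chi2_contingency(table, correction=True)

print(chi2, pval_chi2)
print(chi2_corr, pval_corr)
```

Without the correction, the chi-squared p value matches the two-sided Z test. The corrected statistic is slightly smaller and therefore a little more conservative.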