Two-sample Kolmogorov-Smirnov (KS) Test in Python

2024-05-08

In Python, the ks_2samp() function from scipy.stats performs the two-sample Kolmogorov-Smirnov (KS) test.

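The two-sample Kolmogorov-Smirnov (KS) test is a nonparametric test that assesses whether two samples come from the same distribution. It compares the empirical cumulative distribution functions (ECDFs) of the two samples: the test statistic D is the largest absolute difference between them.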
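To make the idea concrete, here is a minimal sketch that computes the KS statistic by hand as the largest absolute gap between the two empirical cumulative distribution functions, then compares it with the value reported by ks_2samp(). The helper name ks_statistic, the seed, and the sample parameters are illustrative assumptions, not part of scipy.

```python
import numpy as np
from scipy import stats


def ks_statistic(a, b):
    """Largest absolute gap between the empirical CDFs of a and b (hypothetical helper)."""
    a = np.sort(np.asarray(a))
    b = np.sort(np.asarray(b))
    # The gap can only change at observed values, so evaluate both ECDFs there.
    points = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, points, side="right") / a.size
    cdf_b = np.searchsorted(b, points, side="right") / b.size
    return np.max(np.abs(cdf_a - cdf_b))


rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
x = rng.normal(loc=0, scale=1, size=200)
y = rng.normal(loc=0.5, scale=1, size=200)

d_manual = ks_statistic(x, y)
d_scipy = stats.ks_2samp(x, y).statistic
print(d_manual, d_scipy)
```

The two printed values should agree, which shows that the statistic depends only on how far apart the two ECDFs are at their widest point.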

The basic syntax of the ks_2samp() function is:

from scipy import stats

stats.ks_2samp(sample1, sample2)

The following example demonstrates how to perform a two-sample Kolmogorov-Smirnov test in Python.

Create dataset

Create two samples drawn from normal distributions:

# import package
import numpy as np

sample1 = np.random.normal(loc=0, scale=1, size=500)
sample2 = np.random.normal(loc=5, scale=2, size=500)

You now have two samples of 500 values each. sample1 is drawn from a normal distribution with a mean of 0 and a standard deviation of 1. sample2 is drawn from a normal distribution with a mean of 5 and a standard deviation of 2.
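Because np.random.normal() draws new values on every run, the exact numbers in this tutorial will differ from yours. If you want repeatable results, one option is a seeded generator, sketched below. The seed value 42 is an arbitrary assumption, and the printed summaries are only a quick check that the samples look as intended.

```python
import numpy as np

# Seeded generator so the same samples are produced on every run
rng = np.random.default_rng(42)  # seed value is arbitrary

sample1 = rng.normal(loc=0, scale=1, size=500)
sample2 = rng.normal(loc=5, scale=2, size=500)

# Sample estimates should be close to the parameters passed above
print(f"sample1: mean={sample1.mean():.2f}, sd={sample1.std(ddof=1):.2f}")
print(f"sample2: mean={sample2.mean():.2f}, sd={sample2.std(ddof=1):.2f}")
```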

Perform a two-sample Kolmogorov-Smirnov (KS) test

You can perform the two-sample Kolmogorov-Smirnov (KS) test using the ks_2samp() function from scipy.

# import package
from scipy import stats

stats.ks_2samp(sample1, sample2)

# output (exact values vary from run to run because no random seed is set)
KstestResult(statistic=0.932, pvalue=1.4211237605112785e-236)

The ks_2samp() function returns the KS statistic (D) and the p value.
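The two values can be read from the result either by attribute name or by tuple unpacking. The sketch below assumes a seeded generator (seed 7 is arbitrary) and regenerates the samples so that it runs on its own; the variable names d_stat and p_value are illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)  # arbitrary seed
sample1 = rng.normal(loc=0, scale=1, size=500)
sample2 = rng.normal(loc=5, scale=2, size=500)

result = stats.ks_2samp(sample1, sample2)

# Access the values by attribute name
print(f"D = {result.statistic:.3f}")
print(f"p = {result.pvalue:.3e}")

# Or unpack the result like a tuple
d_stat, p_value = result
```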

The null hypothesis of the two-sample Kolmogorov-Smirnov test is that both samples come from the same distribution. Because the p value is less than the significance level (0.05), you reject the null hypothesis and conclude that the two samples likely come from different distributions.
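The decision rule can be written as a short helper, sketched here under a few assumptions: decide() is a hypothetical function, alpha = 0.05 matches the significance level used above, and the seed is arbitrary. For contrast, the sketch also tests two samples drawn from the same distribution, where D is expected to be much smaller.

```python
import numpy as np
from scipy import stats


def decide(result, alpha=0.05):
    """Return the test decision for a given significance level (hypothetical helper)."""
    return "reject H0" if result.pvalue < alpha else "fail to reject H0"


rng = np.random.default_rng(1)  # arbitrary seed

# Samples from different distributions: N(0, 1) versus N(5, 2)
different = stats.ks_2samp(rng.normal(0, 1, 500), rng.normal(5, 2, 500))

# Samples from the same distribution: N(0, 1) versus N(0, 1)
same = stats.ks_2samp(rng.normal(0, 1, 500), rng.normal(0, 1, 500))

print("different:", different.statistic, different.pvalue, decide(different))
print("same:     ", same.statistic, same.pvalue, decide(same))
```

Note that failing to reject the null hypothesis does not prove the two distributions are identical; it only means the data do not provide enough evidence of a difference at the chosen significance level.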