ppf vs cdf in SciPy

2024-08-21

ppf (Percent Point Function) and cdf (Cumulative Distribution Function) are methods available on the probability distributions in the scipy.stats module of the SciPy library in Python.

The cdf function takes a specific value (x) and returns the cumulative probability (p) up to that value, whereas the ppf function takes a cumulative probability (p) and returns the corresponding value (x).

The ppf function is the inverse of the cdf function.
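The inverse relationship can be checked with a quick round trip. This is a minimal sketch, assuming a normal distribution with a mean of 100 and a standard deviation of 20 (illustrative parameters): passing the output of cdf back into ppf should return the starting value.

```python
from scipy.stats import norm

# Illustrative parameters (assumed): mean 100, standard deviation 20
mean, sd = 100, 20

x = 60
p = norm.cdf(x, loc=mean, scale=sd)       # value -> cumulative probability
x_back = norm.ppf(p, loc=mean, scale=sd)  # cumulative probability -> value

print(p)       # about 0.02275
print(x_back)  # about 60, up to floating-point error
```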

The following examples explain the differences between the cdf and ppf functions and how to calculate them.

cdf (Cumulative Distribution Function)

The cdf gives the probability (p) that a random value (X) from the given distribution will be less than or equal to a specific value (x).

Mathematically, it can be written as F(x) = P(X <= x).

Suppose you have normally distributed data with a mean of 100 and a standard deviation of 20.

Here, you can use the cdf function to calculate the probability that a random value is less than or equal to 60.

# load packages
from scipy.stats import norm

norm.cdf(x=60, loc=100, scale=20)

# output
0.022750131948179195

The probability that a random value (X) from this normal distribution is less than or equal to 60 is about 0.02275.

In summary, the cdf function gives the probability that a random value (X) drawn from the distribution will be less than or equal to x.
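The same probability can also be obtained by standardizing the value first. The sketch below, using the mean of 100 and standard deviation of 20 from the example, converts x = 60 to a z-score and evaluates the standard normal cdf (the default when loc and scale are omitted). The variable names are only illustrative.

```python
from scipy.stats import norm

mean, sd, x = 100, 20, 60

# z-score: how many standard deviations x lies from the mean
z = (x - mean) / sd  # -2.0

p_standard = norm.cdf(z)                     # standard normal (loc=0, scale=1)
p_direct = norm.cdf(x, loc=mean, scale=sd)   # same distribution, parameters passed directly

print(z, p_standard, p_direct)
```

Both calls should agree, since loc and scale simply shift and rescale the standard normal distribution.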

ppf (Percent Point Function)

ppf is the inverse of the cdf: given a cumulative probability (p), it returns the specific value (x) such that the cumulative probability up to x is equal to p, that is, P(X<=x) = p.

This value is also known as the quantile of the distribution at p.

Suppose you have normally distributed data with a mean of 100 and a standard deviation of 20.

Calculate the value of x when the cumulative probability (p) is 0.02275 (the rounded result from the above example for cdf).

# load packages
from scipy.stats import norm

norm.ppf(q=0.02275, loc=100, scale=20)

# output
59.99995112200

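The value of x is ~60. This indicates that a cumulative probability of 0.02275 corresponds to x = 60. The result is not exactly 60 because the probability was rounded to 0.02275 before it was passed to ppf.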
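ppf also accepts an array of probabilities, which is convenient when several quantiles are needed at once. The following is a small sketch for the same distribution (mean 100, standard deviation 20). The probabilities 0.025, 0.5, and 0.975 are chosen here only for illustration.

```python
import numpy as np
from scipy.stats import norm

# Illustrative cumulative probabilities (assumed for this example)
probs = np.array([0.025, 0.5, 0.975])

# One quantile per probability, for a normal distribution with mean 100 and sd 20
quantiles = norm.ppf(probs, loc=100, scale=20)

print(quantiles)  # roughly [60.8, 100.0, 139.2]
```

The middle value is the median, which equals the mean for a normal distribution. The outer two values bound the central 95% of the distribution.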