ppf vs cdf in SciPy
ppf (Percent Point Function) and cdf (Cumulative Distribution Function) are methods available on the probability distributions in the scipy.stats module in Python.
The cdf function returns the cumulative probability (p) for a specific value (x), whereas the ppf function returns the specific value (x) for a given cumulative probability (p).
The ppf function is the inverse of the cdf function.
The following examples explain the differences between the cdf and ppf functions and how to calculate them.
cdf (Cumulative Distribution Function)
The cdf gives the probability that a random value (X) from the given distribution will be less than or equal to a specific value (x).
Mathematically, it can be written as P(X<=x).
Suppose you have normally distributed data with a mean of 100 and a standard deviation of 20.
Here, you can use the cdf function to calculate the probability that a random value is less than or equal to 60.
# load packages
from scipy.stats import norm
norm.cdf(x=60, loc=100, scale=20)
# output
0.022750131948179195
The probability that a random value (X) from this normal distribution is less than or equal to 60 is 0.02275. This is a small probability because 60 lies two standard deviations below the mean.
In summary, the cdf function gives the probability that a random value (X) drawn from the distribution will be less than or equal to x.
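One way to check this interpretation is by simulation. The following is a minimal sketch, not part of the original example: it draws random values from the same distribution and compares the fraction that fall at or below 60 with the value returned by cdf. The sample size and the seed are arbitrary choices made for illustration.

```python
from scipy.stats import norm

# Same distribution as in the example above: mean 100, standard deviation 20.
dist = norm(loc=100, scale=20)

# Draw random values; n_samples and the seed are arbitrary choices for this sketch.
n_samples = 100_000
samples = dist.rvs(size=n_samples, random_state=0)

# The fraction of samples at or below 60 approximates P(X<=60).
empirical = (samples <= 60).mean()
exact = dist.cdf(60)

print(f"empirical: {empirical:.4f}")
print(f"exact:     {exact:.4f}")
```

The two numbers should agree to about two or three decimal places, and the agreement should improve as the sample size grows.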
ppf (Percent Point Function)
ppf is the inverse of the cdf: given a cumulative probability (p), it returns the specific value (x) such that the cumulative probability up to x, P(X<=x), is equal to p.
Suppose you have normally distributed data with a mean of 100 and a standard deviation of 20.
Calculate the value of x when the cumulative probability is 0.02275 (the rounded result from the above example for cdf).
# load packages
from scipy.stats import norm
norm.ppf(q=0.02275, loc=100, scale=20)
# output
59.99995112200
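The value of x is ~60. It is not exactly 60 because the probability was rounded to 0.02275 before being passed to ppf. This indicates that the cumulative probability up to x = 60 is approximately 0.02275.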
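The inverse relationship can also be checked directly by passing the unrounded output of cdf to ppf. The following is a minimal sketch under the same assumptions as the examples above (mean 100, standard deviation 20); the test values in xs are arbitrary and chosen only for illustration.

```python
import numpy as np
from scipy.stats import norm

# Same distribution as in the examples above.
dist = norm(loc=100, scale=20)

# Arbitrary values of x chosen for this sketch.
xs = np.array([60.0, 80.0, 100.0, 120.0, 140.0])

# cdf maps each x to a cumulative probability p ...
ps = dist.cdf(xs)

# ... and ppf maps each p back to the original x.
xs_back = dist.ppf(ps)

print(ps)
print(xs_back)
print(np.allclose(xs, xs_back))
```

Because the probabilities are not rounded here, the recovered values should match the originals up to floating-point error.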