In Python, the inverse of the Cumulative Distribution Function (CDF) is calculated using the ppf (percent point function) method of a SciPy distribution, for example scipy.stats.norm.ppf for the normal distribution.
The inverse CDF is mostly used to find the Z-score corresponding to a given cumulative probability (the area under the normal curve to the left of the Z-score).
The inverse CDF is also useful for calculating the critical Z-scores needed for a confidence interval (e.g. a 95% confidence interval).
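As a minimal sketch of the idea, assuming a standard normal distribution (mean 0, standard deviation 1), the snippet below looks up the Z-score for a left-tail area of 0.95 and the critical Z-score for a two-sided 95% confidence interval. The variable names are illustrative only.

```python
from scipy.stats import norm

# Z-score with 95% of the standard normal area to its left
z_left = norm.ppf(0.95)

# Critical Z-score for a two-sided 95% confidence interval:
# 2.5% sits in each tail, so the left-tail area is 0.975
z_crit = norm.ppf(0.975)

print(round(z_left, 3))  # 1.645
print(round(z_crit, 3))  # 1.96
```

Because ppf is the inverse of cdf, passing the result back through norm.cdf should return the original probability.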
A Z-score is a standardized score that measures how many standard deviations a data point lies from the mean of a given distribution.
A Z-score can be 0 (the data point equals the mean), positive (the data point is greater than the mean), or negative (the data point is less than the mean).
A critical Z-score can be used to build a confidence interval, that is, a range of values that is likely to contain the true mean of a population at a chosen confidence level.
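The sketch below illustrates both uses on a small made-up sample: it standardizes each value into a Z-score and then builds a 95% confidence interval for the mean from the critical Z-score. It assumes a Z-based interval is appropriate; for a sample this small, a t-based interval would usually be preferred in practice.

```python
import numpy as np
from scipy.stats import norm

# Made-up sample, for illustration only
data = np.array([4.0, 5.0, 6.0, 7.0, 8.0])

mean = data.mean()
sd = data.std(ddof=1)  # sample standard deviation

# Z-score of each data point: distance from the mean in standard deviations
z_scores = (data - mean) / sd

# 95% confidence interval for the mean using the critical Z-score
z_crit = norm.ppf(0.975)
margin = z_crit * sd / np.sqrt(len(data))
ci = (mean - margin, mean + margin)

print(z_scores)
print(ci)
```

Values below the mean get negative Z-scores, the value equal to the mean gets 0, and values above it get positive Z-scores.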
A two-sample t-test compares the means of two groups to determine whether they are significantly different from one another.
In Python, the NumPy library does not have a built-in function for the two-sample t-test, but you can calculate the test statistic manually using NumPy.
Two-sample t-test formula:
Where, x̄1 and x̄2 are sample means, s1 and s2 are standard deviations of the samples, and n1 and n2 are the sample sizes.
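One common form of the statistic that uses exactly these quantities is the unequal-variance (Welch) version, t = (x̄1 - x̄2) / sqrt(s1²/n1 + s2²/n2). The sketch below assumes that form; a pooled-variance version is also widely used and gives a different denominator when the sample sizes differ. The function name two_sample_t and the data are hypothetical.

```python
import numpy as np

def two_sample_t(x1, x2):
    """t statistic for two independent samples (unequal-variance form)."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    n1, n2 = x1.size, x2.size
    # Sample standard deviations (ddof=1 divides by n - 1)
    s1, s2 = x1.std(ddof=1), x2.std(ddof=1)
    # Standard error of the difference between the two means
    se = np.sqrt(s1**2 / n1 + s2**2 / n2)
    return (x1.mean() - x2.mean()) / se

# Made-up samples, for illustration only
group_a = [1, 2, 3, 4, 5]
group_b = [2, 3, 4, 5, 6]

t_stat = two_sample_t(group_a, group_b)
print(t_stat)
```

NumPy alone yields only the statistic; obtaining a p-value additionally requires the t distribution, which is available in scipy.stats.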
You can use the boxplot function from the seaborn Python package to draw a boxplot.
You can show the mean on a seaborn boxplot by passing showmeans=True. By default, the mean is drawn as a green triangle marker.
You can adjust the shape, size, and color of the mean marker on a seaborn boxplot using the meanprops parameter.
The following example explains how to customize the shape and color of the mean marker on a seaborn boxplot.
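A minimal sketch of such a customization, using a made-up DataFrame with hypothetical column names group and value. The meanprops keys shown are standard matplotlib marker properties, which seaborn forwards to the underlying matplotlib boxplot.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up data, for illustration only
df = pd.DataFrame({
    "group": ["A"] * 5 + ["B"] * 5,
    "value": [1, 2, 3, 4, 10, 2, 4, 6, 8, 20],
})

ax = sns.boxplot(
    data=df,
    x="group",
    y="value",
    showmeans=True,  # draw a marker at each group's mean
    meanprops={
        "marker": "o",               # circle instead of the default triangle
        "markerfacecolor": "red",
        "markeredgecolor": "black",
        "markersize": 8,
    },
)
# Call plt.show() to display the figure in an interactive session
```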
In pandas, you can use the hist() function to plot histograms.
When you plot multiple histograms in a single figure, the default spacing between them can make the figure appear cluttered.
You can use the figsize parameter of hist() together with the tight_layout() function from matplotlib to adjust the space between histograms and avoid cluttering.
The following example explains how to adjust the spacing between pandas histograms.
Create a sample DataFrame:
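As a rough sketch of these steps, the snippet below builds a DataFrame of random values (the column names, seed, and distribution parameters are arbitrary assumptions), plots one histogram per column, and then tightens the layout.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Random sample data, for illustration only
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "A": rng.normal(0, 1, 200),
    "B": rng.normal(5, 2, 200),
    "C": rng.normal(-3, 1.5, 200),
    "D": rng.normal(10, 3, 200),
})

# figsize enlarges the figure so each histogram has more room
axes = df.hist(figsize=(10, 8), bins=20)

# tight_layout adjusts subplot padding so titles and tick labels do not overlap
plt.tight_layout()
# Call plt.show() to display the figure in an interactive session
```

With four numeric columns, hist() arranges the subplots in a 2 x 2 grid and returns the axes as a NumPy array.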
In pandas, you can use the hist() function for plotting the histogram.
It can be tricky to add a global title above a collection of histograms when you plot them in a single figure.
You can use the suptitle() function from matplotlib to add a centered global title to a collection of pandas histograms.
The following example explains how to plot multiple histograms using pandas hist() function and add a global title at the top of these histograms using the suptitle() function.
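A minimal sketch along these lines, with made-up data and a hypothetical title string. It takes the figure from the axes returned by hist() and calls suptitle() on it.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Random sample data, for illustration only
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "A": rng.normal(0, 1, 200),
    "B": rng.normal(5, 2, 200),
    "C": rng.normal(-3, 1.5, 200),
    "D": rng.normal(10, 3, 200),
})

# One histogram per column; hist() returns an array of axes
axes = df.hist(figsize=(10, 8), bins=20)

# All subplots share one figure; add the global title to that figure
fig = axes.flat[0].figure
title = fig.suptitle("Distribution of sample columns", fontsize=16)

# Re-run the layout so the subplots make room for the title
fig.tight_layout()
# Call plt.show() to display the figure in an interactive session
```

Each subplot keeps its own column-name title; the suptitle() text sits above all of them.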