How to Show Mean on Boxplot Using Matplotlib
Boxplots are a great way to visualize the distribution (min, max, quartiles, and median) of a dataset. However, they do not display the mean of the dataset by default.
In Python matplotlib, you can use the showmeans parameter of the boxplot function to show the mean on the boxplot.
The following example explains how to plot a boxplot and show the mean on it using the matplotlib Python package.
Let’s create a random dataset for three groups using the rand function from NumPy:
# import packages
import numpy as np
data = np.random.rand(50, 3)
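This dataset contains 50 observations for each of three groups (one group per column). Because rand draws every value uniformly from [0, 1), all three groups share the same expected mean of 0.5, and their sample means differ only by random variation.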
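To see the values the boxplot will summarize, the per-group mean and median can be computed directly with NumPy. This is a minimal sketch; the fixed seed is an assumption added here for reproducibility and is not part of the original example.

```python
import numpy as np

# Assumption: a fixed seed (not in the original) so the numbers are reproducible
np.random.seed(0)
data = np.random.rand(50, 3)

# axis=0 reduces over the 50 observations, leaving one value per group (column)
group_means = data.mean(axis=0)
group_medians = np.median(data, axis=0)

for i, (mean, median) in enumerate(zip(group_means, group_medians), start=1):
    print(f"group {i}: mean={mean:.3f}, median={median:.3f}")
```

The mean and median of each group will usually be close but not identical, which is why it can be useful to show both on the plot.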
Create a boxplot using the boxplot function and show the mean value on the boxplot by passing the showmeans=True parameter.
# import packages
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.boxplot(data, showmeans=True)
plt.xlabel("groups")
plt.ylabel("values")
plt.show()
In the above boxplot, the green triangle in each box marks the mean of that group, and the orange line marks its median. Both colors are matplotlib defaults.
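If you want to confirm that the markers really sit at the group means, the dictionary returned by boxplot can be inspected. The sketch below assumes the same random data as above (with a seed added for reproducibility) and reads the plotted positions from the "means" entry of that dictionary.

```python
import matplotlib.pyplot as plt
import numpy as np

# Assumption: a fixed seed (not in the original) so the numbers are reproducible
np.random.seed(0)
data = np.random.rand(50, 3)

fig, ax = plt.subplots()
bp = ax.boxplot(data, showmeans=True)

# bp["means"] holds one Line2D artist per group; its y-data is the plotted mean
plotted_means = np.array([line.get_ydata()[0] for line in bp["means"]])

print("plotted means: ", plotted_means)
print("computed means:", data.mean(axis=0))
print("match:", np.allclose(plotted_means, data.mean(axis=0)))

plt.close(fig)  # the figure is only used for inspection here
```

The same dictionary also exposes the "medians" artists, so the orange lines can be checked in the same way.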
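You can also draw the mean as a line (similar to the median line) instead of a triangle by additionally passing meanline=True. By default, the mean is then drawn as a dashed green line across the width of the box.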
# import packages
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.boxplot(data, showmeans=True, meanline=True)
plt.xlabel("groups")
plt.ylabel("values")
plt.show()
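When the mean line is hard to tell apart from the median line, its appearance can be changed with the meanprops parameter of boxplot, which accepts a dictionary of line properties. The following is a minimal sketch; the specific color, line style, and width are illustrative choices, and the seed is an assumption added for reproducibility.

```python
import matplotlib.pyplot as plt
import numpy as np

# Assumption: a fixed seed (not in the original) so the plot is reproducible
np.random.seed(0)
data = np.random.rand(50, 3)

# Illustrative style choices for the mean line (not from the original article)
mean_style = {"color": "red", "linestyle": "-", "linewidth": 2}

fig, ax = plt.subplots()
bp = ax.boxplot(data, showmeans=True, meanline=True, meanprops=mean_style)
ax.set_xlabel("groups")
ax.set_ylabel("values")
```

Call plt.show() afterwards to display the figure. With meanline=False, the same parameter styles the marker instead, for example through the marker and markerfacecolor keys.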