Calculate 95% Confidence Interval Using dplyr
A confidence interval gives a range of plausible values for an unknown population parameter. A 95% confidence interval is built by a procedure that, across repeated samples, captures the true parameter value about 95% of the time.
In this article, we will discuss how to calculate 95% confidence intervals for grouped data using the dplyr package in R.
Example 1
Load the built-in mtcars data. This dataset contains 11 variables measured on 32 car models.
data('mtcars')
# view data frame
head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
We will calculate the 95% confidence interval for the mean of the mpg (miles per gallon) variable in mtcars, using cyl (number of cylinders) as the grouping variable.
We will use functions from dplyr to group the data by the categorical variable cyl and calculate the mean of mpg for each group. The 95% confidence interval will be calculated using the ci function from the gmodels package.
By default, the ci function calculates the 95% confidence interval.
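The positional indexes [2] and [3] in the code that follows depend on the order of the values that ci returns. As a minimal sketch, assuming the gmodels package is installed and that ci returns a named vector for numeric input (Estimate, CI lower, CI upper, Std. Error), you can inspect those names and request a different level through the confidence argument. The object names ci_95 and ci_90 are illustrative only.

```r
library(gmodels)

# 95% interval (the default) for the whole mpg column
ci_95 <- ci(mtcars$mpg)

# 90% interval, requested through the confidence argument
ci_90 <- ci(mtcars$mpg, confidence = 0.90)

# Inspect the element names to see which position holds which bound
names(ci_95)

# Select the bounds by name rather than by position
ci_95[["CI lower"]]
ci_95[["CI upper"]]
```

Selecting by name is a little more verbose than [2] and [3], but it keeps working if the order of the returned values ever differs from what is assumed here.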
# load packages
# install.packages("tidyverse")
# install.packages("gmodels")
library(dplyr)
library(gmodels)
results <- mtcars %>%
  group_by(cyl) %>%
  summarise(
    mean_mpg = mean(mpg),
    ci_lower = ci(mpg)[2],
    ci_upper = ci(mpg)[3])
results
# A tibble: 3 × 4
    cyl mean_mpg ci_lower ci_upper
  <dbl>    <dbl>    <dbl>    <dbl>
1     4     26.7     23.6     29.7
2     6     19.7     18.4     21.1
3     8     15.1     13.6     16.6
The results table contains the mean of mpg and its 95% confidence interval (lower and upper bounds) for each cyl group in the mtcars data.
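As a quick sanity check on the table, the interval for a single group can be reproduced with t.test from base R, which reports a t-based confidence interval for the mean. This is only a cross-check sketch; the names mpg_cyl4 and ci_cyl4 are illustrative.

```r
# Subset mpg for the 4-cylinder cars
mpg_cyl4 <- mtcars$mpg[mtcars$cyl == 4]

# One-sample t-test; conf.int holds the lower and upper bounds
ci_cyl4 <- t.test(mpg_cyl4, conf.level = 0.95)$conf.int

# Rounded bounds, to compare with the cyl == 4 row of the results table
round(ci_cyl4, 1)
```

The rounded bounds should agree with the ci_lower and ci_upper values shown for cyl 4.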
Example 2
You can also calculate the 95% confidence interval manually instead of using the ci function from the gmodels package.
Load the built-in mtcars data.
data('mtcars')
# view data frame
head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
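You can use the built-in qt function to obtain the two-tailed critical t value needed for the 95% confidence interval. The probability passed to qt is 0.975 because a 95% interval leaves 2.5% in each tail, and the degrees of freedom are the group size minus one.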
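The manual interval has the form mean ± t × s / √n, where s is the sample standard deviation, n is the group size, and t is the critical value from qt. The base R sketch below shows how that critical value changes with the size of each cyl group; the names group_sizes and t_crit are illustrative only.

```r
# Number of cars in each cyl group
group_sizes <- table(mtcars$cyl)

# Two-tailed 95% critical t value for each group (df = n - 1)
t_crit <- qt(0.975, df = as.numeric(group_sizes) - 1)
names(t_crit) <- names(group_sizes)
t_crit

# Normal-approximation critical value, for comparison
qnorm(0.975)
```

Smaller groups get a larger critical value, and therefore a wider interval for the same standard error. All three t values exceed the normal value of about 1.96, which is why the t distribution is the safer choice for groups this small.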
# load packages
# install.packages("tidyverse")
library(dplyr)
results <- mtcars %>%
  group_by(cyl) %>%
  summarise(
    mean_mpg = mean(mpg),
    ci_lower = mean(mpg) - qt(0.975, df = n() - 1) * sd(mpg) / sqrt(n()),
    ci_upper = mean(mpg) + qt(0.975, df = n() - 1) * sd(mpg) / sqrt(n()))
results
# A tibble: 3 × 4
    cyl mean_mpg ci_lower ci_upper
  <dbl>    <dbl>    <dbl>    <dbl>
1     4     26.7     23.6     29.7
2     6     19.7     18.4     21.1
3     8     15.1     13.6     16.6
The results table contains the mean of mpg and its 95% confidence interval (lower and upper bounds) for each cyl group in the mtcars data. The values match those from Example 1.
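If you need a level other than 95%, the manual calculation can be wrapped in a small helper. The sketch below is one possible way to do it, not a standard function: ci_margin is a hypothetical helper name, and it assumes the input has no missing values.

```r
library(dplyr)

# Hypothetical helper: half-width of a t-based confidence interval for the mean
# (assumes x contains no missing values)
ci_margin <- function(x, level = 0.95) {
  n <- length(x)
  alpha <- 1 - level
  qt(1 - alpha / 2, df = n - 1) * sd(x) / sqrt(n)
}

# 99% confidence intervals for mean mpg within each cyl group
results_99 <- mtcars %>%
  group_by(cyl) %>%
  summarise(
    mean_mpg = mean(mpg),
    margin   = ci_margin(mpg, level = 0.99),
    ci_lower = mean_mpg - margin,
    ci_upper = mean_mpg + margin
  )

results_99
```

Computing the margin once and reusing it for both bounds avoids repeating the qt expression, and changing the level argument is enough to switch to, say, a 90% interval.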