Calculate Mean of Rows on Selected Columns in R

Calculating the mean of each row over a subset of columns is a common task in data analysis, especially when a dataset has many variables.
In R, you can use the rowMeans() function together with subset() to calculate the mean of rows on selected columns:
rowMeans(subset(df, select = c(col1, col2)))
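As a minimal sketch of this pattern, the snippet below selects the same columns in two ways: with subset() and with bracket indexing by quoted column name. The data frame scores and its columns math, physics, and id are made up for illustration and are not part of the examples that follow.

```r
# Hypothetical data frame used only for illustration
scores <- data.frame(
  math    = c(80, 90, 70),
  physics = c(60, 100, 80),
  id      = c(101, 102, 103)
)

# Select columns by unquoted name with subset()
m1 <- rowMeans(subset(scores, select = c(math, physics)))

# Select the same columns by quoted name with bracket indexing
m2 <- rowMeans(scores[, c("math", "physics")])

m1  # row means: 70, 95, 75
m2  # same values as m1
```

Bracket indexing can be convenient when the column names are stored in a character vector, while subset() reads well for interactive use.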
The following step-by-step examples show how to calculate the mean of rows on selected columns in R.
Example 1 (data frame without missing values)
Create a sample data frame:
# rnorm() draws random values, so your numbers will differ
df <- data.frame(
  col1 = rnorm(5, mean = 5),
  col2 = rnorm(5, mean = 10),
  col3 = rnorm(5, mean = 20)
)
df
      col1      col2     col3
1 3.888507  9.622785 21.86776
2 3.930475  9.651241 21.42036
3 3.063293  9.869505 20.75059
4 5.144679 11.479637 19.53187
5 6.014065 10.200684 18.16099
Now, calculate the mean of rows on the selected columns (col1 and col2):
df$mean <- rowMeans(subset(df, select = c(col1, col2)))
df
      col1      col2     col3     mean
1 3.888507  9.622785 21.86776 6.755646
2 3.930475  9.651241 21.42036 6.790858
3 3.063293  9.869505 20.75059 6.466399
4 5.144679 11.479637 19.53187 8.312158
5 6.014065 10.200684 18.16099 8.107375
In the above example, we calculated the mean of each row over columns col1 and col2 using rowMeans(). The third column is not included in the mean.
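To make the calculation concrete, the sketch below recomputes the means by hand from the col1 and col2 values printed in Example 1 and compares them with the result of rowMeans(). The name df_check is introduced here only for this check, and the values are the rounded ones shown in the output.

```r
# Values copied from the printed data frame in Example 1 (rounded as displayed)
df_check <- data.frame(
  col1 = c(3.888507, 3.930475, 3.063293, 5.144679, 6.014065),
  col2 = c(9.622785, 9.651241, 9.869505, 11.479637, 10.200684)
)

# Mean of two columns computed by hand: (col1 + col2) / 2
by_hand <- (df_check$col1 + df_check$col2) / 2

# The same means computed with rowMeans()
by_function <- rowMeans(df_check[, c("col1", "col2")])

# rowMeans() returns a named vector, so drop the names before comparing
all.equal(by_hand, unname(by_function))
```

For the first row, (3.888507 + 9.622785) / 2 = 6.755646, which matches the mean column in the output.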
If a row has a missing value (NA) in any of the selected columns, rowMeans() returns NA as the mean for that row by default.
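The default behavior can be illustrated with a small sketch. The data frame df_na and its columns a and b are hypothetical and used only here.

```r
# Hypothetical data frame with one missing value in column a
df_na <- data.frame(
  a = c(1, NA, 3),
  b = c(4, 5, 6)
)

# Without na.rm = TRUE, any row containing an NA gets NA as its mean
default_means <- rowMeans(df_na[, c("a", "b")])

default_means         # 2.5, NA, 4.5
is.na(default_means)  # TRUE only for the second row
```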
Example 2 (data frame with missing values)
If there are missing values in the selected columns, add the na.rm = TRUE argument to rowMeans() so that the mean of each row is calculated from its non-missing values only.
The following example demonstrates how to calculate the mean of rows on selected columns in a data frame with missing values.
Create a sample data frame with missing values:
df <- data.frame(
  col1 = c(25, NA, 30, 35, NA),
  col2 = c(170, 165, NA, 180, 175),
  col3 = c(70, NA, 65, NA, 80)
)
df
  col1 col2 col3
1   25  170   70
2   NA  165   NA
3   30   NA   65
4   35  180   NA
5   NA  175   80
Now, calculate the mean of rows on the selected columns (col1 and col2):
df$mean <- rowMeans(subset(df, select = c(col1, col2)), na.rm = TRUE)
df
  col1 col2 col3  mean
1   25  170   70  97.5
2   NA  165   NA 165.0
3   30   NA   65  30.0
4   35  180   NA 107.5
5   NA  175   80 175.0
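In the above example, we calculated the mean of each row over columns col1 and col2 (which contain NA values) using rowMeans() with na.rm = TRUE. Where one of the two values is missing, the mean is simply the remaining value (for example, 165.0 in row 2).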
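One edge case is worth knowing about: when every selected value in a row is missing, rowMeans() with na.rm = TRUE has nothing to average and returns NaN for that row. The sketch below shows this, along with one possible way to count how many values went into each mean. The names df_all_na, sel, means, and n_used are hypothetical and used only here.

```r
# Hypothetical data frame: the second row is missing in both selected columns
df_all_na <- data.frame(
  a = c(10, NA, 30),
  b = c(20, NA, NA)
)

sel <- df_all_na[, c("a", "b")]

# Row means over the non-missing values only
means <- rowMeans(sel, na.rm = TRUE)  # 15, NaN, 30

# Number of non-missing values that contributed to each row mean
n_used <- rowSums(!is.na(sel))        # 2, 0, 1

means
n_used
```

Checking n_used alongside the means makes it easier to spot rows where the mean is based on few or no observations.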