Calculate Mean of Rows on Selected Columns in R
In data analysis, calculating the mean of each row across selected columns is a common task, especially when a dataset has many variables and only some of them are relevant.
In R, you can combine the rowMeans() function with subset() to calculate the mean of rows on selected columns:
rowMeans(subset(df, select = c(col1, col2)))
The following step-by-step examples show how to calculate the mean of rows on selected columns in R.
Example 1 (data frame without missing values)
Create a sample data frame:
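df <- data.frame(
  col1 = rnorm(5, mean = 5),
  col2 = rnorm(5, mean = 10),
  # the duplicated name col2 is renamed to col2.1 by data.frame()
  col2 = rnorm(5, mean = 20)  # 5 values, so every column has the same length
)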
df
col1 col2 col2.1
1 3.888507 9.622785 21.86776
2 3.930475 9.651241 21.42036
3 3.063293 9.869505 20.75059
4 5.144679 11.479637 19.53187
5 6.014065 10.200684 18.16099
Now, calculate the mean of rows on selected columns (col1 and col2):
df$mean <- rowMeans(subset(df, select = c(col1, col2)))
df
col1 col2 col2.1 mean
1 3.888507 9.622785 21.86776 6.755646
2 3.930475 9.651241 21.42036 6.790858
3 3.063293 9.869505 20.75059 6.466399
4 5.144679 11.479637 19.53187 8.312158
5 6.014065 10.200684 18.16099 8.107375
In the above example, we calculated the row means of columns col1 and col2 using rowMeans(). Because the values are drawn with rnorm() without a fixed seed, your numbers will differ from those shown.
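As a cross-check on the selection step, here is a minimal sketch with fixed values instead of random draws. The data frame `scores` and the vector `cols` are hypothetical names introduced only for this illustration. It suggests that selecting columns with a character vector of names gives the same row means as subset().

```r
# Hypothetical data with fixed values so the result is reproducible
scores <- data.frame(
  col1 = c(2, 4, 6),
  col2 = c(4, 8, 10),
  col3 = c(100, 200, 300)
)

# Column names stored in a character vector (hypothetical helper variable)
cols <- c("col1", "col2")

# Two ways of selecting the same columns before averaging each row
by_subset  <- rowMeans(subset(scores, select = c(col1, col2)))
by_bracket <- rowMeans(scores[, cols])

by_bracket
# row means: 3, 6, 8 (col3 is ignored)
```

Bracket indexing can be convenient when the column names are built programmatically, since subset() expects the names to be typed directly.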
If any of the selected columns contains a missing value (NA) in a row, rowMeans() returns NA as the mean for that row by default.
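To make the default behaviour concrete, here is a small sketch. The data frame `df_na` is a hypothetical example, not the one used elsewhere in this article.

```r
# Hypothetical data frame with one NA in rows 2 and 3
df_na <- data.frame(
  col1 = c(1, NA, 3),
  col2 = c(5, 6, NA)
)

# na.rm defaults to FALSE, so any NA in a row makes that row's mean NA
default_means <- rowMeans(df_na)
default_means
# row 1 is 3; rows 2 and 3 are NA
```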
Example 2 (data frame with missing values)
If there are missing values in the selected columns, add the na.rm = TRUE argument so that rowMeans() ignores them and averages only the non-missing values in each row.
The following example demonstrates how to calculate the mean of rows on selected columns in a data frame with missing values.
Create a sample data frame with missing values:
df <- data.frame(
  col1 = c(25, NA, 30, 35, NA),
  col2 = c(170, 165, NA, 180, 175),
  col3 = c(70, NA, 65, NA, 80)
)
df
col1 col2 col3
1 25 170 70
2 NA 165 NA
3 30 NA 65
4 35 180 NA
5 NA 175 80
Now, calculate the mean of rows on selected columns (col1 and col2):
df$mean <- rowMeans(subset(df, select = c(col1, col2)), na.rm = TRUE)
df
col1 col2 col3 mean
1 25 170 70 97.5
2 NA 165 NA 165.0
3 30 NA 65 30.0
4 35 180 NA 107.5
5 NA 175 80 175.0
In the above example, we calculated the row means of columns col1 and col2 (which contain NA values) using rowMeans() with na.rm = TRUE. A row with one missing value, such as row 2, returns the single remaining value (165).
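One edge case is worth knowing: when every selected column is NA in a row, na.rm = TRUE leaves nothing to average and rowMeans() returns NaN for that row. The sketch below uses a hypothetical data frame `df_edge` to illustrate this, and shows one possible way to turn the NaN back into NA.

```r
# Hypothetical data frame where row 2 is NA in every selected column
df_edge <- data.frame(
  col1 = c(10, NA),
  col2 = c(20, NA)
)

edge_means <- rowMeans(df_edge, na.rm = TRUE)
edge_means
# row 1 is 15; row 2 is NaN because no values remain after dropping NA

# Optional clean-up: replace NaN with NA in a copy of the result
edge_clean <- edge_means
edge_clean[is.nan(edge_clean)] <- NA
```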