Calculate Mean of Rows on Selected Columns in R

2024-04-02

In data analysis, calculating the row-wise mean across a subset of columns is a common task, especially when a dataset has many variables and only a few of them are relevant.

In R, you can use the rowMeans() function to calculate the mean of rows, and subset() to restrict the calculation to selected columns:

rowMeans(subset(df, select = c(col1, col2)))
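As a minimal sketch of an alternative, the columns can also be selected with a character vector of names instead of subset(), which is convenient when the names are stored in a variable. The data frame scores and its columns x, y and z are hypothetical and used only for this illustration.

```r
# Hypothetical data frame, not part of the examples in this article
scores <- data.frame(
  x = c(1, 2, 3),
  y = c(3, 4, 5),
  z = c(10, 20, 30)
)

# Column names to average over, kept in a character vector
cols <- c("x", "y")

# scores[cols] keeps a data frame even when only one column is selected
row_avg <- rowMeans(scores[cols])

# The same selection written with subset(), for comparison
row_avg_subset <- rowMeans(subset(scores, select = c(x, y)))

row_avg
```

Both forms should give the same row means; which one to use is mostly a matter of whether the column names are typed in directly or built programmatically.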

The following step-by-step examples show how to calculate the mean of rows on selected columns in R.

Example 1 (data frame without missing values)

Create a sample data frame:

# rnorm() draws random values, so your numbers will differ from those shown
df <- data.frame(
  col1 = rnorm(5, mean = 5),
  col2 = rnorm(5, mean = 10),
  col3 = rnorm(5, mean = 20)
)

df

      col1      col2     col3
1 3.888507  9.622785 21.86776
2 3.930475  9.651241 21.42036
3 3.063293  9.869505 20.75059
4 5.144679 11.479637 19.53187
5 6.014065 10.200684 18.16099

Now, calculate the mean of rows on selected columns (col1 and col2):

df$mean <- rowMeans(subset(df, select = c(col1, col2)))

df

      col1      col2     col3     mean
1 3.888507  9.622785 21.86776 6.755646
2 3.930475  9.651241 21.42036 6.790858
3 3.063293  9.869505 20.75059 6.466399
4 5.144679 11.479637 19.53187 8.312158
5 6.014065 10.200684 18.16099 8.107375

In the above example, we calculated the mean of each row across columns col1 and col2 using rowMeans(). The third column is ignored because it is not part of the selection.
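The result can be checked by hand: for row 1 of the output above, (3.888507 + 9.622785) / 2 = 6.755646. The sketch below makes the same comparison in code on a small hypothetical data frame (check_df is an illustrative name, not part of the example above).

```r
# Hypothetical data frame with easy-to-verify values
check_df <- data.frame(
  col1 = c(2, 4),
  col2 = c(4, 8),
  col3 = c(100, 200)
)

# Row mean of two columns computed manually: sum divided by the column count
by_hand <- (check_df$col1 + check_df$col2) / 2

# The same quantity from rowMeans(); unname() drops the row-name labels
by_function <- unname(rowMeans(subset(check_df, select = c(col1, col2))))

all.equal(by_hand, by_function)
```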

If the selected columns contain missing values (NA), rowMeans() returns NA as the mean for every row that has at least one missing value.
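A short sketch of this default behaviour, using a hypothetical two-row data frame (na_demo is an illustrative name):

```r
# Hypothetical data: row 2 has a missing value in col1
na_demo <- data.frame(
  col1 = c(25, NA),
  col2 = c(170, 165)
)

# With the default na.rm = FALSE, any NA in a row makes that row's mean NA
default_means <- rowMeans(na_demo)

default_means
```

Here the first row has a mean of 97.5, while the second row comes back as NA because one of its two values is missing.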

Example 2 (data frame with missing values)

If the selected columns contain missing values, pass na.rm = TRUE to rowMeans() so that each row's mean is computed from its non-missing values only.

The following example demonstrates this on a data frame with missing values.

Create a sample data frame with missing values:

df <- data.frame(
  col1 = c(25, NA, 30, 35, NA),
  col2 = c(170, 165, NA, 180, 175),
  col3 = c(70, NA, 65, NA, 80)
)

df

  col1 col2 col3
1   25  170   70
2   NA  165   NA
3   30   NA   65
4   35  180   NA
5   NA  175   80

Now, calculate the mean of rows on selected columns (col1 and col2):

df$mean <- rowMeans(subset(df, select = c(col1, col2)), na.rm = TRUE)

df

  col1 col2 col3  mean
1   25  170   70  97.5
2   NA  165   NA 165.0
3   30   NA   65  30.0
4   35  180   NA 107.5
5   NA  175   80 175.0

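In the above example, we calculated the mean of each row across columns col1 and col2, which contain NA values, using rowMeans() with na.rm = TRUE. When one of the two values is missing, the mean is simply the remaining value; for example, row 2 has a mean of 165 because only col2 is available.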
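One edge case worth keeping in mind: if every selected value in a row is missing, rowMeans() with na.rm = TRUE returns NaN for that row, since there is nothing left to average. The sketch below shows this on a hypothetical data frame (all_missing is an illustrative name) and one possible way to turn the NaN back into NA.

```r
# Hypothetical data: row 2 is missing in both selected columns
all_missing <- data.frame(
  col1 = c(25, NA),
  col2 = c(170, NA)
)

# Row 2 has no non-missing values, so its mean is NaN rather than NA
raw_means <- rowMeans(all_missing, na.rm = TRUE)

# Optionally replace NaN with NA so that missing results are marked uniformly
clean_means <- raw_means
clean_means[is.nan(clean_means)] <- NA

raw_means
clean_means
```

Whether to keep NaN or convert it to NA depends on how later steps treat missing results; the conversion is only a suggestion.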