When to Use and Avoid `apply` in pandas DataFrame

The pandas apply function is widely used to apply custom functions to the rows and columns of a DataFrame.

The apply function behaves like a loop: it iterates over the rows or columns of the DataFrame, depending on the given axis, and calls the given function on each one.

The basic syntax of the apply function is:

DataFrame.apply(func, axis=0)

The axis parameter defines the axis along which the function is applied. With axis=0 (the default), the function is applied to each column; with axis=1, it is applied to each row.
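To make the effect of the axis parameter concrete, here is a minimal sketch. The small DataFrame and the range calculation (maximum minus minimum) are chosen only for illustration.

```python
import pandas as pd

# Small illustrative DataFrame
df = pd.DataFrame({'col1': [25, 45, 95], 'col2': [70, 88, 55]})

# axis=0 (default): the function receives each column as a Series
col_ranges = df.apply(lambda col: col.max() - col.min(), axis=0)

# axis=1: the function receives each row as a Series
row_ranges = df.apply(lambda row: row.max() - row.min(), axis=1)

print(col_ranges)  # one value per column
print(row_ranges)  # one value per row
```

With axis=0 the result is indexed by column name, and with axis=1 it is indexed by the row labels.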

Even though the pandas apply function is powerful for DataFrame manipulation, it should be used cautiously, as it is generally slower than vectorized operations and can require more memory. Please read this article which compares the performance of the apply function and vectorization.

The apply function should mostly be used when vectorized operations are not possible on the DataFrame.

For example, the apply function can be used for complex operations on a DataFrame, such as calculations that involve conditional logic. It should be avoided for operations that pandas already provides in vectorized form, such as calculating the mean or sum.
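As a rough illustration of the second point, the sketch below computes column means both ways. The DataFrame is only an example; the built-in method gives the same values without routing through a custom Python function.

```python
import pandas as pd

# Small illustrative DataFrame
df = pd.DataFrame({'col1': [25, 45, 95], 'col2': [70, 88, 55]})

# Column means with apply: works, but goes through a Python-level function call
means_apply = df.apply(lambda col: col.mean())

# Built-in vectorized reduction: same values, more direct
means_builtin = df.mean()

print(means_apply.equals(means_builtin))
```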

In addition, the apply function should be avoided for large datasets.

The following examples demonstrate when to use and when to avoid the apply function for pandas DataFrame manipulations.

Using apply function

You should use the apply function when you want to perform conditional complex manipulations on rows or columns of the DataFrame.

The following example explains when to use the pandas apply function.

Create a pandas DataFrame,

# import package
import pandas as pd

df = pd.DataFrame({'col1': [25, 45, 95], 'col2': [70, 88, 55]})

# view DataFrame
df

   col1  col2
0    25    70
1    45    88
2    95    55

Now, use the apply function for conditional data manipulation.

df['col3'] = df['col2'].apply(lambda x: 0 if x > 80 else x * x)

df

   col1  col2  col3
0    25    70  4900
1    45    88     0
2    95    55  3025

In the above example, the apply function is a reasonable choice because the logic is conditional and the dataset is small. For custom functions like this, a vectorized equivalent may not be obvious or may not be worth the extra effort.
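For larger data, a simple condition like this one can often still be vectorized. The following is a minimal sketch using numpy.where, which is not part of the original example and is shown only as one possible alternative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [25, 45, 95], 'col2': [70, 88, 55]})

# Element-wise version with apply: 0 if the value exceeds 80, else its square
col3_apply = df['col2'].apply(lambda x: 0 if x > 80 else x * x)

# Vectorized version of the same rule using numpy.where (returns a NumPy array)
col3_where = np.where(df['col2'] > 80, 0, df['col2'] ** 2)

# Check that both approaches agree
print((col3_apply.to_numpy() == col3_where).all())
```

When the custom logic has many branches or depends on external state, apply may remain the more readable option.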

Avoid apply function

You should avoid the apply function when performing simple arithmetic calculations such as column-wise mean, multiplication, and sum. In addition, you should not use the apply function when working with large datasets.

The following example shows a case where the pandas apply function should be avoided.

Create a pandas DataFrame,

# import package
import pandas as pd

df = pd.DataFrame({'col1': [25, 45, 95], 'col2': [70, 88, 55]})

# view DataFrame
df

   col1  col2
0    25    70
1    45    88
2    95    55

Now, perform multiplication using vectorization.

df['col3'] = df['col2'] * 5

df

   col1  col2  col3
0    25    70   350
1    45    88   440
2    95    55   275
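To see why vectorization is preferred here, the sketch below runs the same multiplication both ways on a larger column and times each. The column size (100,000 rows) and the repeat count are arbitrary assumptions for illustration, and the timings will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical larger column, used only to make the timing difference visible
big = pd.DataFrame({'col2': np.arange(100_000, dtype='int64')})

# Both approaches produce the same values
vectorized = big['col2'] * 5
with_apply = big['col2'].apply(lambda x: x * 5)
print((vectorized == with_apply).all())

# Time 10 repetitions of each approach
t_vec = timeit.timeit(lambda: big['col2'] * 5, number=10)
t_apply = timeit.timeit(lambda: big['col2'].apply(lambda x: x * 5), number=10)
print(f"vectorized: {t_vec:.4f} s, apply: {t_apply:.4f} s")
```

The vectorized version operates on the whole column at once, while apply calls the Python function once per element, so the gap typically widens as the data grows.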