When to Use and Avoid `apply` in pandas DataFrame
The pandas `apply` function is widely used to apply custom functions to the rows and columns of a DataFrame.
The `apply` function works like a loop: depending on the given axis, it iterates over each column or each row of the DataFrame and calls the given function on it, passing that column or row to the function as a pandas Series.
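To make the loop analogy concrete, here is a minimal sketch. It uses a small example DataFrame and a hypothetical `value_range` function, both chosen only for illustration. With the default `axis=0`, `apply` passes each column to the function as a Series, which gives the same result as an explicit loop over `df.items()`.

```python
import pandas as pd

# Small example DataFrame (same values as the examples later in this article)
df = pd.DataFrame({'col1': [25, 45, 95], 'col2': [70, 88, 55]})

def value_range(column):
    # Hypothetical helper: 'column' is a pandas Series holding one whole column
    return column.max() - column.min()

# apply with the default axis=0 calls value_range once per column
result_apply = df.apply(value_range)

# The equivalent explicit loop over (column name, Series) pairs
result_loop = pd.Series({name: value_range(col) for name, col in df.items()})

# Both hold col1 -> 70 and col2 -> 33
print(result_apply)
print(result_loop)
```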
The basic syntax of the `apply` function is:
df.apply(func, axis=0)
Here, df is a DataFrame and func is the function to apply. The axis parameter defines the axis along which the function is applied: axis=0 (the default) applies the function to each column, and axis=1 applies it to each row.
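The effect of the two axis values is easiest to see side by side. The sketch below, on a small illustrative DataFrame, sums each column with `axis=0` and each row with `axis=1`. In practice `df.sum()` or `df.sum(axis=1)` would do this directly, so the example only shows what `axis` controls.

```python
import pandas as pd

df = pd.DataFrame({'col1': [25, 45, 95], 'col2': [70, 88, 55]})

# axis=0 (default): the function receives each column, one result per column
column_totals = df.apply(lambda values: values.sum(), axis=0)  # col1 -> 165, col2 -> 213

# axis=1: the function receives each row, one result per row
row_totals = df.apply(lambda values: values.sum(), axis=1)     # 0 -> 95, 1 -> 133, 2 -> 150
```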
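Even though the pandas `apply` function is powerful for DataFrame manipulation, it should be used cautiously: it calls a Python function once per row or column (or once per element when used on a Series), so it is slower and can require more memory than vectorized operations. Please read this article, which compares the performance of the `apply` function and vectorization.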
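A rough way to see the cost on your own machine is to time both approaches on a larger Series. This is a minimal sketch, not a rigorous benchmark: the Series size and the squaring function are arbitrary choices, and the timings will vary between machines and pandas versions.

```python
import time

import numpy as np
import pandas as pd

# Illustrative data: one million random integers (size chosen arbitrarily)
rng = np.random.default_rng(0)
s = pd.Series(rng.integers(0, 100, size=1_000_000))

# apply calls the Python lambda once per element
start = time.perf_counter()
squared_apply = s.apply(lambda x: x * x)
apply_seconds = time.perf_counter() - start

# The vectorized expression operates on the whole underlying array at once
start = time.perf_counter()
squared_vec = s * s
vec_seconds = time.perf_counter() - start

print(f"apply:      {apply_seconds:.4f} s")
print(f"vectorized: {vec_seconds:.4f} s")
```

Both expressions produce the same values; only the way the work is carried out differs.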
The `apply` function should mostly be used when vectorized operations are not possible on the DataFrame.
For example, `apply` can be used for complex operations on a DataFrame, such as calculations that involve conditional logic,
but it should be avoided entirely for operations that already have vectorized equivalents, such as calculating the mean or sum.
In addition, the `apply` function should be avoided for large datasets.
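When the conditional logic involves several columns of the same row, `apply` with `axis=1` passes each row to the function as a Series indexed by column name. The sketch below uses a made-up rule and a hypothetical function name, `pick_value`, purely to show the pattern on a small DataFrame.

```python
import pandas as pd

df = pd.DataFrame({'col1': [25, 45, 95], 'col2': [70, 88, 55]})

def pick_value(row):
    # Hypothetical rule for illustration: keep col1 when it is the larger value,
    # otherwise store the gap between the two columns
    if row['col1'] > row['col2']:
        return row['col1']
    return row['col2'] - row['col1']

# axis=1 calls pick_value once per row
df['result'] = df.apply(pick_value, axis=1)  # [45, 43, 95]
```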
The following examples demonstrate when to use and when to avoid the `apply` function for pandas DataFrame manipulations.
Using the `apply` function
You should use the `apply` function when you want to perform complex, conditional manipulations on the rows or columns of a DataFrame.
The following example explains when to use the pandas `apply` function.
Create a pandas DataFrame:
# import package
import pandas as pd
df = pd.DataFrame({'col1': [25, 45, 95], 'col2': [70, 88, 55]})
# view DataFrame
df
col1 col2
0 25 70
1 45 88
2 95 55
Now, use the `apply` function for a conditional data manipulation: set col3 to 0 where col2 is greater than 80, and to the square of col2 otherwise.
df['col3'] = df['col2'].apply(lambda x: 0 if x > 80 else x * x)
df
col1 col2 col3
0 25 70 4900
1 45 88 0
2 95 55 3025
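In the above example, `apply` is useful because the manipulation involves conditional logic and the dataset is small. (Here it is called on the col2 Series, so the function receives one value at a time.) For custom functions like this, writing an equivalent vectorized expression can take extra effort, and on three rows the speed difference is negligible.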
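For comparison, the same conditional rule can also be written without `apply`. The sketch below uses `numpy.where` (a swapped-in technique, not part of the original example) to evaluate the condition over the whole column at once. On three rows the difference is irrelevant, but this form scales better as the dataset grows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [25, 45, 95], 'col2': [70, 88, 55]})

# Same rule as the apply example: 0 where col2 > 80, otherwise col2 squared
df['col3_apply'] = df['col2'].apply(lambda x: 0 if x > 80 else x * x)

# np.where evaluates the condition for the whole column in one call
df['col3_vectorized'] = np.where(df['col2'] > 80, 0, df['col2'] ** 2)
```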
Avoiding the `apply` function
You should avoid the `apply` function when performing simple arithmetic calculations such as column-wise means, sums, and multiplication, because pandas already provides vectorized versions of these operations. In addition,
you should not use the `apply` function when working with large datasets.
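As a small illustration of the aggregation case, the sketch below computes column-wise means and sums both with `apply` and with the built-in DataFrame methods `mean` and `sum`. The results are identical, so the `apply` versions add overhead without adding anything.

```python
import pandas as pd

df = pd.DataFrame({'col1': [25, 45, 95], 'col2': [70, 88, 55]})

# Avoid: apply calls a lambda once per column just to reach a built-in aggregate
means_apply = df.apply(lambda col: col.mean())
sums_apply = df.apply(lambda col: col.sum())

# Prefer: the built-in methods compute the same aggregates directly
means = df.mean()  # col1 -> 55.0, col2 -> 71.0
sums = df.sum()    # col1 -> 165, col2 -> 213
```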
The following example explains when to avoid the pandas `apply` function.
Create a pandas DataFrame:
# import package
import pandas as pd
df = pd.DataFrame({'col1': [25, 45, 95], 'col2': [70, 88, 55]})
# view DataFrame
df
col1 col2
0 25 70
1 45 88
2 95 55
Now, multiply col2 by 5 using vectorization instead of `apply`:
# multiply every value in col2 by 5 in a single vectorized operation
df['col3'] = df['col2'] * 5
# view DataFrame
df
col1 col2 col3
0 25 70 350
1 45 88 440
2 95 55 275