How to Join Multiple DataFrames in pandas
By default, you can join two pandas DataFrame based on common column name (key column) using the merge
function.
If you want to join multiple DataFrames (three or more) based on key column, you can use either the merge
or join
function.
Method 1: merge
function
For example, if you have three DataFrames df1, df2, and df3 with a col1
key column among these three DataFrames. You can join
these three DataFrames using the merge
function as follows:
# import package
import pandas as pd
# merge three DataFrames
merged_df = df1.merge(df2, on='col1', how='outer').merge(df3, on='col1', how='outer')
In the above code, you can specify the key column on which to join and the type of join (inner, outer, left, right, etc.). The key column has the common name among multiple DataFrames.
Method 2: join
function
For example, if you have three DataFrames df1, df2, and df3 with a common index or key column. You can join these three DataFrames using the join
function as follows:
# import package
import pandas as pd
# merge three DataFrames
merged_df = df1.join(df2.set_index('col1'), how='outer').join(df3.set_index('col1'), how='outer')
In this method, you need to set a key column as the index for all DataFrames.
The following examples explain how to use the pandas merge
and join
functions to join multiple DataFrame.
Example 1: merge
function for joining multiple DataFrames
Create three sample DataFrames,
# load packages
import pandas as pd
import numpy as np
# create random pandas dataframe
df1 = pd.DataFrame({'col1': ['A', 'B', 'C'], 'col2': [10, 20, 30]})
df2 = pd.DataFrame({'col1': ['A', 'B', 'D'], 'col3': [49, 45, 60]})
df3 = pd.DataFrame({'col1': ['B', 'C', 'E'], 'col4': [70, 81, 91]})
In the above example, the column col1
is a key column among the three DataFrames. You can use that column to join the
three DataFrames using the merge
function.
Perform outer join,
#
merged_df = df1.merge(df2, on='col1', how='outer').merge(df3, on='col1', how='outer')
# output
col1 col2 col3 col4
0 A 10.0 49.0 NaN
1 B 20.0 45.0 70.0
2 C 30.0 NaN 81.0
3 D NaN 60.0 NaN
4 E NaN NaN 91.0
Perform left join,
#
merged_df = df1.merge(df2, on='col1', how='left').merge(df3, on='col1', how='left')
# output
col1 col2 col3 col4
0 A 10 49.0 NaN
1 B 20 45.0 70.0
2 C 30 NaN 81.0
Similarly, you can also join four or more DataFrames using the merge
function. For example, you can use
the following code for the four DataFrames (df1
, df2
, df3
, and df4
).
# left join
merged_df = df1.merge(df2, on='col1', how='left').merge(df3, on='col1', how='left').merge(df4, on='col1', how='left')
Similarly, you can also use multiple common columns as a key to join the multiple DataFrames.
Example 2: join
function for joining multiple DataFrames
You can use the join
function from pandas to join the multiple DataFrames based on the index or key column.
Create four sample DataFrames,
# load packages
import pandas as pd
import numpy as np
# create random pandas dataframe
df1 = pd.DataFrame({'col1': ['A', 'B', 'C'], 'col2': [10, 20, 30]})
df2 = pd.DataFrame({'col1': ['A', 'B', 'D'], 'col3': [49, 45, 60]})
df3 = pd.DataFrame({'col1': ['B', 'C', 'E'], 'col4': [70, 81, 91]})
df4 = pd.DataFrame({'col1': ['Z', 'C', 'B'], 'col5': [88, 89, 91]})
Perform the left join on these four DataFrames,
merged_df = df1.join(df2.set_index('col1'), on='col1', how='left').join(df3.set_index('col1'), on='col1', how='left').merge(df4.set_index('col1'), on='col1', how='left')
merged_df
# output
col1 col2 col3 col4 col5
0 A 10 49.0 NaN NaN
1 B 20 45.0 70.0 91.0
2 C 30 NaN 81.0 89.0
In the above example, we first set the key column as an index for all DataFrames and used the join
function to perform the left join.
In addition to left join, you can also perform other join operations such as right, outer, inner, etc.