Contents

How to Join Multiple DataFrames in pandas

By default, you can join two pandas DataFrame based on common column name (key column) using the merge function.

If you want to join multiple DataFrames (three or more) based on key column, you can use either the merge or join function.

Method 1: merge function

For example, if you have three DataFrames df1, df2, and df3 with a col1 key column among these three DataFrames. You can join these three DataFrames using the merge function as follows:

# import package
import pandas as pd

# merge three DataFrames
merged_df = df1.merge(df2, on='col1', how='outer').merge(df3, on='col1', how='outer')

In the above code, you can specify the key column on which to join and the type of join (inner, outer, left, right, etc.). The key column has the common name among multiple DataFrames.

Method 2: join function

For example, if you have three DataFrames df1, df2, and df3 with a common index or key column. You can join these three DataFrames using the join function as follows:

# import package
import pandas as pd

# merge three DataFrames
merged_df = df1.join(df2.set_index('col1'), how='outer').join(df3.set_index('col1'), how='outer')

In this method, you need to set a key column as the index for all DataFrames.

The following examples explain how to use the pandas merge and join functions to join multiple DataFrame.

Example 1: merge function for joining multiple DataFrames

Create three sample DataFrames,

# load packages
import pandas as pd
import numpy as np

# create random pandas dataframe
df1 = pd.DataFrame({'col1': ['A', 'B', 'C'], 'col2': [10, 20, 30]})
df2 = pd.DataFrame({'col1': ['A', 'B', 'D'], 'col3': [49, 45, 60]})
df3 = pd.DataFrame({'col1': ['B', 'C', 'E'], 'col4': [70, 81, 91]})

In the above example, the column col1 is a key column among the three DataFrames. You can use that column to join the three DataFrames using the merge function.

Perform outer join,

# 
merged_df = df1.merge(df2, on='col1', how='outer').merge(df3, on='col1', how='outer')

# output

  col1  col2  col3  col4
0    A  10.0  49.0   NaN
1    B  20.0  45.0  70.0
2    C  30.0   NaN  81.0
3    D   NaN  60.0   NaN
4    E   NaN   NaN  91.0

Perform left join,

# 
merged_df = df1.merge(df2, on='col1', how='left').merge(df3, on='col1', how='left')

# output

  col1  col2  col3  col4
0    A    10  49.0   NaN
1    B    20  45.0  70.0
2    C    30   NaN  81.0

Similarly, you can also join four or more DataFrames using the merge function. For example, you can use the following code for the four DataFrames (df1, df2, df3, and df4).

# left join
merged_df = df1.merge(df2, on='col1', how='left').merge(df3, on='col1', how='left').merge(df4, on='col1', how='left')

Similarly, you can also use multiple common columns as a key to join the multiple DataFrames.

Example 2: join function for joining multiple DataFrames

You can use the join function from pandas to join the multiple DataFrames based on the index or key column.

Create four sample DataFrames,

# load packages
import pandas as pd
import numpy as np

# create random pandas dataframe
df1 = pd.DataFrame({'col1': ['A', 'B', 'C'], 'col2': [10, 20, 30]})
df2 = pd.DataFrame({'col1': ['A', 'B', 'D'], 'col3': [49, 45, 60]})
df3 = pd.DataFrame({'col1': ['B', 'C', 'E'], 'col4': [70, 81, 91]})
df4 = pd.DataFrame({'col1': ['Z', 'C', 'B'], 'col5': [88, 89, 91]})

Perform the left join on these four DataFrames,

merged_df = df1.join(df2.set_index('col1'), on='col1', how='left').join(df3.set_index('col1'), on='col1', how='left').merge(df4.set_index('col1'), on='col1', how='left')

merged_df

# output
  col1  col2  col3  col4  col5
0    A    10  49.0   NaN   NaN
1    B    20  45.0  70.0  91.0
2    C    30   NaN  81.0  89.0

In the above example, we first set the key column as an index for all DataFrames and used the join function to perform the left join.

In addition to left join, you can also perform other join operations such as right, outer, inner, etc.