How to Set a Multicolumn Index in Pandas

In a pandas DataFrame, you can set multiple columns as the index using the set_index() function.

The basic syntax for the set_index() function is:

df = df.set_index(['col1', 'col2'])

The following examples demonstrate how to use the set_index() function to create an index from multiple columns in a pandas DataFrame.

Create a sample pandas DataFrame:

# import package
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4, 6],
                   'col2': ['A', 'B', 'A', 'C', 'D'],
                   'col3': [4, 5, 6, 10, 12]})

# view DataFrame
df

   col1 col2  col3
0     1    A     4
1     2    B     5
2     3    A     6
3     4    C    10
4     6    D    12

By default, pandas assigns an integer index that starts at 0 and increases by 1 for each row.
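
If you want to confirm the default index before changing it, you can inspect df.index. The snippet below is a minimal sketch that rebuilds the same sample DataFrame; the variable name default_index is only used here for illustration.

```python
import pandas as pd

# same sample data as in the example above
df = pd.DataFrame({'col1': [1, 2, 3, 4, 6],
                   'col2': ['A', 'B', 'A', 'C', 'D'],
                   'col3': [4, 5, 6, 10, 12]})

# with no index specified, pandas creates a RangeIndex (0, 1, 2, ...)
default_index = df.index
print(default_index)
```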

You can set the index based on multiple columns (e.g. col1 and col2) using the set_index() function.

df = df.set_index(['col1', 'col2'])

df

          col3
col1 col2
1    A        4
2    B        5
3    A        6
4    C       10
6    D       12

You can see that the new multicolumn index is applied to df: col1 and col2 are now index levels, and col3 is the only remaining column.
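
To check the result, you can look at the index type and its level names, and select a row by passing one value per index level. This is a small sketch using the same sample data; the names indexed and value are illustrative and not part of the original example.

```python
import pandas as pd

# same sample data as in the example above
df = pd.DataFrame({'col1': [1, 2, 3, 4, 6],
                   'col2': ['A', 'B', 'A', 'C', 'D'],
                   'col3': [4, 5, 6, 10, 12]})

indexed = df.set_index(['col1', 'col2'])

# the index is now a MultiIndex with one level per column
print(type(indexed.index).__name__)
print(list(indexed.index.names))

# select a single cell by giving a value for each index level as a tuple
value = indexed.loc[(3, 'A'), 'col3']
print(value)
```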

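Alternatively, you can pass the inplace=True parameter to modify the existing DataFrame instead of assigning the result to a new variable. Run this on the original DataFrame rather than after the set_index() call above: once col1 and col2 have moved into the index they are no longer columns, and calling set_index() on them again raises a KeyError.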

df.set_index(['col1', 'col2'], inplace=True)

df

          col3
col1 col2
1    A        4
2    B        5
3    A        6
4    C       10
6    D       12
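
Keep in mind that with inplace=True the function returns None, so you should not assign its result back to df. The sketch below shows this on the same sample data, and also uses reset_index() to move the index levels back into regular columns; the names result and restored are illustrative.

```python
import pandas as pd

# same sample data as in the example above
df = pd.DataFrame({'col1': [1, 2, 3, 4, 6],
                   'col2': ['A', 'B', 'A', 'C', 'D'],
                   'col3': [4, 5, 6, 10, 12]})

# inplace=True changes df itself and returns None
result = df.set_index(['col1', 'col2'], inplace=True)
print(result)
print(list(df.columns))

# reset_index() turns the index levels back into ordinary columns
restored = df.reset_index()
print(list(restored.columns))
```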

If you load the DataFrame from a CSV file, you can set the multicolumn index while reading the file by using the index_col parameter of read_csv().

# import package
import pandas as pd

df = pd.read_csv('file.csv', index_col=['col1', 'col2'])
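
If you do not have a CSV file at hand, you can try the same call on in-memory text. This is a minimal sketch that assumes a small CSV with the same column names; csv_text and df_csv are made-up names for illustration, and io.StringIO stands in for the file path.

```python
import io
import pandas as pd

# hypothetical CSV content standing in for file.csv
csv_text = "col1,col2,col3\n1,A,4\n2,B,5\n3,A,6\n"

# index_col accepts a list of column names to build a multicolumn index
df_csv = pd.read_csv(io.StringIO(csv_text), index_col=['col1', 'col2'])
print(df_csv)
```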