How to Set a Multicolumn Index in Pandas
In a pandas DataFrame, you can set multiple columns as the index by passing a list of column names to the set_index() method.
The basic syntax for the set_index() method is:
df = df.set_index(['col1', 'col2'])
The following examples demonstrate how to use the set_index() method to create an index from multiple columns in a pandas DataFrame.
Create a sample pandas DataFrame:
# import package
import pandas as pd
df = pd.DataFrame({'col1': [1, 2, 3, 4, 6],
                   'col2': ['A', 'B', 'A', 'C', 'D'],
                   'col3': [4, 5, 6, 10, 12]})
# view DataFrame
df
   col1 col2  col3
0     1    A     4
1     2    B     5
2     3    A     6
3     4    C    10
4     6    D    12
By default, pandas assigns a DataFrame an integer index that starts at 0 and increases by 1 for each row.
You can set the index based on multiple columns (e.g. col1 and col2) using the set_index() method.
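As a quick check of the default index described above, the following minimal sketch rebuilds the sample DataFrame and inspects its index. The variable name default_index is only illustrative.

```python
import pandas as pd

# Rebuild the sample DataFrame used in this article
df = pd.DataFrame({'col1': [1, 2, 3, 4, 6],
                   'col2': ['A', 'B', 'A', 'C', 'D'],
                   'col3': [4, 5, 6, 10, 12]})

# With no index specified, pandas creates a RangeIndex (0, 1, 2, ...)
default_index = df.index
print(type(default_index).__name__)
print(list(default_index))
```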
df = df.set_index(['col1', 'col2'])
df
           col3
col1 col2
1    A        4
2    B        5
3    A        6
4    C       10
6    D       12
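You can see that the new multicolumn index (a pandas MultiIndex) is applied to df, and col1 and col2 are no longer regular columns.
Alternatively, you can pass the inplace=True parameter to modify the existing DataFrame instead of assigning the result back to df. Apply this to the original DataFrame, because col1 and col2 are no longer available as columns once they have been moved into the index.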
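To confirm what set_index() produced, you can inspect the resulting index. The sketch below is a minimal check on the same sample data; the variable name indexed is an assumption made for illustration, so that the original df stays untouched.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4, 6],
                   'col2': ['A', 'B', 'A', 'C', 'D'],
                   'col3': [4, 5, 6, 10, 12]})

# set_index() returns a new DataFrame; df itself is not modified here
indexed = df.set_index(['col1', 'col2'])

# The index is now a two-level MultiIndex named after the source columns
print(isinstance(indexed.index, pd.MultiIndex))
print(list(indexed.index.names))

# Only col3 remains as a regular column
print(list(indexed.columns))
```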
df.set_index(['col1', 'col2'], inplace=True)
df
           col3
col1 col2
1    A        4
2    B        5
3    A        6
4    C       10
6    D       12
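Two details of inplace=True are easy to miss, and the following sketch illustrates them on a freshly created copy of the sample data: the call returns None, and repeating it fails because the columns have already moved into the index. The names result and second_call_failed are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4, 6],
                   'col2': ['A', 'B', 'A', 'C', 'D'],
                   'col3': [4, 5, 6, 10, 12]})

# With inplace=True, df is modified and the call itself returns None
result = df.set_index(['col1', 'col2'], inplace=True)
print(result)
print(list(df.index.names))

# col1 and col2 are no longer columns, so a second call raises KeyError
try:
    df.set_index(['col1', 'col2'], inplace=True)
    second_call_failed = False
except KeyError:
    second_call_failed = True
print(second_call_failed)
```

For this reason, avoid writing df = df.set_index(..., inplace=True), which would replace df with None.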
If you load a DataFrame from a CSV file, you can also set a multicolumn index directly by passing a list of column names to the index_col parameter of read_csv().
# import package
import pandas as pd
df = pd.read_csv('file.csv', index_col=['col1', 'col2'])
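Since file.csv is not included here, the sketch below uses an in-memory string in place of the file so that it can run on its own. The CSV contents in csv_text are made up for illustration and assume a header row containing col1, col2, and col3.

```python
import io
import pandas as pd

# Hypothetical CSV contents standing in for file.csv
csv_text = "col1,col2,col3\n1,A,4\n2,B,5\n3,A,6\n"

# index_col accepts a list of column names to build a multicolumn index
df = pd.read_csv(io.StringIO(csv_text), index_col=['col1', 'col2'])

print(list(df.index.names))
print(list(df.columns))
```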