Set Column as Index in Pandas DataFrame - Examples
Pandas - Set Column as Index
By default, pandas creates an integer index (0, 1, 2, ...) for a DataFrame. However, you can set a specific column of the DataFrame as the index if required.
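To see this default index, the minimal sketch below builds a small DataFrame without specifying an index and inspects the index pandas assigns. The data is illustrative only.

```python
import pandas as pd

# A DataFrame created without an explicit index
df = pd.DataFrame({'rollno': [21, 23, 32], 'name': ['Amol', 'Lini', 'Kiku']})

# pandas assigns a default integer index (a RangeIndex) starting at 0
print(type(df.index).__name__)  # RangeIndex
print(list(df.index))           # [0, 1, 2]
```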
To set a column as the index of a DataFrame, use the DataFrame.set_index() method and pass the column name as the argument.
You can also set up a MultiIndex with multiple columns in the index. In this case, pass a list of the column names required for the index to the set_index() method.
Syntax of set_index()
The syntax of set_index() to set up a column as the index is
myDataFrame.set_index('column_name')
where myDataFrame is the DataFrame for which you would like to set the column column_name as the index.
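By default, set_index() moves the column into the index and removes it from the regular columns. The sketch below, using illustrative data, contrasts this with the drop=False keyword argument of set_index(), which keeps the column in both places.

```python
import pandas as pd

df = pd.DataFrame({'rollno': [21, 23], 'name': ['Amol', 'Lini'], 'physics': [72, 78]})

# Default (drop=True): 'rollno' becomes the index and is removed from the columns
moved = df.set_index('rollno')
print(list(moved.columns))   # ['name', 'physics']

# drop=False: 'rollno' becomes the index but also stays as a regular column
kept = df.set_index('rollno', drop=False)
print(list(kept.columns))    # ['rollno', 'name', 'physics']
print(kept.index.name)       # rollno
```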
To set up a MultiIndex, use the following syntax.
myDataFrame.set_index(['column_name_1', 'column_name_2'])
You can pass as many column names as required.
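Note that, by default, the set_index() method does not modify the original DataFrame; it returns a new DataFrame with the column set as the index. Assign the result to a variable, or pass inplace=True to modify the DataFrame in place.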
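The following sketch, with illustrative data, shows that the original DataFrame keeps its default index after set_index() is called, and compares reassigning the result with the inplace=True option, which modifies the DataFrame directly and returns None.

```python
import pandas as pd

df = pd.DataFrame({'rollno': [21, 23], 'name': ['Amol', 'Lini']})

# set_index() returns a new DataFrame; df itself keeps its default index
indexed = df.set_index('rollno')
print(df.index.name, indexed.index.name)   # None rollno

# Option 1: reassign the returned DataFrame to the same variable
df = df.set_index('rollno')
print(df.index.name)                       # rollno

# Option 2: modify a DataFrame in place; the call itself returns None
other = pd.DataFrame({'rollno': [32, 52], 'name': ['Kiku', 'Ajit']})
result = other.set_index('rollno', inplace=True)
print(result, other.index.name)            # None rollno
```

Reassigning the result is usually the clearer choice, because it works the same way inside method chains.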
Examples
1. Set column as index in Pandas DataFrame
In this example, we take a DataFrame and set one of its columns as the index.
Python Program
import pandas as pd
# initialize a DataFrame
df = pd.DataFrame(
    [[21, 'Amol', 72, 67],
     [23, 'Lini', 78, 69],
     [32, 'Kiku', 74, 56],
     [52, 'Ajit', 54, 76]],
    columns=['rollno', 'name', 'physics', 'botony'])
print('DataFrame with default index\n', df)
# set the rollno column as index
df = df.set_index('rollno')
print('\nDataFrame with column as index\n', df)
Output
The column rollno of the DataFrame is set as the index.
Compare the output of the original DataFrame with the output of the DataFrame indexed by rollno. In the original DataFrame, the first column is a separate, unnamed default index (0, 1, 2, 3). In the second DataFrame, the existing rollno column serves as the index, so it takes the first position and is no longer listed among the regular columns.
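Once rollno is the index, rows can be selected by roll number with .loc, and reset_index() turns the index back into a regular column. The sketch below reuses the data from the example above; the lookups shown are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [[21, 'Amol', 72, 67],
     [23, 'Lini', 78, 69],
     [32, 'Kiku', 74, 56],
     [52, 'Ajit', 54, 76]],
    columns=['rollno', 'name', 'physics', 'botony'])
df = df.set_index('rollno')

# The index now carries the name of the column it came from
print(df.index.name)           # rollno

# Rows are selected by roll number (index label), not by position
print(df.loc[32, 'name'])      # Kiku

# reset_index() moves 'rollno' back into a regular, first column
restored = df.reset_index()
print(list(restored.columns))  # ['rollno', 'name', 'physics', 'botony']
```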
2. Set multi-index for DataFrame
In this example, we pass multiple column names as a list to the set_index() method to set up a MultiIndex for the Pandas DataFrame.
Python Program
import pandas as pd
# initialize a DataFrame
df = pd.DataFrame(
    [[21, 'Amol', 72, 67],
     [23, 'Lini', 78, 69],
     [32, 'Kiku', 74, 56],
     [52, 'Ajit', 54, 76]],
    columns=['rollno', 'name', 'physics', 'botony'])
print('DataFrame with default index\n', df)
# set multiple columns as index
df = df.set_index(['rollno', 'name'])
print('\nDataFrame with MultiIndex\n', df)
Output
DataFrame with default index
    rollno  name  physics  botony
0      21  Amol       72      67
1      23  Lini       78      69
2      32  Kiku       74      56
3      52  Ajit       54      76

DataFrame with MultiIndex
              physics  botony
rollno name
21     Amol        72      67
23     Lini        78      69
32     Kiku        74      56
52     Ajit        54      76
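With a MultiIndex, a single row is selected with a tuple that holds one value per index level, while a value for the first level alone selects every matching row. The sketch below reuses the data from the example above; the specific lookups are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [[21, 'Amol', 72, 67],
     [23, 'Lini', 78, 69],
     [32, 'Kiku', 74, 56],
     [52, 'Ajit', 54, 76]],
    columns=['rollno', 'name', 'physics', 'botony'])
df = df.set_index(['rollno', 'name'])

print(list(df.index.names))             # ['rollno', 'name']
print(df.index.nlevels)                 # 2

# A tuple with one value per level selects a single row
print(df.loc[(23, 'Lini'), 'physics'])  # 78

# A value for the first level only returns the matching rows,
# indexed by the remaining level (name)
first = df.loc[21]
print(first)
```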
Summary
In this Pandas Tutorial, we learned how to set a specific column of a DataFrame as the index using set_index(), and how to set up a MultiIndex by passing a list of column names.