Pandas DataFrame - Select Multiple Columns


There are several ways to select multiple columns from a DataFrame in Pandas:

  • Using square bracket notation with DataFrame.
  • Using DataFrame.loc property.
  • Using DataFrame.iloc property.

We shall go through each of these approaches with examples.


Video Tutorial

https://youtu.be/3WbgFXfpZas?si=Jc47de5mLkbK4Kxo

Examples

1. Select multiple columns of DataFrame using Square brackets []

To select multiple columns of DataFrame in Pandas using square brackets, specify the required column names as a list inside the square brackets after the DataFrame object.

DataFrame[list_of_column_names]

An example would be as shown below.

df[['A', 'B']]

In this example, we shall take a DataFrame in df_1 with three columns: A, B, and C. We have to select the columns A and B from the given DataFrame df_1.

  1. Given a DataFrame in df_1 with three columns: A, B, and C.
  2. Use square brackets with the DataFrame df_1, and pass the list of required column names inside the square brackets:
     df_1[['A', 'B']]
  3. The expression in the above step returns a DataFrame with the selected columns. Store the returned DataFrame in selected_columns.
  4. You may print the DataFrame selected_columns to output.

Python Program

import pandas as pd

# Take a DataFrame
df_1 = pd.DataFrame({
    'A': [1, 1, 3, 5, 1],
    'B': [2, 4, 5, 1, 3],
    'C': [10, 20, 10, 30, 20]
})

# Select columns from DataFrame
selected_columns = df_1[['A', 'B']]
print(selected_columns)

Output

   A  B
0  1  2
1  1  4
2  3  5
3  5  1
4  1  3
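
The following is a minimal sketch, using the same df_1 as above, of two details of the square bracket notation: a list of names returns a DataFrame while a single name returns a Series, and the columns come back in the order given in the list. The variable names as_frame, as_series, and reordered are chosen here for illustration only.

```python
import pandas as pd

# Same DataFrame as in the example above
df_1 = pd.DataFrame({
    'A': [1, 1, 3, 5, 1],
    'B': [2, 4, 5, 1, 3],
    'C': [10, 20, 10, 30, 20]
})

# A list of names returns a DataFrame, even with a single name in the list
as_frame = df_1[['A']]

# A single name (no list) returns a Series
as_series = df_1['A']

# Columns are returned in the order given in the list
reordered = df_1[['C', 'A']]

print(type(as_frame).__name__)   # DataFrame
print(type(as_series).__name__)  # Series
print(list(reordered.columns))   # ['C', 'A']
```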

2. Select multiple columns of DataFrame using DataFrame.loc property

To select multiple columns of DataFrame in Pandas, you can use the DataFrame.loc property. The loc property of a DataFrame lets us access the rows and columns of the DataFrame using their labels.

DataFrame.loc[:, list_of_column_names]

An example would be as shown below.

df.loc[:, ['A', 'B']]

where

  • : selects all the rows, and
  • ['A', 'B'] selects columns 'A' and 'B'.

The key difference between the first and second approach is that square brackets with a list of column names select columns only and always return all the rows. On the other hand, .loc[] takes a row selection and a column selection together, so you can select rows and columns in a single step, and stating both explicitly can be beneficial for code clarity and preventing unintended behavior.
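
As a rough illustration of how the two notations behave, the sketch below applies both to the df_1 DataFrame used throughout these examples. The row condition (C equal to 10) and the variable names by_brackets and by_loc are assumptions made for this sketch.

```python
import pandas as pd

df_1 = pd.DataFrame({
    'A': [1, 1, 3, 5, 1],
    'B': [2, 4, 5, 1, 3],
    'C': [10, 20, 10, 30, 20]
})

# Square brackets with a list of names: columns only, all rows kept
by_brackets = df_1[['A', 'B']]

# loc: a row selection and a column selection in one expression
# (rows where C equals 10, columns A and B)
by_loc = df_1.loc[df_1['C'] == 10, ['A', 'B']]

print(by_brackets.shape)  # (5, 2)
print(by_loc)
```

Here by_loc keeps only the rows with index 0 and 2, because those are the rows where C is 10.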

In this example, we shall take a DataFrame in df_1 with three columns: A, B, and C, and five rows. We have to select the columns A and B from the given DataFrame df_1 using loc property of the DataFrame.

  1. Given a DataFrame in df_1 with three columns: A, B, and C.
  2. Use the loc property of the DataFrame df_1. Select all the rows, and select the specified columns A and B:
     df_1.loc[:, ['A', 'B']]
  3. The expression in the above step returns a DataFrame with all the rows and selected columns. Store the returned DataFrame in selected_columns.
  4. You may print the DataFrame selected_columns to output.

Python Program

import pandas as pd

# Take a DataFrame
df_1 = pd.DataFrame({
    'A': [1, 1, 3, 5, 1],
    'B': [2, 4, 5, 1, 3],
    'C': [10, 20, 10, 30, 20]
})

# Select columns from DataFrame
selected_columns = df_1.loc[:, ['A', 'B']]
print(selected_columns)

Output

   A  B
0  1  2
1  1  4
2  3  5
3  5  1
4  1  3

3. Select multiple columns of DataFrame using DataFrame.iloc property

To select multiple columns of DataFrame in Pandas, you can use the DataFrame.iloc property. The iloc property of a DataFrame lets us access the rows and columns of the DataFrame using their zero-based integer positions.

DataFrame.iloc[:, list_of_column_indices]

An example would be as shown below.

df.iloc[:, [0, 1]]

where

  • : selects all the rows, and
  • [0, 1] selects the columns at positions 0 and 1 (the first and second columns).

The key difference between loc and iloc is that loc lets you access the columns using column labels, whereas iloc lets you access the columns using their integer positions.
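
The sketch below illustrates this difference on the df_1 DataFrame from these examples. It reorders the columns (the name shuffled is hypothetical) so that labels and positions no longer line up, and it also shows that a loc slice includes its end label while an iloc slice excludes its end position.

```python
import pandas as pd

df_1 = pd.DataFrame({
    'A': [1, 1, 3, 5, 1],
    'B': [2, 4, 5, 1, 3],
    'C': [10, 20, 10, 30, 20]
})

# Reorder the columns so that labels and positions no longer line up
shuffled = df_1[['C', 'A', 'B']]

by_label = shuffled.loc[:, ['A', 'B']]   # still columns A and B
by_position = shuffled.iloc[:, [0, 1]]   # now columns C and A

print(list(by_label.columns))     # ['A', 'B']
print(list(by_position.columns))  # ['C', 'A']

# Slices differ too: loc includes the end label,
# iloc excludes the end position
label_slice = df_1.loc[:, 'A':'B']
position_slice = df_1.iloc[:, 0:1]

print(list(label_slice.columns))     # ['A', 'B']
print(list(position_slice.columns))  # ['A']
```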

In this example, we shall take a DataFrame in df_1 with three columns: A, B, and C, and five rows. We have to select the columns A and B from the given DataFrame df_1 using iloc property of the DataFrame.

  1. Given a DataFrame in df_1 with three columns: A, B, and C.
  2. Use the iloc property of the DataFrame df_1. Select all the rows, and select the columns at positions 0 and 1:
     df_1.iloc[:, [0, 1]]
  3. The expression in the above step returns a DataFrame with all the rows and selected columns. Store the returned DataFrame in selected_columns.
  4. You may print the DataFrame selected_columns to output.

Python Program

import pandas as pd

# Take a DataFrame
df_1 = pd.DataFrame({
    'A': [1, 1, 3, 5, 1],
    'B': [2, 4, 5, 1, 3],
    'C': [10, 20, 10, 30, 20]
})

# Select columns from DataFrame
selected_columns = df_1.iloc[:, [0, 1]]
print(selected_columns)

Output

   A  B
0  1  2
1  1  4
2  3  5
3  5  1
4  1  3
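
If you know the column names but want to use iloc, one possible approach (an addition to this tutorial, not part of the original example) is to look up the positions with the get_indexer method of the column Index and pass the result to iloc. The names positions and selected_by_position are chosen for this sketch.

```python
import pandas as pd

df_1 = pd.DataFrame({
    'A': [1, 1, 3, 5, 1],
    'B': [2, 4, 5, 1, 3],
    'C': [10, 20, 10, 30, 20]
})

# Look up the integer positions of the column labels.
# get_indexer returns -1 for any label that is not found.
positions = df_1.columns.get_indexer(['A', 'B'])
print(positions.tolist())  # [0, 1]

# Use the positions with iloc
selected_by_position = df_1.iloc[:, positions]
print(selected_by_position)
```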

Summary

In this Pandas tutorial, we learned how to select multiple columns from a DataFrame using square brackets, the loc property, or the iloc property, with examples, and also learned about the key differences between the three approaches.

