Pandas DataFrame - Select rows by condition


Pandas DataFrame - Select rows by condition

In Pandas DataFrame, you can select rows by a condition using boolean indexing. The condition could be based on a single column or multiple columns.

In this tutorial, we shall go through examples where we shall select rows from a DataFrame, based on a condition applied on a single column, or based on a condition applied on multiple columns.

1. Select rows from DataFrame by condition on a single column

In this example, we are given a DataFrame in df, and we need to select rows from this DataFrame where the column value quantity is greater than 10.

Steps

  1. Import pandas library.
import pandas as pd
  1. Given DataFrame in df.
df = pd.DataFrame({
    'name': ['apple', 'banana', 'cherry', 'fig', 'mango'],
    'quantity': [14, 8, 6, 37, 25],
    'price': [100, 50, 20, 30, 150]
})
  1. Use boolean indexing and select rows where the column 'quantity' has a value greater than 10.
df[df['quantity'] > 10]

The expression df['quantity'] > 10 creates a boolean mask where each row is marked as True if the 'quantity' is greater than 10, or False otherwise. Then, this mask is used to index the DataFrame, selecting only the rows where the condition is True.

In total, the expression returns a new DataFrame with the selected rows that satisfy the given condition. Assign it to a variable, say df_selected_rows.

  1. You may print the resulting DataFrame df_selected_rows to output.

Program

The complete program to select rows from a DataFrame by a condition on a column is given below.

Python Program

import pandas as pd

# Take a DataFrame
df = pd.DataFrame({
    'name': ['apple', 'banana', 'cherry', 'fig', 'mango'],
    'quantity': [14, 8, 6, 37, 25],
    'price': [100, 50, 20, 30, 150]
})

# Select rows by a condition using boolean indexing
df_selected_rows = df[df['quantity'] > 10]

# Print DataFrame
print(f"Original DataFrame\n{df}\n")
print(f"Selected Rows\n{df_selected_rows}")

Output

Original DataFrame
     name  quantity  price
0   apple        14    100
1  banana         8     50
2  cherry         6     20
3     fig        37     30
4   mango        25    150

Selected Rows
    name  quantity  price
0  apple        14    100
3    fig        37     30
4  mango        25    150

2. Select rows from DataFrame by condition on multiple columns

In this example, we are given a DataFrame in df, and we need to select rows from this DataFrame where the column value quantity is greater than 10 and the column value price is greater than or equal to 100.

Steps

  1. Import pandas library.
import pandas as pd
  1. Given DataFrame in df.
df = pd.DataFrame({
    'name': ['apple', 'banana', 'cherry', 'fig', 'mango'],
    'quantity': [14, 8, 6, 37, 25],
    'price': [100, 50, 20, 30, 150]
})
  1. Use boolean indexing and select rows where the column 'quantity' has a value greater than 10 and the column 'price' has a value greater than or equal to 100.
df[(df['quantity'] > 10) & (df['price'] >= 100)]

This expression returns a new DataFrame with the rows that satisfy the given condition. Assign it to a variable, say df_selected_rows.

Please observe the parenthesis around the individual conditions. And these conditions are joined by & (for "and") logical operator, because we needed the rows that satisfy both the conditions on the specified columns.

  1. You may print the resulting DataFrame df_selected_rows to output.

Program

The complete program to select rows from a DataFrame by conditions on two columns is given below.

Python Program

import pandas as pd

# Take a DataFrame
df = pd.DataFrame({
    'name': ['apple', 'banana', 'cherry', 'fig', 'mango'],
    'quantity': [14, 8, 6, 37, 25],
    'price': [100, 50, 20, 30, 150]
})

# Select rows by a condition using boolean indexing
df_selected_rows = df[(df['quantity'] > 10) & (df['price'] >= 100)]

# Print DataFrame
print(f"Original DataFrame\n{df}\n")
print(f"Selected Rows\n{df_selected_rows}")

Output

Original DataFrame
     name  quantity  price
0   apple        14    100
1  banana         8     50
2  cherry         6     20
3     fig        37     30
4   mango        25    150

Selected Rows
    name  quantity  price
0  apple        14    100
4  mango        25    150

Summary

In this Pandas Tutorial, we learned how to select rows from a DataFrame using boolean indexing. We covered examples where we selected rows from a DataFrame based on a condition applied on a single column, or based on a condition applied on multiple columns, with example programs.