Pandas DataFrame.query


The DataFrame.query method in pandas filters the rows of a DataFrame using a boolean expression written as a string. It offers a concise alternative to building boolean masks and indexing with them by hand.
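To make the comparison with boolean masking concrete, the following minimal sketch filters the same rows both ways. The DataFrame and its values are illustrative assumptions, not part of the pandas API.

```python
import pandas as pd

# Small illustrative DataFrame (values chosen only for this sketch)
df = pd.DataFrame({
    'Name': ['Arjun', 'Ram', 'Priya'],
    'Age': [25, 30, 35],
})

# Boolean masking: build a True/False Series, then index with it
masked = df[df['Age'] > 30]

# query: express the same condition as a string
queried = df.query('Age > 30')

# Both approaches should select the same rows
print(masked.equals(queried))
```

The two results hold the same rows; the query form tends to read more easily as conditions grow longer.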


Syntax

The syntax for DataFrame.query is:

DataFrame.query(expr, *, inplace=False, **kwargs)

Here, DataFrame refers to the pandas DataFrame being queried.


Parameters

Parameter    Description
expr         A string expression to evaluate. Column names can be referenced directly in the expression.
inplace      If True, modifies the original DataFrame. If False, returns a new DataFrame. Defaults to False.
**kwargs     Additional keyword arguments passed to pandas.eval, such as engine and parser.
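As a rough illustration of the expr and **kwargs parameters, the sketch below references a column whose name contains a space by quoting it in backticks, and forwards engine='python' to pandas.eval. The column name 'Annual Salary' is a hypothetical example chosen for this sketch.

```python
import pandas as pd

# Hypothetical data; 'Annual Salary' is not a valid Python identifier
df = pd.DataFrame({
    'Name': ['Arjun', 'Ram', 'Priya'],
    'Annual Salary': [70000, 80000, 90000],
})

# Backticks quote a column name that contains a space
high_paid = df.query('`Annual Salary` > 75000')

# engine is passed through to pandas.eval; 'python' evaluates the
# expression without the optional numexpr dependency
high_paid_py = df.query('`Annual Salary` > 75000', engine='python')

print(high_paid['Name'].tolist())
```

Both calls should select the same rows; the engine choice affects how the expression is evaluated, not what it means.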

Returns

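A new DataFrame containing the rows for which the expression evaluates to True. If inplace=True, the original DataFrame is filtered instead and None is returned.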
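A short sketch of the default return behaviour, using illustrative data: the original DataFrame is left untouched, and the returned rows keep their original index labels. Renumbering with reset_index is an optional extra step, shown here as one possible follow-up.

```python
import pandas as pd

# Illustrative data for this sketch
df = pd.DataFrame({
    'Name': ['Arjun', 'Ram', 'Priya'],
    'Age': [25, 30, 35],
})

# With the default inplace=False, query returns a new DataFrame
result = df.query('Age >= 30')

print(len(df))                # the original still has all of its rows
print(result.index.tolist())  # the result keeps the original index labels

# Optional: renumber the filtered rows from 0
renumbered = result.reset_index(drop=True)
print(renumbered.index.tolist())
```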


Examples

Querying Rows Based on a Single Condition

Filter rows where the Age column is greater than 30.

Python Program

import pandas as pd

# Create a DataFrame
data = {
    'Name': ['Arjun', 'Ram', 'Priya'],
    'Age': [25, 30, 35],
    'Salary': [70000, 80000, 90000]
}
df = pd.DataFrame(data)

# Query rows where Age > 30
print("Rows where Age > 30:")
filtered_df = df.query('Age > 30')
print(filtered_df)

Output

Rows where Age > 30:
    Name  Age  Salary
2  Priya   35   90000

Querying Rows Based on Multiple Conditions

Filter rows where Age is greater than 25 and Salary is less than 90000.

Python Program

import pandas as pd

# Create a DataFrame
data = {
    'Name': ['Arjun', 'Ram', 'Priya'],
    'Age': [25, 30, 35],
    'Salary': [70000, 80000, 90000]
}
df = pd.DataFrame(data)

# Query rows with multiple conditions
print("Rows where Age > 25 and Salary < 90000:")
filtered_df = df.query('Age > 25 and Salary < 90000')
print(filtered_df)

Output

Rows where Age > 25 and Salary < 90000:
   Name  Age  Salary
1   Ram   30   80000
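Conditions can also be combined with or and not, or with the symbolic operators & and |. The following sketch reuses the same sample data; the thresholds are arbitrary values chosen for illustration.

```python
import pandas as pd

# Same sample data as in the examples above
df = pd.DataFrame({
    'Name': ['Arjun', 'Ram', 'Priya'],
    'Age': [25, 30, 35],
    'Salary': [70000, 80000, 90000],
})

# Rows that satisfy at least one of the two conditions
either = df.query('Age < 30 or Salary >= 90000')

# Rows that satisfy neither condition
neither = df.query('not (Age < 30 or Salary >= 90000)')

# Symbolic form of the first query; the parentheses keep the
# grouping explicit
symbolic = df.query('(Age < 30) | (Salary >= 90000)')

print(either['Name'].tolist())
print(neither['Name'].tolist())
```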

Using Variables in Query Expressions

Include external variables in a query using the @ symbol.

Python Program

import pandas as pd

# Create a DataFrame
data = {
    'Name': ['Arjun', 'Ram', 'Priya'],
    'Age': [25, 30, 35],
    'Salary': [70000, 80000, 90000]
}
df = pd.DataFrame(data)

# Define a variable
min_age = 30

# Query rows using a variable
print("Rows where Age >= min_age (min_age=30):")
filtered_df = df.query('Age >= @min_age')
print(filtered_df)

Output

Rows where Age >= min_age (min_age=30):
    Name  Age  Salary
1    Ram   30   80000
2  Priya   35   90000
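The @ prefix is not limited to single numbers. As a minimal sketch, the example below tests membership in a Python list and uses two variables as range bounds. The variable names names, low, and high are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Same sample data as in the examples above
df = pd.DataFrame({
    'Name': ['Arjun', 'Ram', 'Priya'],
    'Age': [25, 30, 35],
    'Salary': [70000, 80000, 90000],
})

# Hypothetical external variables
names = ['Arjun', 'Priya']
low, high = 75000, 85000

# Membership test against a Python list
selected = df.query('Name in @names')

# Chained comparison with two external bounds
in_range = df.query('@low <= Salary <= @high')

print(selected['Name'].tolist())
print(in_range['Name'].tolist())
```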

Modifying the Original DataFrame

Use inplace=True to filter the original DataFrame directly.

Python Program

import pandas as pd

# Create a DataFrame
data = {
    'Name': ['Arjun', 'Ram', 'Priya'],
    'Age': [25, 30, 35],
    'Salary': [70000, 80000, 90000]
}
df = pd.DataFrame(data)

# Modify the original DataFrame
print("Filtering original DataFrame where Salary > 75000:")
df.query('Salary > 75000', inplace=True)
print(df)

Output

Filtering original DataFrame where Salary > 75000:
    Name  Age  Salary
1    Ram   30   80000
2  Priya   35   90000
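One detail worth noting is that a call with inplace=True returns None, so its result should not be assigned back to a variable. The sketch below shows this, alongside the common alternative of reassigning the result of a regular query call. The names returned and df2 are illustrative.

```python
import pandas as pd

data = {
    'Name': ['Arjun', 'Ram', 'Priya'],
    'Age': [25, 30, 35],
    'Salary': [70000, 80000, 90000],
}

# inplace=True filters df itself and returns None
df = pd.DataFrame(data)
returned = df.query('Salary > 75000', inplace=True)
print(returned)

# Alternative: keep the default inplace=False and reassign the result
df2 = pd.DataFrame(data)
df2 = df2.query('Salary > 75000')

# Both approaches leave the same rows behind
print(df.equals(df2))
```

Reassignment is often easier to follow in longer pipelines because each step produces an explicit value.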

Summary

In this tutorial, we explored the DataFrame.query method in pandas. Key takeaways include:

  • Using query for intuitive filtering with string expressions.
  • Applying single or multiple conditions.
  • Incorporating variables with the @ symbol.
  • Using inplace=True to modify the original DataFrame.

The DataFrame.query method is a powerful and flexible tool for filtering rows in pandas DataFrames.

