Pandas DataFrame.query
The DataFrame.query method in pandas filters the rows of a DataFrame using a string expression. It provides an intuitive way to subset data without writing explicit indexing or boolean masks.
Syntax
The syntax for DataFrame.query is:
DataFrame.query(expr, *, inplace=False, **kwargs)
Here, DataFrame refers to the pandas DataFrame being queried.
Parameters
Parameter | Description |
---|---|
expr | A string expression to evaluate. Column names can be referenced directly in the expression. |
inplace | If True, modifies the original DataFrame and returns None. If False, returns a new DataFrame. Defaults to False. |
**kwargs | Additional keyword arguments passed to pandas.eval, such as engine and parser. |
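As a minimal sketch of the **kwargs row above, the example below passes engine='python' through to the evaluator. The sample data mirrors the examples later in this tutorial. The choice of engine here is only illustrative; the default engine depends on whether numexpr is installed.

```python
import pandas as pd

# Sample data (same shape as the examples in this tutorial)
df = pd.DataFrame({
    'Name': ['Arjun', 'Ram', 'Priya'],
    'Age': [25, 30, 35],
    'Salary': [70000, 80000, 90000],
})

# engine='python' evaluates the expression with plain Python
# instead of numexpr; the filtered rows are the same either way
result = df.query('Age > 30', engine='python')
print(result)
```

For small DataFrames the engine choice makes little practical difference.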
Returns
A DataFrame containing the rows that satisfy the query expression, or None if inplace=True.
Examples
Querying Rows Based on a Single Condition
Filter rows where the Age column is greater than 30.
Python Program
import pandas as pd
# Create a DataFrame
data = {
    'Name': ['Arjun', 'Ram', 'Priya'],
    'Age': [25, 30, 35],
    'Salary': [70000, 80000, 90000]
}
df = pd.DataFrame(data)
# Query rows where Age > 30
print("Rows where Age > 30:")
filtered_df = df.query('Age > 30')
print(filtered_df)
Output
Rows where Age > 30:
    Name  Age  Salary
2  Priya   35   90000
Querying Rows Based on Multiple Conditions
Filter rows where Age is greater than 25 and Salary is less than 90000.
Python Program
import pandas as pd
# Create a DataFrame
data = {
    'Name': ['Arjun', 'Ram', 'Priya'],
    'Age': [25, 30, 35],
    'Salary': [70000, 80000, 90000]
}
df = pd.DataFrame(data)
# Query rows with multiple conditions
print("Rows where Age > 25 and Salary < 90000:")
filtered_df = df.query('Age > 25 and Salary < 90000')
print(filtered_df)
Output
Rows where Age > 25 and Salary < 90000:
  Name  Age  Salary
1  Ram   30   80000
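For comparison, the same filter can be written with an explicit boolean mask. The sketch below assumes the same sample data and checks that both approaches select identical rows; the variable names by_query and by_mask are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Arjun', 'Ram', 'Priya'],
    'Age': [25, 30, 35],
    'Salary': [70000, 80000, 90000],
})

# String expression: conditions joined with 'and'
by_query = df.query('Age > 25 and Salary < 90000')

# Equivalent boolean mask: each condition needs parentheses and '&'
by_mask = df[(df['Age'] > 25) & (df['Salary'] < 90000)]

# Both approaches select the same rows
print(by_query.equals(by_mask))
```

The query form avoids repeating the DataFrame name and the extra parentheses, which tends to matter more as the number of conditions grows.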
Using Variables in Query Expressions
Include external variables in a query by prefixing their names with the @ symbol.
Python Program
import pandas as pd
# Create a DataFrame
data = {
    'Name': ['Arjun', 'Ram', 'Priya'],
    'Age': [25, 30, 35],
    'Salary': [70000, 80000, 90000]
}
df = pd.DataFrame(data)
# Define a variable
min_age = 30
# Query rows using a variable
print("Rows where Age >= min_age (min_age=30):")
filtered_df = df.query('Age >= @min_age')
print(filtered_df)
Output
Rows where Age >= min_age (min_age=30):
    Name  Age  Salary
1    Ram   30   80000
2  Priya   35   90000
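The @ prefix is not limited to scalar values. As a small sketch, the example below references a Python list (the hypothetical variable names) together with the in operator to keep only rows whose Name appears in that list.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Arjun', 'Ram', 'Priya'],
    'Age': [25, 30, 35],
    'Salary': [70000, 80000, 90000],
})

# A list defined outside the query expression (illustrative name)
names = ['Arjun', 'Priya']

# 'in' combined with @ keeps rows whose Name is in the list
result = df.query('Name in @names')
print(result)
```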
Modifying the Original DataFrame
Use inplace=True to filter the original DataFrame directly.
Python Program
import pandas as pd
# Create a DataFrame
data = {
    'Name': ['Arjun', 'Ram', 'Priya'],
    'Age': [25, 30, 35],
    'Salary': [70000, 80000, 90000]
}
df = pd.DataFrame(data)
# Modify the original DataFrame
print("Filtering original DataFrame where Salary > 75000:")
df.query('Salary > 75000', inplace=True)
print(df)
Output
Filtering original DataFrame where Salary > 75000:
    Name  Age  Salary
1    Ram   30   80000
2  Priya   35   90000
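One detail worth keeping in mind is what the call returns in each mode. The sketch below, using the same sample data, contrasts the default behavior with inplace=True; the variable names filtered and returned are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Arjun', 'Ram', 'Priya'],
    'Age': [25, 30, 35],
    'Salary': [70000, 80000, 90000],
})

# Default (inplace=False): a new DataFrame is returned, df is unchanged
filtered = df.query('Salary > 75000')
print(len(df), len(filtered))

# inplace=True: df itself is filtered and the call returns None
returned = df.query('Salary > 75000', inplace=True)
print(returned)
print(len(df))
```

Because the in-place call returns None, writing df = df.query(..., inplace=True) would replace df with None; either assign the result of the default call or use inplace=True without assignment.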
Summary
In this tutorial, we explored the DataFrame.query method in pandas. Key takeaways include:
- Using query for intuitive filtering with string expressions.
- Applying single or multiple conditions.
- Incorporating variables with the @ symbol.
- Using inplace=True to modify the original DataFrame.
The DataFrame.query method is a powerful and flexible tool for filtering rows in pandas DataFrames.