Python Pickle - Pandas DataFrame


To pickle a DataFrame in Python, use pickle.dump(), and to unpickle it, use pickle.load().

In this tutorial, we shall learn how to pickle and unpickle a Pandas DataFrame, with the help of example programs.


1. Pickle a DataFrame

In the following example, we will initialize a DataFrame and then pickle it to a file. The steps for pickling are:

  • Open a file in binary write mode ('wb').
  • Call the function pickle.dump(dataframe, file). The object to pickle comes first, followed by the file object.

Python Program

import pandas as pd
import pickle

# Initialize the DataFrame
df = pd.DataFrame(
    [['Somu', 68, 84, 78, 96],
     ['Kiku', 74, 56, 88, 85],
     ['Amol', 77, 73, 82, 87],
     ['Lini', 78, 69, 87, 92]],
    columns=['name', 'physics', 'chemistry', 'algebra', 'calculus'])

# Create the file in write-binary mode
picklefile = open('df_marks', 'wb')
# Pickle the DataFrame
pickle.dump(df, picklefile)
# Close file
picklefile.close()

Explanation:

  1. The DataFrame df is created using Pandas with the provided data.
  2. The file df_marks is opened in write-binary mode using open('df_marks', 'wb').
  3. The pickle.dump() function serializes the DataFrame and writes it to the file.
  4. The file is then closed to ensure the serialized data is saved correctly.

The pickle file df_marks is now created in the current working directory.
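
The same steps can also be written with a with statement, which closes the file automatically, even if pickle.dump() raises an error. The following is a minimal sketch and not part of the original program: the shortened DataFrame, the temporary directory, and the names path, file_created and file_size are assumptions made for illustration.

```python
import os
import pickle
import tempfile

import pandas as pd

# Illustrative data: a shortened version of the marks table
df = pd.DataFrame(
    [['Somu', 68, 84],
     ['Kiku', 74, 56]],
    columns=['name', 'physics', 'chemistry'])

# Write into a temporary directory so the sketch leaves no files behind
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'df_marks')

    # The with block closes the file when it ends, so no close() call is needed
    with open(path, 'wb') as picklefile:
        pickle.dump(df, picklefile)

    # Check that the pickle file exists and is not empty
    file_created = os.path.exists(path)
    file_size = os.path.getsize(path)

print(file_created, file_size > 0)
```

With this form there is no separate close step to forget, which is why it is often preferred for short scripts.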


2. Unpickle a DataFrame

In the following example, we will read the pickle file and then unpickle it to retrieve the original DataFrame.

The steps for unpickling are:

  1. Open the file in binary read mode ('rb').
  2. Call the function pickle.load(file) to deserialize the DataFrame.

Python Program

import pandas as pd
import pickle

# Read the pickle file
picklefile = open('df_marks', 'rb')
# Unpickle the DataFrame
df = pickle.load(picklefile)
# Close the file
picklefile.close()

# Print the DataFrame
print(type(df))
print(df)

Explanation:

  1. The file df_marks is opened in read-binary mode using open('df_marks', 'rb').
  2. The pickle.load() function deserializes the data and reconstructs the original DataFrame.
  3. We print the type of the object to verify it is a pandas.DataFrame, and then print the entire DataFrame.

Output

<class 'pandas.core.frame.DataFrame'>
   name  physics  chemistry  algebra  calculus
0  Somu       68         84       78        96
1  Kiku       74         56       88        85
2  Amol       77         73       82        87
3  Lini       78         69       87        92
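
Printing the DataFrame shows that the data looks right. To check programmatically that the unpickled object matches the original, one option is DataFrame.equals(). The sketch below is only an illustration under stated assumptions: it serializes in memory with pickle.dumps() and pickle.loads() instead of using a file, and the names data and restored are hypothetical.

```python
import pickle

import pandas as pd

# The marks table used throughout this tutorial
df = pd.DataFrame(
    [['Somu', 68, 84, 78, 96],
     ['Kiku', 74, 56, 88, 85],
     ['Amol', 77, 73, 82, 87],
     ['Lini', 78, 69, 87, 92]],
    columns=['name', 'physics', 'chemistry', 'algebra', 'calculus'])

# Serialize the DataFrame to a bytes object, then rebuild it from those bytes
data = pickle.dumps(df)
restored = pickle.loads(data)

# equals() compares shape, values and column dtypes
print(restored.equals(df))
print(restored is df)
```

The first print confirms the contents match, and the second shows that unpickling builds a new object and does not return the original one.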

3. Pickling and Unpickling with a File Extension

The pickle module does not require any particular file extension, but giving the file one makes its contents easier to recognize. A common convention is .pkl. Here's an example where we use the .pkl extension for the pickle file:

Python Program

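import pandas as pd
import pickle

# Initialize the DataFrame (same data as in the first example)
df = pd.DataFrame(
    [['Somu', 68, 84, 78, 96],
     ['Kiku', 74, 56, 88, 85],
     ['Amol', 77, 73, 82, 87],
     ['Lini', 78, 69, 87, 92]],
    columns=['name', 'physics', 'chemistry', 'algebra', 'calculus'])

# Save DataFrame with .pkl extension
picklefile = open('df_marks.pkl', 'wb')
pickle.dump(df, picklefile)
picklefile.close()

# Load DataFrame from .pkl file
picklefile = open('df_marks.pkl', 'rb')
df = pickle.load(picklefile)
picklefile.close()

# Print the loaded DataFrame
print(df)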

Explanation:

  1. The DataFrame is pickled into a file named df_marks.pkl using the .pkl extension for clarity.
  2. The DataFrame is unpickled from the file df_marks.pkl and then printed to confirm the successful operation.
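
As an aside, pandas also provides DataFrame.to_pickle() and pandas.read_pickle(), which handle opening and closing the file for you. The sketch below is not part of the original programs. It assumes a shortened DataFrame and a temporary directory, and the names path and loaded are hypothetical.

```python
import os
import tempfile

import pandas as pd

# Illustrative data: a shortened version of the marks table
df = pd.DataFrame(
    [['Somu', 68, 84],
     ['Kiku', 74, 56]],
    columns=['name', 'physics', 'chemistry'])

# Work in a temporary directory so the sketch leaves no files behind
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'df_marks.pkl')

    # to_pickle() serializes the DataFrame and writes it to the given path
    df.to_pickle(path)

    # read_pickle() loads the pickled object back from the path
    loaded = pd.read_pickle(path)

print(loaded.equals(df))
```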

Summary

In this tutorial, we covered how to serialize and deserialize Pandas DataFrames using the pickle module. We pickled a DataFrame with pickle.dump(), unpickled it with pickle.load(), repeated the process with a .pkl file extension, and explained each step with code examples.

