How to Change Datatype of Columns in Pandas DataFrame?


Change Datatype of DataFrame Columns in Pandas

To change the datatype of DataFrame columns, use the DataFrame.astype() method, the DataFrame.infer_objects() method, or the pd.to_numeric() function.

In this tutorial, we will go through some of these methods in detail using examples.
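Of the three options named above, DataFrame.infer_objects() is the one that does not parse strings: it only re-infers a better datatype for object columns that already hold Python numbers. The following is a minimal sketch of that behaviour; the column names and values are made up for illustration.

```python
import pandas as pd

# Illustrative data: both columns are deliberately stored with the object dtype.
# Column 'a' holds real Python integers, column 'b' holds numeric-looking strings.
df = pd.DataFrame({'a': [21, 23, 32], 'b': ['72', '78', '74']}, dtype=object)
print('Previous Datatypes\n', df.dtypes, sep='')

# infer_objects() returns a new DataFrame with better dtypes where it can find them
converted = df.infer_objects()
print('\nNew Datatypes\n', converted.dtypes, sep='')
# Column 'a' becomes an integer dtype; column 'b' is still not numeric,
# because infer_objects() does not parse strings.
```

For string data such as column b, pd.to_numeric is the tool to reach for.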

Change datatype of column(s) using DataFrame.astype()

DataFrame.astype() casts a DataFrame to a specified datatype and returns the result. The following is the syntax of the astype() method.

astype(dtype, copy=True, errors='raise', **kwargs)

We are interested only in the first argument, dtype. It is either a single data type, or a dict that maps column names to data types.

So, let us use the astype() method with the dtype argument to change the datatype of one or more columns of a DataFrame.
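One detail worth seeing before the examples: astype() returns a new DataFrame and leaves the original untouched, which is why the programs in this tutorial assign the result back to df. A small sketch with made-up data:

```python
import pandas as pd

# Illustrative two-column DataFrame of integers
df = pd.DataFrame({'a': [21, 23], 'b': [72, 78]})

# astype() returns a converted copy; df itself is not modified
converted = df.astype({'a': 'float64'})

print(df.dtypes['a'])         # the original column keeps its integer dtype
print(converted.dtypes['a'])  # float64
```

If the result is not assigned to a variable, the conversion is simply discarded.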

1. Change datatype of one column

Let us first start with changing datatype of just one column.

In the following program, we shall change the datatype of column a to float.

Python Program

import pandas as pd
import numpy as np

#initialize a dataframe
df = pd.DataFrame(
	[[21, 72, 67],
	[23, 78, 62],
	[32, 74, 54],
	[52, 54, 76]],
	columns=['a', 'b', 'c'])

print('Previous Datatypes\n', df.dtypes, sep='') 

#change datatype of column a to float64
df = df.astype({'a': np.float64})

#print results
print('\nNew Datatypes\n', df.dtypes, sep='') 
print('\nDataFrame\n', df, sep='')

Output

Previous Datatypes
a    int64
b    int64
c    int64
dtype: object

New Datatypes
a    float64
b      int64
c      int64
dtype: object

DataFrame
      a   b   c
0  21.0  72  67
1  23.0  78  62
2  32.0  74  54
3  52.0  54  76
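The same single-column change can also be written with Series.astype() on the selected column. This is an alternative spelling rather than something the program above uses; a minimal sketch on a shortened version of the same data:

```python
import pandas as pd

df = pd.DataFrame(
    [[21, 72, 67],
     [23, 78, 62]],
    columns=['a', 'b', 'c'])

# Convert only column 'a' and assign the converted Series back to the DataFrame
df['a'] = df['a'].astype('float64')

print(df.dtypes)
```

The dict form of DataFrame.astype() tends to read better once more than one column is involved.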

2. Change datatype of multiple columns

Now, let us change the datatype of more than one column. All we have to do is provide more column_name: datatype key-value pairs in the dict passed to the astype() method.

In the following program, we shall change the datatype of column a to float, and b to int8.

Python Program

import pandas as pd
import numpy as np

#initialize a dataframe
df = pd.DataFrame(
	[[21, 72, 67],
	[23, 78, 62],
	[32, 74, 54],
	[52, 54, 76]],
	columns=['a', 'b', 'c'])

print('Previous Datatypes\n', df.dtypes, sep='') 

#change datatype of columns a and b
df = df.astype({'a': np.float64, 'b': np.int8})

#print results
print('\nNew Datatypes\n', df.dtypes, sep='') 
print('\nDataFrame\n', df, sep='')

Output

Previous Datatypes
a    int64
b    int64
c    int64
dtype: object

New Datatypes
a    float64
b       int8
c      int64
dtype: object

DataFrame
      a   b   c
0  21.0  72  67
1  23.0  78  62
2  32.0  74  54
3  52.0  54  76
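When many columns need the same datatype, the dict does not have to be typed out by hand. As a sketch, it can be built with a dict comprehension; the list name cols_to_float is a hypothetical helper introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [[21, 72, 67],
     [23, 78, 62]],
    columns=['a', 'b', 'c'])

# Hypothetical list of the columns we want to convert
cols_to_float = ['a', 'b']

# Build {'a': 'float64', 'b': 'float64'} and pass it to astype()
dtype_map = {col: 'float64' for col in cols_to_float}
df = df.astype(dtype_map)

print(df.dtypes)
```

Columns that are not keys in the dict, such as c here, keep their existing datatype.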

3. Change datatype of all columns

If you would like to change the datatype of all columns of a DataFrame, you can pass the datatype directly as the argument to the astype() method, without a dictionary.

In the following program, we shall change the datatype of all columns to float.

Python Program

import pandas as pd
import numpy as np

#initialize a dataframe
df = pd.DataFrame(
	[[21, 72, 67],
	[23, 78, 62],
	[32, 74, 54],
	[52, 54, 76]],
	columns=['a', 'b', 'c'])

print('Previous Datatypes\n', df.dtypes, sep='') 

#change datatype of all columns
df = df.astype(np.float64)

#print results
print('\nNew Datatypes\n', df.dtypes, sep='') 
print('\nDataFrame\n', df, sep='')

Output

Previous Datatypes
a    int64
b    int64
c    int64
dtype: object

New Datatypes
a    float64
b    float64
c    float64
dtype: object

DataFrame
      a     b     c
0  21.0  72.0  67.0
1  23.0  78.0  62.0
2  32.0  74.0  54.0
3  52.0  54.0  76.0

Change datatype of columns using pandas.to_numeric

Suppose you have imported a DataFrame from Excel, CSV, or some other source, and all of its values arrived as strings. The datatype of these columns is then typically object, and you would like to convert them to suitable numeric datatypes.

Use the following syntax to convert the datatype of all DataFrame columns to numeric.

df = df.apply(pd.to_numeric)

Python Program

import pandas as pd
import numpy as np

#initialize a dataframe
df = pd.DataFrame(
	[['21', '72', '67'],
	['23', '78', '62'],
	['32', '74', '54'],
	['52', '54', '76']],
	columns=['a', 'b', 'c'])

print('Previous Datatypes\n', df.dtypes, sep='') 

#change datatype of all columns
df = df.apply(pd.to_numeric)

#print results
print('\nNew Datatypes\n', df.dtypes, sep='') 
print('\nDataFrame\n', df, sep='')

Output

Previous Datatypes
a    object
b    object
c    object
dtype: object

New Datatypes
a    int64
b    int64
c    int64
dtype: object

DataFrame
    a   b   c
0  21  72  67
1  23  78  62
2  32  74  54
3  52  54  76
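The program above works because every string is a valid number. By default pd.to_numeric raises an error when it meets a value it cannot parse. The sketch below, on made-up data containing one bad value, passes errors='coerce' through apply() so that unparseable entries become NaN instead.

```python
import pandas as pd

# Illustrative data: 'n/a' in column 'a' cannot be parsed as a number
df = pd.DataFrame(
    [['21', '72'],
     ['n/a', '78'],
     ['32', '74']],
    columns=['a', 'b'])

# Extra keyword arguments to apply() are forwarded to pd.to_numeric
converted = df.apply(pd.to_numeric, errors='coerce')

print(converted.dtypes)
print(converted)
```

Column a ends up as float64 because NaN is a floating-point value, while column b, which parsed cleanly, becomes an integer column.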

Summary

In this Python tutorial, we learned how to change the datatype of columns in a DataFrame using DataFrame.astype() and pd.to_numeric.