Specify dtypes when saving a pandas DataFrame to a binary file


February 2019




I have a pandas DataFrame that I want to write to a binary file, but it contains mixed dtypes (floats and ints). With df.values.tofile() I cannot specify a different dtype per column, even when calling astype('f4, f4, i4, i4') before tofile() on the DataFrame in the example below. My current workaround is to pack each row with struct, but that is very slow!

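import struct

import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.random.random(size=(10, 4)) * 10, columns=['f1', 'f2', 'i1', 'i2'])
df.i1 = df.i1.astype(int)
df.i2 = df.i2.astype(int)

# Open in binary mode: struct.pack returns bytes, not str
with open('tmp', 'wb') as ply:
    # itertuples keeps each column's dtype; iterrows would upcast the ints to float
    for row in df.itertuples(index=False):
        # '<ffii' = little-endian, two 4-byte floats followed by two 4-byte ints
        ply.write(struct.pack('<ffii', *row))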

I am creating a .ply file, which requires each value to be written in the binary type declared for it.
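
One possible way to avoid the per-row struct loop is to copy the columns into a NumPy structured array and write it in a single tofile() call. This is only a minimal sketch, not a tested .ply writer: the helper name to_packed_records, the fields list and the output path are made up for illustration, and the field types simply mirror the '<ffii' layout from the struct example above.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def to_packed_records(df, fields):
    """Copy DataFrame columns into a packed NumPy structured array.

    `fields` is a list of (column_name, dtype_string) pairs; both the helper
    and this argument are hypothetical names used only for this sketch.
    """
    dtype = np.dtype(fields)  # default layout is packed, with no padding between fields
    out = np.empty(len(df), dtype=dtype)
    for name in dtype.names:
        # Assignment casts each column to the dtype declared for that field
        out[name] = df[name].to_numpy()
    return out


df = pd.DataFrame(np.random.random(size=(10, 4)) * 10, columns=['f1', 'f2', 'i1', 'i2'])
df['i1'] = df['i1'].astype(int)
df['i2'] = df['i2'].astype(int)

# Little-endian 4-byte floats and ints, i.e. the same layout as struct's '<ffii'
fields = [('f1', '<f4'), ('f2', '<f4'), ('i1', '<i4'), ('i2', '<i4')]
records = to_packed_records(df, fields)

# Illustrative output path; a real .ply file would need its text header written first
path = os.path.join(tempfile.gettempdir(), 'tmp_records.bin')
with open(path, 'wb') as ply:
    records.tofile(ply)  # one bulk write of all rows, 16 bytes per row
```

Because every row is laid out as a single 16-byte record, the bytes on disk should match what the struct loop produces, without iterating over rows in Python. For an actual .ply file, the header would still have to declare properties whose types and order match the fields list.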

0 answers