specify dtypes when saving pandas dataframe to a binary file

February 2019

I have a pandas DataFrame that I want to write to a binary file, but the df contains mixed dtypes (floats and ints). If I use df.values.tofile() I cannot specify a different dtype per column (even with df.values.astype('f4, f4, i4, i4').tofile() for the example below). My workaround at the moment is to use struct, but it is very slow!

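import struct

import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.random.random(size=(10, 4)) * 10, columns=['f1', 'f2', 'i1', 'i2'])
df.i1 = df.i1.astype(int)
df.i2 = df.i2.astype(int)

# Binary mode is required because struct.pack returns bytes
with open('tmp', 'wb') as ply:
    # itertuples keeps each column's own type; iterrows would upcast the ints to float
    for row in df.itertuples(index=False):
        ply.write(struct.pack('<ffii', *row))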

I am creating a binary .ply file, which requires every value to be written with the exact type declared for it, so the floats and ints cannot share one dtype.
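One possible way to avoid the per-row struct loop is to copy the columns into a NumPy structured array, which can hold a different dtype per field, and dump that in one call. The following is only a sketch under a few assumptions: `to_structured` is a hypothetical helper (not a pandas or NumPy function), the target layout is little-endian `f4, f4, i4, i4` to match the `'<ffii'` format above, and the values fit into 32 bits.

```python
import numpy as np
import pandas as pd


def to_structured(df, dtypes):
    """Hypothetical helper: copy DataFrame columns into a packed
    structured array with one field per column.

    dtypes maps column name -> NumPy dtype string, in output order.
    """
    rec = np.empty(len(df), dtype=[(name, dt) for name, dt in dtypes.items()])
    for name in dtypes:
        # Field assignment casts the whole column to the field's dtype
        rec[name] = df[name].to_numpy()
    return rec


df = pd.DataFrame(np.random.random(size=(10, 4)) * 10,
                  columns=['f1', 'f2', 'i1', 'i2'])
df['i1'] = df['i1'].astype(int)
df['i2'] = df['i2'].astype(int)

# Same layout as struct format '<ffii': 16 bytes per row, no padding
rec = to_structured(df, {'f1': '<f4', 'f2': '<f4', 'i1': '<i4', 'i2': '<i4'})

# Raw bytes for all rows, ready to be written after the header
payload = rec.tobytes()
```

With a file opened in `'wb'` mode, `ply.write(payload)` (or `rec.tofile(ply)`) would then write all rows at once after the header. The cast to `f4`/`i4` silently truncates or wraps values that do not fit, so it is worth checking the column ranges first.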

0 answers