Python - Pandas delete specific rows/columns in excel

Refresh

April 2019

Views

6.5k time

1

i have the following excel file, and i would like to clean specific rows/columns so that i can further process the file.

enter image description here

I have tried this, but i have not managed to remove any of the blank lines, i ve only managed to trim from those containing data. Here, i was trying to only save the data from the third row and on.

xl = pd.ExcelFile("MRD.xlsx")
df = xl.parse("Sheet3")
df2 = df.iloc[3:]

writer4 = pd.ExcelWriter('pandas3.out.no3lines.xlsx', engine='xlsxwriter')
table5 = pd.DataFrame(df2)
table5.to_excel(writer4, sheet_name='Sheet1')
writer4.save()

I specifically want to remove rows 1, 3 (the empty ones) and the first column, so that i can pivot it. Is there a way to do this? Thank you.

1 answers

3

You can use drop(...) to remove rows, and drop(..., axis=1) to remove columns

data = [
    ['', '', '', ''],
    ['', 1, 2, 3],
    ['', '', '', ''],
    ['', 7, 8, 9],
]

import pandas as pd

df = pd.DataFrame(data)

# drop first column - [0]
df = df.drop(0, axis=1)

# drop first and third row - [0,2]
df = df.drop([0,2])

print(df)

Before:

  0  1  2  3
0           
1    1  2  3
2           
3    7  8  9

After:

   1  2  3
1  1  2  3
3  7  8  9