Python recipe missing first row (header)

tmgiang · ‎11-06-2017

Hi,

I tried some Python code in a recipe to read a CSV / Excel file, first from a URL and then from a local CSV file. The code is below:

# read from html
import dataiku
import pandas

link_down = link_share + '/download'
df_proj_list = pandas.read_excel(link_down, skiprows=1)  # skip 1 empty row

# Recipe outputs
digi_PAY_Master_Data_original = dataiku.Dataset("DIGI-PAY_Master_Data_original")
digi_PAY_Master_Data_original.write_with_schema(df_proj_list)

But when I explored the result, the header row was missing, and the first data row was used as the header instead.

I tried this code in my Jupyter notebook and it works well.

So I tried again: I used a local Jupyter notebook to read the file from the URL, exported it to a CSV file, and then used a Dataiku recipe to read the exported file.
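For anyone reproducing the export step, here is a minimal sketch of writing a DataFrame to CSV and confirming the header line is really in the file. The post does not show how the export was done, so `DataFrame.to_csv` with `index=False`, the sample columns, and the file name are all assumptions for illustration.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data standing in for the sheet read from the URL
df = pd.DataFrame({"project": ["alpha", "beta"], "owner": ["an", "binh"]})

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name, not the one from the post
    csv_path = os.path.join(tmp_dir, "master_data.csv")

    # index=False keeps the row index out of the file; the header is written by default
    df.to_csv(csv_path, index=False)

    # Look at the raw first line to confirm the column names are in the file
    with open(csv_path) as handle:
        first_line = handle.readline().strip()

    # Read the file back the same way the recipe does
    df_back = pd.read_csv(csv_path)

print(first_line)
print(list(df_back.columns))
```

If the first line of the file holds the column names and they come back as `df_back.columns`, the export itself is probably not where the header gets lost.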

The code in Dataiku recipe:

# read from csv
import dataiku
import pandas

olap_folder = '/home/giangtm/Work/Projects/DataScience/olap/'
file_master_data = 'DIGI-PAY_Master_Data.csv'
df_proj_list = pandas.read_csv(olap_folder + file_master_data)

# Recipe outputs
digi_PAY_Master_Data_original = dataiku.Dataset("DIGI-PAY_Master_Data_original")
digi_PAY_Master_Data_original.write_with_schema(df_proj_list)

But the output data was missing the header again 🙂 (meaning the header was lost, and the first data row was used as the header).

Is it a bug?

Clément_Stenac · ‎11-06-2017

Hi,

It looks like an issue with your reading code. I would suggest that you print the column names and the data in the df after using read_excel or read_csv, before writing it to a dataset.
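A minimal sketch of that check, assuming pandas only. The helper name `describe_frame` and the sample data are hypothetical; in the recipe you would call it on `df_proj_list` right before `write_with_schema`.

```python
import io

import pandas as pd


def describe_frame(df, n_rows=3):
    """Return the column names and the first rows, for printing before the write."""
    return list(df.columns), df.head(n_rows)


# Hypothetical sample standing in for the real file
sample_csv = io.StringIO("project,owner\nalpha,an\nbeta,binh\n")
df = pd.read_csv(sample_csv)

columns, preview = describe_frame(df)
print(columns)   # the names pandas took from the header row
print(preview)   # the first data rows, which should not repeat the header
print(df.shape)  # (rows, columns), to compare with the row count in the output dataset
```

If the printed column names look like data values, the header was already lost at read time; if they look right, the problem is more likely after the DataFrame leaves pandas.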

tmgiang · ‎11-06-2017

Actually, I copied the code into my Jupyter notebook and it still works well.

Following your comment, I printed the column names after reading, and they are correct. 🙂 But when I explore the data in the table view, the first row (the column names) is still missing.
