Hi,
I tried some Python code in a recipe to read a CSV/Excel file, first from a URL and then from a local CSV file. The code is below:
# Read from URL
import dataiku
import pandas

link_down = link_share + '/download'
df_proj_list = pandas.read_excel(link_down, skiprows=1)  # skip 1 empty row
# Recipe outputs
digi_PAY_Master_Data_original = dataiku.Dataset("DIGI-PAY_Master_Data_original")
digi_PAY_Master_Data_original.write_with_schema(df_proj_list)
But when I explore the result, the header row is missing, and the first data row is used as the header instead.
I tried this code in my Jupyter notebook and it works well.
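One thing that can produce exactly this symptom is how `skiprows` interacts with the header row. The sketch below is only an illustration, not a diagnosis of this recipe: it uses `pandas.read_csv` on made-up in-memory data as a stand-in for the real Excel file, and the sample contents and variable names are assumptions. It shows that `skiprows=1` keeps the header when the file really starts with an empty row, but consumes the header when it does not.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the real file.
with_blank_row = "\nid,name\n1,alpha\n2,beta\n"    # starts with an empty row
without_blank_row = "id,name\n1,alpha\n2,beta\n"   # header is the first row

# skiprows=1 drops the empty first line, so the header is read correctly.
df_ok = pd.read_csv(io.StringIO(with_blank_row), skiprows=1)

# skiprows=1 drops the header itself, so the first data row becomes the header.
df_shifted = pd.read_csv(io.StringIO(without_blank_row), skiprows=1)

print(list(df_ok.columns))
print(list(df_shifted.columns))
```

If the two environments see the file differently (for example, different pandas versions or a different download), comparing `df_proj_list.columns` in the notebook and in the recipe would show whether the header is already lost at read time.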
So I tried again: I used my local Jupyter notebook to read the file from the URL and export it to a CSV file, then used a Dataiku recipe to read the exported file.
The code in the Dataiku recipe:
# Read from CSV
import dataiku
import pandas

olap_folder = '/home/giangtm/Work/Projects/DataScience/olap/'
file_master_data = 'DIGI-PAY_Master_Data.csv'
df_proj_list = pandas.read_csv(olap_folder + file_master_data)
# Recipe outputs
digi_PAY_Master_Data_original = dataiku.Dataset("DIGI-PAY_Master_Data_original")
digi_PAY_Master_Data_original.write_with_schema(df_proj_list)
But the output data was missing the header again 🙂 (that is, the header was lost, and the first data row was used as the header).
Is it a bug?
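To narrow down where the header is lost, a small check can be run on the DataFrame just before it is written. This is a minimal sketch under assumptions: `header_matches` is a hypothetical helper, the expected column names and sample data are made up, and the Dataiku write call is only shown as a comment because the `dataiku` package is available only inside DSS.

```python
import io

import pandas as pd


def header_matches(df, expected_columns):
    """Return True when the DataFrame's column names equal the expected ones."""
    return list(df.columns) == list(expected_columns)


# Hypothetical expected header and sample data standing in for the real CSV.
expected = ["id", "name"]
df_good = pd.read_csv(io.StringIO("id,name\n1,alpha\n2,beta\n"))
df_bad = pd.read_csv(io.StringIO("id,name\n1,alpha\n2,beta\n"), skiprows=1)

print(df_good.columns.tolist(), header_matches(df_good, expected))
print(df_bad.columns.tolist(), header_matches(df_bad, expected))

# Inside the recipe, the same check would go right before the write, e.g.:
# if header_matches(df_proj_list, expected):
#     digi_PAY_Master_Data_original.write_with_schema(df_proj_list)
```

If the check passes inside the recipe but the explored dataset still shows a data row as the header, the DataFrame itself is fine and the output dataset's settings would be the next place to look.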