I have a recipe that scores customers every day. I want to export the scores as a CSV file to our HDFS every day.

However, at the moment, every time the recipe runs:

- The output is split into several smaller files (out-s0-c0, out-s0-c1, out-s0-c2, ...)

- The output files do not come out as CSV files: they are named out-s0-c0, out-s0-c1, out-s0-c2, ..., even though I set 'Type' to 'Separated values (CSV, TSV...)'.


1. How do I get my export into a single file instead of several files (out-s0-c0, out-s0-c1, out-s0-c2, ...)?
2. How do I get the output as CSV files rather than the out-s0-c... files?

1 Answer

The simplest way to obtain a CSV is to download the dataset, which is then consolidated into a single CSV file.

However, if I understand your question correctly, you want to automatically export an HDFS dataset as a single CSV file. Unfortunately, this is not yet natively supported: you would need to add a Shell recipe that consolidates those files afterwards (and adds the extension). The `.csv` extension will be added in the upcoming 2.3 release of DSS.
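A minimal sketch of such a consolidation step, written in Python rather than shell. It assumes the part files are reachable on a local filesystem (for example, after copying them down from HDFS) and follow the out-sN-cM naming from the question; `consolidate_parts` and its `has_header` option are hypothetical, not part of DSS. Each part is read into memory, so very large parts would call for a streaming variant. On a machine with a Hadoop client, `hdfs dfs -getmerge` also concatenates an HDFS directory into one local file, but it does not drop repeated header lines.

```python
import re
from pathlib import Path
from typing import Tuple

# Hypothetical pattern for the part files, e.g. out-s0-c0, out-s0-c12.
PART_PATTERN = re.compile(r"^out-s(\d+)-c(\d+)$")


def part_sort_key(path: Path) -> Tuple[int, int]:
    """Order parts numerically (c2 before c10) rather than lexicographically."""
    match = PART_PATTERN.match(path.name)
    return int(match.group(1)), int(match.group(2))


def consolidate_parts(src_dir: str, dest_file: str, has_header: bool = False) -> int:
    """Concatenate every out-sN-cM file in src_dir into a single CSV file.

    If has_header is True, the header line is kept from the first part only.
    Returns the number of part files merged.
    """
    # The part list is built before dest_file is opened, so writing the
    # output into src_dir does not pick it up as an input.
    parts = sorted(
        (p for p in Path(src_dir).iterdir()
         if p.is_file() and PART_PATTERN.match(p.name)),
        key=part_sort_key,
    )
    with open(dest_file, "wb") as out:
        for i, part in enumerate(parts):
            data = part.read_bytes()  # whole part in memory; fine for a sketch
            if has_header and i > 0:
                # Drop the header line repeated at the top of later parts.
                _, _, data = data.partition(b"\n")
            if data and not data.endswith(b"\n"):
                data += b"\n"  # keep this part's last row on its own line
            out.write(data)
    return len(parts)
```

Calling it with a destination such as `scores.csv` (a hypothetical name) also gives the result the missing extension.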
Also, note that the out-s42* files are already in CSV format. Compared with what you asked for:
- they are split into several files (due to parallel processing)
- they lack the ".csv" extension (upgrade to the soon-to-be-released v2.3 to fix this)

But they should be readable by your downstream application.
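If the files are left as they are, a downstream consumer can treat the parts as one logical CSV by reading them in order. A sketch in Python, assuming local access to the directory, UTF-8 text, the same out-sN-cM naming, and a `delimiter` that matches whatever separator is configured in the dataset's format settings; `iter_part_rows` is a hypothetical helper, not a DSS API.

```python
import csv
import re
from pathlib import Path
from typing import Iterator, List

# Hypothetical pattern for the part files, e.g. out-s0-c0, out-s0-c12.
PART_PATTERN = re.compile(r"^out-s(\d+)-c(\d+)$")


def iter_part_rows(src_dir: str, has_header: bool = False,
                   delimiter: str = ",") -> Iterator[List[str]]:
    """Yield CSV rows from every part file in src_dir, in numeric part order.

    If has_header is True, the first row of every part is skipped.
    """
    parts = sorted(
        (p for p in Path(src_dir).iterdir()
         if p.is_file() and PART_PATTERN.match(p.name)),
        # Numeric order: (0, 2) sorts before (0, 10).
        key=lambda p: tuple(int(g) for g in PART_PATTERN.match(p.name).groups()),
    )
    for part in parts:
        with open(part, newline="", encoding="utf-8") as f:
            reader = csv.reader(f, delimiter=delimiter)
            if has_header:
                next(reader, None)  # skip this part's header row
            yield from reader
```

Because it is a generator, rows are streamed one part at a time instead of being loaded all at once.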

©Dataiku 2012-2018 - Privacy Policy