I have a recipe that scores customers every day. I want to export the scores as a CSV file to our HDFS every day.

However, for now, each time the recipe runs:

- The output is split into several smaller files (out-s0-c0, out-s0-c1, out-s0-c2, ...).

- The output files are named out-s0-c0, out-s0-c1, out-s0-c2, ... with no .csv extension, even though I set 'Separated values (CSV, TSV, ...)' as the 'Type'.

 

Questions:
1. How do I get my export as one file instead of several files (out-s0-c0, out-s0-c1, out-s0-c2, ...)?
2. How do I get the output named as a .csv file rather than out-s0-c...?

1 Answer

The simplest way to obtain a CSV is to download the dataset; the download is consolidated into a single CSV file.

However, if I understand your question correctly, you want to automatically export an HDFS dataset as a single CSV file. Unfortunately, this is not yet natively supported. You would need to add a Shell recipe that consolidates the files afterwards and adds the extension. The `.csv` extension will be added to the output files in the upcoming 2.3 release of DSS.
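As a rough illustration of the consolidation step, here is a minimal sketch of what such a Shell recipe could contain. It is not an official DSS script: `merge_parts` is a hypothetical helper name, it assumes the part files are readable from a local directory, and it assumes the parts carry no header row (if they do, the headers would be repeated in the merged file). The HDFS paths in the trailing comments are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: concatenate all out-s* part files found in a local
# directory into a single file that carries the .csv extension.
merge_parts() {
    src_dir=$1
    dest_file=$2
    # Start from an empty destination file.
    : > "$dest_file"
    # Append each part in glob (lexical) order.
    for part in "$src_dir"/out-s*; do
        [ -f "$part" ] || continue
        cat "$part" >> "$dest_file"
    done
}

# When the parts live on HDFS, the Hadoop CLI can do the merge directly
# (placeholder paths, assuming the hdfs command is on the PATH):
#   hdfs dfs -getmerge /path/to/dataset scores.csv
#   hdfs dfs -put -f scores.csv /path/to/export/scores.csv
```

For example, `merge_parts /tmp/parts /tmp/scores.csv` would write every `out-s*` file from `/tmp/parts` into `/tmp/scores.csv`.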
Also, note that the out-s42* files are already in CSV format. Compared with what you asked for:
- they are split into several files (due to parallel processing)
- they lack the ".csv" extension (upgrading to the soon-to-be-released v2.3 will fix this)

They should nevertheless be readable by your downstream application.