
I want to do the above in the free version of DSS (v4.0.5). I created a Filesystem dataset and pointed it at the folder containing my CSV input files. All the CSV files have the same schema. However, when I create the dataset, it only appears to 'see' one of the CSV files, so when I run my flow, only that one file is processed. I want to process all the files in order (e.g., alphabetically by input file name), feeding the data from each file into my custom code one file's worth at a time.

Is there any way to do this without writing custom code that opens the folder and processes the files in a loop? (For example, as in https://answers.dataiku.com/1347/read-csvs-from-a-folder)

Or should I be using the 'Files in Folder' dataset instead (https://doc.dataiku.com/dss/latest/connecting/files-in-folder.html)? I previously discounted it because it seemed to be meant only for data that DSS cannot read. But it seems silly not to exploit DSS's built-in ability to read CSV files.

1 Answer

Best answer
If you use a folder, you will need to read the files one by one in a loop. If you have a lot of files, this is the right solution.
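A minimal sketch of the folder-loop approach in plain Python, assuming the folder's files are reachable on the local filesystem. In a DSS Python recipe, the path of a local managed folder should be obtainable with `dataiku.Folder("...").get_path()`; that call, the folder id, and the `process_file` callback below are assumptions standing in for your own setup and custom code. Files are visited in alphabetical order and each file's rows are handed over in one piece.

```python
import csv
import os
from typing import Callable, Dict, Iterator, List, Tuple

Rows = List[Dict[str, str]]


def iter_csv_files(folder_path: str) -> Iterator[Tuple[str, Rows]]:
    """Yield (file_name, rows) for each .csv file in folder_path, alphabetically."""
    # Plain string sort, so it is case-sensitive ("B.csv" sorts before "a.csv").
    names = sorted(n for n in os.listdir(folder_path) if n.lower().endswith(".csv"))
    for name in names:
        with open(os.path.join(folder_path, name), newline="") as f:
            # DictReader uses each file's header row as the keys of its rows.
            yield name, list(csv.DictReader(f))


def process_folder(folder_path: str, process_file: Callable[[str, Rows], None]) -> None:
    """Feed one file's worth of rows at a time to the (hypothetical) process_file."""
    for name, rows in iter_csv_files(folder_path):
        process_file(name, rows)
```

Because `iter_csv_files` is a generator, only one file's rows are held in memory at a time, which suits the "one file's worth at a time" requirement in the question.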

If you have only a few files, you can upload them one by one and use a Stack recipe to merge all the resulting datasets into a single one.
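To illustrate conceptually what stacking means (this is plain Python, not DSS code, and not how the Stack recipe is implemented), merging same-schema tables amounts to appending their rows one after another. This sketch assumes, as in the question, that every input has an identical header, and raises an error otherwise; `stack_csv` is a hypothetical helper name, and each input is passed as CSV text for simplicity.

```python
import csv
import io
from typing import Dict, Iterable, List, Optional


def stack_csv(texts: Iterable[str]) -> List[Dict[str, str]]:
    """Append the rows of several same-schema CSV texts into one list of rows."""
    stacked: List[Dict[str, str]] = []
    header: Optional[List[str]] = None
    for text in texts:
        reader = csv.DictReader(io.StringIO(text))
        if header is None:
            header = reader.fieldnames  # first input defines the schema
        elif reader.fieldnames != header:
            raise ValueError(f"schema mismatch: {reader.fieldnames} != {header}")
        stacked.extend(reader)  # remaining rows, in input order
    return stacked
```

Note that once stacked, nothing in the rows records which input each one came from.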
Thanks @cperdigou. I wasn't aware of the Stack recipe (https://doc.dataiku.com/dss/latest/other_recipes/stack.html). I tried it, but I think I'll go for the custom code, reading the files in via a loop, as that is more flexible and makes it easier to tell the original datasets apart. Thanks!
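One way to keep the original files distinguishable when reading them in a loop is to tag every row with the name of the file it came from. A minimal sketch in plain Python, assuming local file access; the `source_file` column name and the `read_folder_with_source` helper are hypothetical.

```python
import csv
import os
from typing import Dict, List


def read_folder_with_source(folder_path: str,
                            source_column: str = "source_file") -> List[Dict[str, str]]:
    """Read every .csv file in folder_path (alphabetically) and tag rows with their file name."""
    combined: List[Dict[str, str]] = []
    for name in sorted(os.listdir(folder_path)):
        if not name.lower().endswith(".csv"):
            continue  # skip non-CSV files in the folder
        with open(os.path.join(folder_path, name), newline="") as f:
            for row in csv.DictReader(f):
                row[source_column] = name  # record which file the row came from
                combined.append(row)
    return combined
```

Filtering the combined rows on `source_file` then recovers any single original file.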

© Dataiku 2012-2018