
I want to do the above in the free version of DSS (v4.0.5). I created a Filesystem dataset and pointed it at the folder containing my CSV input files, all of which share the same schema. However, the dataset only appears to 'see' one of the CSV files, so when I run my flow only that one file is processed. I want to process all the files in order (e.g., alphabetically by file name), feeding the data from each file into my custom code one file's worth at a time.

Is there any way to do this without writing my custom code to open the folder and process the files in a loop? (E.g., something like https://answers.dataiku.com/1347/read-csvs-from-a-folder.)

Or should I be using the 'Files in Folder' dataset instead (https://doc.dataiku.com/dss/latest/connecting/files-in-folder.html)? I previously discounted that because it seemed intended only for data that DSS cannot read natively, but it seems silly not to exploit DSS's built-in ability to read CSV files.

1 Answer

Best answer
If you use a folder, you will need to read the files one by one in a loop; if you have a lot of files, this is the right solution.
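If you do end up looping yourself (e.g., in a Python recipe), the pattern is short. Here is a minimal sketch in plain Python, assuming the files sit in a local directory; `read_csvs_in_order` and the folder argument are illustrative, not a DSS API:

```python
import csv
import glob
import os

def read_csvs_in_order(folder):
    """Yield (filename, rows) for every *.csv in `folder`, in
    alphabetical order, so each file can be processed one at a time.
    Files are assumed to share the same schema (a common header row)."""
    for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        with open(path, newline="") as f:
            rows = list(csv.DictReader(f))
        yield os.path.basename(path), rows
```

Each iteration hands your custom code one file's worth of rows, in a deterministic order.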

If you have only a few files, you can upload them one by one and use a Stack recipe to merge the resulting datasets into a single one.
Thanks @cperdigou. I wasn't aware of the Stack recipe (https://doc.dataiku.com/dss/latest/other_recipes/stack.html). I tried it, but I think I'll go for the custom code, reading the files in via a loop, as that is more flexible and makes it easier to tell the original datasets apart. Thanks!
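For later readers: when stacking in code rather than with the Stack recipe, keeping the original files distinguishable takes one extra column. A sketch using pandas; `stack_csvs` and the `source_file` column name are my own choices, not part of DSS:

```python
import pandas as pd

def stack_csvs(named_streams):
    """Concatenate same-schema CSVs, given as (name, file-like) pairs,
    into one DataFrame in alphabetical order of name, adding a
    'source_file' column so each row's originating file stays visible."""
    frames = []
    for name, stream in sorted(named_streams):
        df = pd.read_csv(stream)
        df["source_file"] = name  # tag rows with their source file
        frames.append(df)
    return pd.concat(frames, ignore_index=True)
```

Filtering the stacked result back down to a single file is then just a selection on `source_file`.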