Hi

I have a flow that I want to run multiple times (manually triggered), and each run requires a new source file. I'd like to load the files into a folder (file_A, file_X, file_c, etc.) and then kick off the flow so that it automatically runs through each file in turn: run_1 uses file_A and, once that is complete, run_2 starts with file_X, and so on.

What is the best way to achieve this? I was thinking of uploading all the files to a folder and then passing in a list of file names (in a csv, an editable dataset, or pasted into a script). For each file name, the flow would run and look for that file in the folder, maybe with each file name passed in turn as a variable. I just don't know how to achieve this, or whether there is a better way.

Any advice (with code tips if possible) would be greatly appreciated.

Thank you

1 Answer


Hi Darren, 

This looks like a good use case for the partitioning features of DSS. 

Partitioning allows you to run the same recipes in parallel, specifying which partitions you want to run.

In your case you would have a folder with the files: 

And from the settings of the folder you can define a partitioning pattern: 

Then, if you create a recipe whose output is partitioned by the same dimension, you can run the recipe on the partitions in parallel: 

You can check the documentation on this topic; it's pretty advanced! 

https://doc.dataiku.com/dss/latest/partitions/index.html
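To give a feel for the idea, here is a small plain-Python sketch of what "partitioning by file name" means: each file is assigned to a partition whose identifier is taken from its name. This is only an illustration and not DSS code; DSS has its own pattern syntax, described in the documentation above. The `file_<id>.csv` naming convention, the `file_id` dimension name, and the `group_by_partition` helper are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Assumed naming convention (hypothetical): file_<id>.csv, e.g. file_A.csv
PATTERN = re.compile(r"^file_(?P<file_id>[^.]+)\.csv$")


def group_by_partition(file_names):
    """Group file names by the identifier captured from their name.

    Files that do not match the assumed convention are ignored.
    """
    partitions = defaultdict(list)
    for name in file_names:
        match = PATTERN.match(name)
        if match:
            partitions[match.group("file_id")].append(name)
    return dict(partitions)


partitions = group_by_partition(
    ["file_A.csv", "file_X.csv", "file_c.csv", "notes.txt"]
)
```

Each key of `partitions` plays the role of one partition, so a recipe could then be run once per key.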

 

Matt

Hi Matt

Thanks for this detailed reply. I'll take a look at partitioning to see if it will help, though it does look advanced for me at this time. You mentioned "parallel" in this solution, but I believe I need to run these sequentially: the output from the first run with File_A is then used as one of the inputs for the next run, e.g. with File_B (I'm identifying changes between subsequent files). Apologies for not making this clear in the question. Do you have any additional ideas? Thanks
So for that I would just import file_A and file_B as 2 datasets, do stuff on the first dataset and maybe join it with the second one.
Sorry it's difficult to understand the use case from my side.
If you want to give me more details you can contact me at [email protected]
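For the sequential case described above, where each run compares a file with the result of the previous run, the logic might look roughly like the plain-Python sketch below. It is not DSS-specific and makes several assumptions: the files are CSVs sharing a key column (called `id` here), the contents are passed in as text, and the helper names (`read_rows`, `diff_rows`, `run_sequentially`) are made up for the example. In DSS the same logic could live in a Python recipe or be split across two datasets as suggested.

```python
import csv
import io


def read_rows(text, key):
    """Parse CSV text into a dict of rows, keyed by the given column."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(text))}


def diff_rows(previous, current):
    """List the keys added, removed, and changed between two snapshots."""
    added = sorted(k for k in current if k not in previous)
    removed = sorted(k for k in previous if k not in current)
    changed = sorted(
        k for k in current if k in previous and current[k] != previous[k]
    )
    return {"added": added, "removed": removed, "changed": changed}


def run_sequentially(files, key):
    """Process (name, csv_text) pairs in order.

    Each run is compared against the state left by the previous run,
    so the runs cannot be executed in parallel.
    """
    previous = {}
    reports = []
    for name, text in files:
        current = read_rows(text, key)
        reports.append((name, diff_rows(previous, current)))
        previous = current  # this run's result feeds the next run
    return reports


# Made-up sample data standing in for the uploaded files.
files = [
    ("file_A", "id,value\n1,a\n2,b\n"),
    ("file_B", "id,value\n1,a\n2,c\n3,d\n"),
]
reports = run_sequentially(files, key="id")
```

With the sample data, the report for file_B shows id 3 as added and id 2 as changed relative to file_A. The order of the `files` list decides the run order, so it can come from a csv or a hand-edited list of names.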