Running same set of recipes multiple times in parallel with different parameters

Solved!
pratikgujral-sf
Level 2
Hi Community,

In my dataset, I have a categorical column called customer_segment with 10 different possible values. 

I wish to train 10 different models, one for each customer_segment, using only the records filtered for that segment. Our data preparation recipe is just Python code. Because each customer_segment is independent of the others, we want to run data preparation, model training, and evaluation for each customer_segment in parallel, passing a different value of customer_segment to the recipes each time.

Furthermore, for ease of maintenance, we do not wish to create 10 copies of the same data preparation, training, and evaluation code.

Is it possible to do so with a Flow? 

I'm attaching a sample flow for illustrative purposes to help explain my question.

 

Sample Flow added for illustrative purposes.


Operating system used: Red Hat Enterprise Linux

1 Solution
AlexT
Dataiker

Hi @pratikgujral-sf ,
If I understand your requirements correctly, a partitioned model would essentially do what you are looking for.
You would partition the input dataset by customer_segment and train a partitioned model on it, so that one sub-model is trained per partition: https://doc.dataiku.com/dss/latest/machine-learning/partitioned.html

If you wish to bundle the flow and make it reusable, you can also look at application-as-recipe:
https://doc.dataiku.com/dss/8.0/applications/application-as-recipe.html
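
To make the idea concrete outside of DSS, here is a minimal plain-Python sketch of the same pattern: one parameterized code path, run once per customer_segment, in parallel. It does not use the Dataiku API. The names RECORDS, run_pipeline, and run_all_segments, the spend column, and the "predict the segment mean" model are all hypothetical stand-ins for your real data, preparation, training, and evaluation code.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

# Hypothetical toy records; in practice these would come from the input dataset.
RECORDS = [
    {"customer_segment": "A", "spend": 10.0},
    {"customer_segment": "A", "spend": 20.0},
    {"customer_segment": "B", "spend": 5.0},
    {"customer_segment": "B", "spend": 7.0},
]


def run_pipeline(segment, records):
    """Prepare, 'train', and evaluate for a single segment (stand-in logic)."""
    # Data preparation: keep only the rows of this segment.
    rows = [r for r in records if r["customer_segment"] == segment]
    # "Training": a trivial baseline that predicts the segment mean.
    model = mean(r["spend"] for r in rows)
    # Evaluation: mean absolute error of that baseline on the same rows.
    mae = mean(abs(r["spend"] - model) for r in rows)
    return {"segment": segment, "n_rows": len(rows), "model": model, "mae": mae}


def run_all_segments(records, max_workers=4):
    """Run the same pipeline once per distinct segment, concurrently."""
    segments = sorted({r["customer_segment"] for r in records})
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda s: run_pipeline(s, records), segments)
        return {r["segment"]: r for r in results}


results = run_all_segments(RECORDS)
```

Threads are used here only to keep the sketch short. With a partitioned dataset and a partitioned model, DSS handles the per-segment split for you, so the code is still written only once.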

Thanks
