Hi,

I have a data set with three binary target variables. For each prediction, I use a dedicated classifier.

However, in order to not overwrite the prediction columns after applying each model, it seems to me that I need to rename those columns (e.g. probability --> probability_TARGET1, prediction --> prediction_TARGET1) before I continue with the next model.

In the flow, this turns out to be rather complicated. Is there a way around this?

For example, is there a way to automatically rename the output columns of the model (I did not find a suitable option in the "Score" recipe), or an option to use multiple classifiers in one recipe?

Thanks in advance.

Best,

Benjamin

1 Answer

Hi bkmyt,

Thanks for your question. The answer really depends on whether (and to what extent) your predictions are chained, i.e. whether you can predict one attribute only after another attribute has been predicted. For simplicity, let's assume that you want to train two models on two attributes coming from a single dataset (the explanation scales to more).

The two scenarios are: 1) both attributes are predicted independently and the results are combined at the end of the flow, and 2) one attribute must be predicted first, and that prediction is then used to predict the second attribute.

1) Create two analyses and models on the same dataset, each using a different target variable, then score and combine the outputs.

 

2) Create the two models independently, but when scoring the second model, use the prediction of the first one as an input.
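For scenario 1, the "score and combine" step could also be done in a single Python step instead of one rename step per model. The following is only a rough sketch with pandas: the `id` key column, the sample values, and the `tag` helper are assumptions made for illustration, while the `prediction` / `probability` column names and the `_TARGET1` suffix style are taken from the question.

```python
import pandas as pd

# Hypothetical scored outputs of two independent models. Both use the same
# column names, which is why they would collide if applied to one dataset.
scored_1 = pd.DataFrame(
    {"id": [1, 2, 3], "prediction": [0, 1, 1], "probability": [0.2, 0.8, 0.9]}
)
scored_2 = pd.DataFrame(
    {"id": [1, 2, 3], "prediction": [1, 0, 1], "probability": [0.7, 0.1, 0.6]}
)


def tag(df, target, key="id"):
    """Index by the row key and suffix every remaining column with the target name.

    Assumes df holds only the key plus the prediction columns; feature
    columns would be suffixed as well.
    """
    return df.set_index(key).add_suffix(f"_{target}")


# Join the two tagged outputs on the shared key.
combined = tag(scored_1, "TARGET1").join(tag(scored_2, "TARGET2")).reset_index()
print(combined)
```

Because the two outputs are joined on a key rather than applied one after the other, neither model's columns can overwrite the other's.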

As you point out in your question, in case 2), where there is a dependency, the flow will appear busier. However, the problem itself is more complex in that case, so it is expected that some of this complexity is visible in the flow.
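To make the dependency in case 2) concrete, here is a minimal pandas sketch of chained scoring. The two `score_model_*` functions are hypothetical stand-ins for trained models (simple rules, not real classifiers), and the `x` / `y` feature columns are invented for the example. The point is only that the first prediction has to be renamed before the second model writes its own `prediction` column.

```python
import pandas as pd


def score_model_1(df):
    """Hypothetical stand-in for the first model: writes a 'prediction' column."""
    out = df.copy()
    out["prediction"] = (out["x"] > 0.5).astype(int)
    return out


def score_model_2(df):
    """Hypothetical stand-in for the second model.

    It reads the first model's renamed output as a feature and writes its
    own 'prediction' column.
    """
    out = df.copy()
    out["prediction"] = (
        (out["prediction_TARGET1"] == 1) & (out["y"] == 1)
    ).astype(int)
    return out


rows = pd.DataFrame({"x": [0.2, 0.7, 0.9], "y": [1, 0, 1]})

# Score with model 1, then rename so model 2 cannot overwrite the result.
step1 = score_model_1(rows).rename(columns={"prediction": "prediction_TARGET1"})

# Score with model 2, which depends on prediction_TARGET1, then rename again.
step2 = score_model_2(step1).rename(columns={"prediction": "prediction_TARGET2"})
print(step2)
```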

Regarding renaming the output inside the score recipe, this is a good suggestion. As you can see, I have a prepare recipe between m1 pred and m1 pred clean in order to rename the field. This step is arguably not needed and could be handled by the scoring recipe that generates m1 pred.
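If a code step is acceptable in place of the prepare recipe, the rename itself is small. The helper below is a sketch only: `rename_scoring_columns` is a hypothetical name, and the default column names are the ones mentioned in the question, so they may need adjusting to what the scoring recipe actually outputs.

```python
import pandas as pd


def rename_scoring_columns(df, target, columns=("prediction", "probability")):
    """Return a copy of df with the given scoring columns suffixed by the target.

    Columns that are not present in df are skipped silently.
    """
    mapping = {c: f"{c}_{target}" for c in columns if c in df.columns}
    return df.rename(columns=mapping)


# Hypothetical scored output of the first model.
m1_pred = pd.DataFrame(
    {"id": [1, 2], "prediction": [0, 1], "probability": [0.3, 0.9]}
)

m1_pred_clean = rename_scoring_columns(m1_pred, "TARGET1")
print(list(m1_pred_clean.columns))
```

The same helper can be reused for each model by changing only the `target` argument.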

I hope this helps!

©Dataiku 2012-2018 - Privacy Policy