Comparing models with test data scores and selecting features

nsrishan · 08-01-2022

Hello,

I am currently running a flow to build a binary classification model. After training on my training data, I want to compare the results of the top three models on my test dataset (accuracy, precision, recall, etc.). I know how to do this on the train dataset, but I am unsure how to do it on the test data and compare the results via model comparisons.

Also, after running my flow on the entire set of features, is there a way to only select the top 5 features to run a new model on?

Thank you!

AlexT · 08-15-2022

Hi,

To evaluate on the test dataset, you would need to perform the split using the Split recipe and then use explicit extracts for your train/test sets.

You can set this under Visual Analysis > select the model > Design > Train/Test Set by choosing "Explicit extracts from two datasets".
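If you want to sanity-check the numbers outside the visual interface, the same metrics can be computed by hand once you have the true labels of the test set and each model's predictions. Below is a minimal sketch in plain Python. It does not use any DSS API; the labels, predictions, and the names `model_a` / `model_b` are made up for illustration.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision and recall for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Guard against division by zero when a model predicts no positives
        # or the test set contains no positives.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }

# Hypothetical test-set labels and per-model predictions (illustrative only).
y_test = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = {
    "model_a": [1, 0, 1, 0, 0, 0, 1, 1],
    "model_b": [1, 1, 1, 1, 0, 1, 1, 0],
}

# Score every model on the same held-out test set so the numbers are comparable.
scores = {name: binary_metrics(y_test, preds) for name, preds in predictions.items()}

for name, m in scores.items():
    print(f"{name}: accuracy={m['accuracy']:.2f} "
          f"precision={m['precision']:.2f} recall={m['recall']:.2f}")
```

The key point is that every model is scored against the same test labels, which is what makes the comparison fair.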

To reduce the number of features you can have a look at: https://doc.dataiku.com/dss/latest/machine-learning/supervised/settings.html#settings-feature-reduct...
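The feature reduction settings on that page are the built-in route. If you would rather pick the top 5 yourself, for example from feature importances you have exported from a trained model, it comes down to a sort. This is only a rough sketch in plain Python; the feature names and importance values are invented, and `top_k_features` is a hypothetical helper, not a DSS function.

```python
def top_k_features(importances, k=5):
    """Return the names of the k features with the highest importance."""
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:k]]

# Hypothetical importances exported from a trained model (illustrative only).
importances = {
    "age": 0.21,
    "income": 0.18,
    "tenure": 0.15,
    "region": 0.04,
    "num_orders": 0.12,
    "last_login_days": 0.19,
    "plan_type": 0.08,
    "has_coupon": 0.03,
}

selected = top_k_features(importances, k=5)
print(selected)
```

You could then keep only the columns in `selected` (plus the target) when training the new model.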

Let me know if that helps.

sir · 03-06-2023

Hi,

But where do we see the results for the test dataset after splitting? How do we get the recall, precision, accuracy, etc.?
