
0 votes
I would like to score my own prediction method against the same test set that DSS generated to test a model. Because the test set is produced by randomized sampling, this is quite tricky.

Is there a simple way to do that? Maybe by extracting the test set to a dataset on which I could do some analysis?

Thanks!

1 Answer

+2 votes
Best answer

There are two solutions:

* Recommended: Split the dataset yourself, then configure the analysis model to use your predefined train and test sets instead of letting it do a random split. At the moment, doing a random split with the split recipe is a bit tricky: you first have to create a random column with a Python processor in a preparation recipe, then split on that column.

* Hackish / not officially supported: When using memory-based models in DSS, the train and test sets are dumped as CSV files in the DSS datadir, under analysis-data > project > analysis_id > model_id > splits
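
To illustrate the recommended option, here is a minimal sketch of the "random column, then split" idea in plain pandas. It is not DSS API code: the helper `add_split_column`, the column names `random_value` and `split`, the 20% test ratio, and the toy data are all assumptions made for the example. Inside DSS the same logic would live in a Python step, and the resulting train and test datasets would then be given to the model as predefined sets.

```python
import numpy as np
import pandas as pd


def add_split_column(df, test_ratio=0.2, seed=42):
    """Return a copy of df with a reproducible 'split' column ('train' or 'test').

    Hypothetical helper for illustration; not part of DSS.
    """
    rng = np.random.default_rng(seed)
    out = df.copy()
    # One uniform draw in [0, 1) per row; the fixed seed makes the split repeatable.
    out["random_value"] = rng.random(len(out))
    # Rows whose draw falls below test_ratio go to the test set.
    out["split"] = np.where(out["random_value"] < test_ratio, "test", "train")
    return out


# Toy data standing in for the real dataset (assumption).
df = pd.DataFrame({"feature": range(1000), "target": [i % 2 for i in range(1000)]})

flagged = add_split_column(df)
helper_cols = ["random_value", "split"]
train = flagged[flagged["split"] == "train"].drop(columns=helper_cols)
test = flagged[flagged["split"] == "test"].drop(columns=helper_cols)
```

Because the seed is fixed, re-running the split yields the same test rows, so both the DSS model and your own prediction method can be scored against an identical test set.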
