


I am training a classification model on a dataset that has 3612 rows. However, when I looked at the stats after training the model, the train set had 3752 rows and the test set had 896 rows, which together is far more than the number of rows in the original dataset. What could have caused this? Can you please help me find the reason for this problem?


I used the default settings for the model in DSS.

Thank you


Go to the "Status" tab of the dataset, make sure that the "Count of records" metric is displayed, and click "Compute". How many records does it see?

1 Answer

0 votes

When I followed your steps, I got a column count of 171 and a record count of 3612.

Could you:
  * Retrain your model. In the pre-train modal, make sure to check the "recompute splits" checkbox
  * If the problem persists, generate a diagnostic report (Administration > Maintenance > Diagnostic tool)
  * Send it to [email protected] (If the file is above 15 MB, you can use WeTransfer or a similar service)

I could not find the "recompute splits" checkbox. Can you please point me to where that checkbox is?
Sorry, it's called "Drop existing sets, recompute new ones". It is in the "Training models" modal that appears when you click on "Train".
I followed your steps and checked "Drop existing sets, recompute new ones", but it still gives the same result: the train set has 3752 rows and the test set has 896 rows.
How do we generate diagnostic reports for the model we trained in DSS labs? I could not find any option on the page that generates diagnostic reports for our model. Can you help me with this?
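
For anyone following the thread, the size of the mismatch can be checked with simple arithmetic on the numbers reported above. This is only a sanity-check sketch in plain Python, not a DSS API call; the variable names are made up for illustration. It shows how many extra rows the two sets contain and what fraction of them landed in the train set. It does not explain where the extra rows come from.

```python
# Row counts as reported in this thread.
dataset_rows = 3612   # "Count of records" metric on the dataset
train_rows = 3752     # train set size shown after training
test_rows = 896       # test set size shown after training

# Total rows seen by the model, and the excess over the source dataset.
split_total = train_rows + test_rows
extra_rows = split_total - dataset_rows

# Share of the combined rows that went to the train set.
train_fraction = train_rows / split_total

print(f"train + test = {split_total}")
print(f"extra rows vs. dataset = {extra_rows}")
print(f"train fraction = {train_fraction:.3f}")
```

The two sets add up to 4648 rows, which is 1036 more than the dataset, and the train share is roughly 80%. So the split ratio itself looks ordinary; the open question is why the model sees more rows than the dataset metric reports.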
