
0 votes
I ran a clustering with the K-Means model and would like to understand how the variable importance percentages shown in the histogram are calculated. What do they measure?


1 Answer

+1 vote
Best answer
We fit a simple supervised random forest model that predicts the cluster labels produced by K-Means from the input features. This lets us derive variable importances using the standard random forest method (as implemented in scikit-learn).
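To make the idea concrete, here is a minimal sketch of fitting a random forest on K-Means labels with scikit-learn. The synthetic data, the number of clusters, and the forest settings are assumptions chosen for illustration; they are not necessarily what the product uses internally.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier

# Synthetic data standing in for the real dataset (assumption).
X, _ = make_blobs(n_samples=300, centers=3, n_features=4, random_state=0)

# Step 1: cluster the data; the cluster labels become the "target".
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Step 2: fit a supervised random forest that predicts the cluster label
# from the original features (hyperparameters here are illustrative).
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, labels)

# Step 3: read the importances, one value per input feature.
importances = forest.feature_importances_
for i, value in enumerate(importances):
    print(f"feature_{i}: {value:.3f}")
```

A feature that the forest relies on heavily to tell the clusters apart receives a high importance; a feature that barely helps receives a value close to zero.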
I can see one feature at 10% and another at 5%. What do these percentages mean in the variable importances?
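We use scikit-learn's definition of variable importance for random forests, expressed as a percentage. In scikit-learn these importances are impurity-based and normalized to sum to 1 across all features, so 10% means that splits on that feature account for about 10% of the forest's total impurity reduction when separating the clusters.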
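As a rough illustration of how to read the percentages, the sketch below builds a toy dataset where one feature separates the clusters and the other is pure noise, then converts scikit-learn's `feature_importances_` to percentages. The data and the helper name `importance_percentages` are hypothetical and only serve the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier


def importance_percentages(X, labels, random_state=0):
    """Hypothetical helper: fit a forest on cluster labels and
    return the feature importances scaled to percentages."""
    forest = RandomForestClassifier(n_estimators=100, random_state=random_state)
    forest.fit(X, labels)
    # feature_importances_ sums to 1, so multiplying by 100 gives percentages.
    return 100.0 * forest.feature_importances_


rng = np.random.default_rng(0)
# Feature 0: two well-separated groups; feature 1: noise (toy data, assumption).
informative = np.concatenate([rng.normal(0, 1, 100), rng.normal(10, 1, 100)])
noise = rng.normal(0, 1, 200)
X_toy = np.column_stack([informative, noise])

labels_toy = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_toy)
pct = importance_percentages(X_toy, labels_toy)

for name, value in zip(["informative", "noise"], pct):
    print(f"{name}: {value:.1f}%")
```

The percentages are relative shares: they add up to 100% over all features, so a value is only meaningful in comparison with the other features in the same model.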

©Dataiku 2012-2018 - Privacy Policy