+1 vote

Hi,

I'd like to write a formula that compares multiple columns and returns the most frequent word. I am aiming to aggregate the predictions of multiple machine learning models to see if it improves accuracy. For example, in the image below, the 1st row would return "Handling", as this is the most common value across the SVC, RandForest and LogisticRegression columns. The 2nd row would return "Handling - Operations". Ignore TicketRootCause, as this is the real answer.

I have done this in Excel with the formula below, but I can't find the equivalent functions in DSS. Any ideas on how I could do this, either by converting the Excel formula below into DSS or by another method?
 

=INDEX(F2:N2,MODE(MATCH(F2:N2,F2:N2,0)))


Thanks,
Ollie

1 Answer

0 votes

Hi Ollie,

You probably have to add a "Python function" step with "Add a new cell for each row".

Then you can find the most common word with Python code. Several options are discussed at https://stackoverflow.com/questions/48606406/find-most-frequent-value-in-python-dictionary-value-with-maximum-count (for me, the best answer is the one using collections.Counter).
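
Here is a minimal sketch of the collections.Counter approach. It assumes the step calls a function named process(row) with the row passed as a dict-like object, and that the model columns are named SVC, RandForest and LogisticRegression as in the question. The helper name most_common_prediction and the MODEL_COLUMNS list are made up for illustration, so adjust them to your dataset.

```python
from collections import Counter

# Assumed column names, taken from the question; TicketRootCause is left out
# on purpose because it holds the real answer, not a model prediction.
MODEL_COLUMNS = ["SVC", "RandForest", "LogisticRegression"]


def most_common_prediction(row, columns=MODEL_COLUMNS):
    """Return the most frequent non-empty value among the given columns."""
    values = [row.get(col) for col in columns]
    # Skip missing or empty cells so they cannot win the vote.
    values = [v for v in values if v not in (None, "")]
    if not values:
        return None
    # most_common(1) returns a list with one (value, count) pair.
    return Counter(values).most_common(1)[0][0]


def process(row):
    # Assumed entry point: called once per row, returns the new cell value.
    return most_common_prediction(row)
```

For a row where SVC and RandForest both say "Handling" and LogisticRegression says something else, this returns "Handling". When every model disagrees there is no majority; on Python 3.7+ the value from the first listed column is returned, so it may be worth putting your most trusted model first.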

 
