Aww... what a wowzer! I can't wait to try to figure this one out!
I've taken a first go at breaking this down. The first job was to parse the date column into an actual date; then I broke out the individual date elements (day, month and year).
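For anyone following along outside a workflow tool, here is a minimal Python sketch of the parse-and-split step. It assumes the dates are ISO strings like "2019-07-03" (the real format string may differ), and the sample rows and the `split_date` helper are hypothetical.

```python
from datetime import datetime

def split_date(raw, fmt="%Y-%m-%d"):
    """Parse a date string and return its (day, month, year) parts.

    The format string is an assumption; change it to match the dataset.
    """
    parsed = datetime.strptime(raw, fmt)
    return parsed.day, parsed.month, parsed.year

# Hypothetical rows: (date string, max temperature in °C)
rows = [("2019-01-15", 8.2), ("2019-07-03", 24.9)]

# Each row becomes (day, month, year, max_temp), ready for modelling
features = [(*split_date(date), max_temp) for date, max_temp in rows]
print(features)  # [(15, 1, 2019, 8.2), (3, 7, 2019, 24.9)]
```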
I selected the month column as my target. Interestingly, the automated setup initially suggested this should be a regression problem, which I then changed to multi-class classification.
I trained this first dataset using the Decision Tree, Logistic Regression and Random Forest algorithms, with the latter winning on ROC score.
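Because month is a multi-class target, a single "ROC score" has to combine one curve per class. One common approach, assumed here (the tool's exact calculation may differ), is a macro-averaged one-vs-rest AUC: score each month against all the others, then average. This is a minimal stdlib sketch with hypothetical function names.

```python
def binary_auc(scores, labels):
    """AUC for one class: the probability that a random positive example
    outscores a random negative one (ties count as half a win)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def macro_ovr_auc(probabilities, y_true, classes):
    """Average the one-vs-rest AUC over every class.

    probabilities[i][k] is the predicted probability that row i is classes[k].
    """
    aucs = []
    for k, cls in enumerate(classes):
        scores = [row[k] for row in probabilities]
        labels = [y == cls for y in y_true]
        aucs.append(binary_auc(scores, labels))
    return sum(aucs) / len(aucs)

# Toy predictions for months 1-3 (hypothetical, not from the real model)
probs = [[0.8, 0.1, 0.1], [0.1, 0.7, 0.2], [0.2, 0.2, 0.6], [0.6, 0.3, 0.1]]
print(macro_ovr_auc(probs, [1, 2, 3, 1], [1, 2, 3]))  # 1.0
```

Comparing models then comes down to computing this figure on the same held-out rows for each one and keeping the highest.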
Now for my favourite part of any modelling process, feature importance!
This suggests that maximum temperature is the best indicator of month, which makes sense, right? Interestingly, though, rainfall is not actually a great indicator of month.
Here's a nice visualisation of the Random Forest split, starting with mean temperature.
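To see why a tree might open with a temperature split, here is a minimal sketch of how a single split can be chosen using Gini impurity, a common criterion assumed here rather than confirmed for this tool. It tries a threshold between each pair of adjacent values and keeps the one with the lowest weighted impurity. The helper names and toy data are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the split on `values`
    that leaves the purest left/right groups of `labels`."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))  # no split at all is the baseline
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # cannot separate identical values
        threshold = (lo + hi) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        if weighted < best[1]:
            best = (threshold, weighted)
    return best

# Toy mean temperatures (°C) and months
print(best_threshold([3.0, 4.5, 15.0, 16.5], ["Jan", "Jan", "Jul", "Jul"]))
# (9.75, 0.0)
```

A feature that produces large impurity drops like this across many trees is also what tends to push it up an impurity-based feature importance ranking.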
Would anyone like to suggest some ways to refine this very simple start?
Ben