We confirm that all performance metrics shown in DSS are computed on the test set; we currently never show performance on the train set. In the case of K-fold cross-validation, the reported metric is the mean of the out-of-fold scores, so it is also measured on held-out data.
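As a minimal sketch of what "mean of out-of-fold" means (using plain scikit-learn, not DSS internals, and a synthetic dataset for illustration): each fold's score is computed on data the model never saw during that fit, and the reported number is the average of those held-out scores.

```python
# Illustrative sketch with scikit-learn (assumed workflow, not DSS code).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic data stands in for the user's dataset.
X, y = make_classification(n_samples=500, random_state=0)
model = RandomForestClassifier(n_estimators=50, random_state=0)

# Each of the 5 scores is computed on the held-out fold only.
scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
print(scores.mean())  # the "mean of out-of-fold" metric
```

Every score here comes from a fold the model was not trained on, which is why the K-fold average is still a test-set measure.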
So you do see a downward trend when going from a "reasonable" to a "very deep" random forest (from 0.95 to 0.892), which is indeed probably indicative of overfitting, although it's not as severe as you expected. Possible reasons: (a) your train and test sets are very similar; (b) the random selection of features at each split adds enough diversity to counteract part of the overfitting; (c) you may not have that much data, in which case the trees are not "full" even when the depth limit is very high.
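A quick way to see this effect yourself (a sketch using plain scikit-learn on synthetic data; the depths and dataset are illustrative assumptions, not your actual setup) is to compare train and test accuracy for a depth-limited forest versus an unrestricted one. The train/test gap is the usual overfitting signal:

```python
# Sketch: overfitting shows up as a train/test score gap that widens
# as trees get deeper. Synthetic data; values will differ on real data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_informative=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

results = {}
for depth in (4, None):  # "reasonable" depth vs unrestricted ("very deep")
    rf = RandomForestClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    results[depth] = (rf.score(X_tr, y_tr), rf.score(X_te, y_te))
    print(f"max_depth={depth}: train={results[depth][0]:.3f}, "
          f"test={results[depth][1]:.3f}")
```

Typically the unrestricted forest fits the training set almost perfectly while its test score improves much less (or degrades), and with a small dataset the trees run out of samples to split on before reaching extreme depths, which is why a very high depth limit may not hurt as much as expected.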