Hi,

I am new to DSS and am attempting to advance my knowledge through the tutorials. I got to the lecture 'create a scikit-learn (python) model', but when copying and pasting in the code that is to be output to the folder 'model_scikit', I'm receiving this error:

'Job failed : Error in Python process: <type 'exceptions.ValueError'>: min_samples_split must be at least 2 or in (0, 1], got 1'

When I change the value of min_samples_split (e.g. to [1.0, 3, 10], or even just [2]), I get a different error:

'Job failed : Error in Python process: <type 'exceptions.Exception'>: Dataset None cannot be used : declare it as input or output of your recipe'

Any ideas?

Thanks,

Clíona
asked by CDP

1 Answer

Hi,

You have the right idea about min_samples_split. As the error message says, it accepts either an integer of at least 2 or a float in (0.0, 1.0], so it rejects the integer 1. The line of code in Teachable should read:

    "min_samples_split": [1.0, 3, 10],

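If it helps to see the rule from the error message spelled out, here is a minimal sketch in plain Python. The helper name `is_valid_min_samples_split` is hypothetical (it is not part of scikit-learn or DSS); it only mirrors the "at least 2 or in (0, 1]" wording. As far as I know, scikit-learn treats a float here as a fraction of the training samples, so 1.0 and 2 pass the check but do not mean the same thing.

```python
import numbers


def is_valid_min_samples_split(value):
    """Hypothetical helper mirroring the error message:
    an integer must be >= 2, a float must lie in (0, 1]."""
    if isinstance(value, numbers.Integral):
        return value >= 2
    if isinstance(value, numbers.Real):
        return 0.0 < value <= 1.0
    return False


# The integer 1 from the tutorial fails; the float 1.0 and integers >= 2 pass.
candidates = [1, 1.0, 2, 3, 10]
results = {repr(v): is_valid_min_samples_split(v) for v in candidates}
```

Running this over a grid before launching the recipe should flag any value that would trigger the first error.
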
The error that you're getting after setting min_samples_split correctly suggests that your recipe input is setting "None" as the input dataset; that is, it looks something like this:

    # Recipe inputs
    df = dataiku.Dataset("None").get_dataframe()

rather than with "train" as the input dataset.
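
One way to catch this earlier is to check the dataset name before handing it to `dataiku.Dataset`. The sketch below is only an illustration: `require_dataset_name` and `load_input` are hypothetical helpers, and the `dataiku` import is placed inside the function because that package is only available inside DSS.

```python
def require_dataset_name(name):
    """Hypothetical guard: reject a missing or placeholder dataset name."""
    if name is None or str(name).strip() in ("", "None"):
        raise ValueError(
            "No input dataset set: declare it as an input of the recipe "
            "and use its name here, e.g. 'train'."
        )
    return name


def load_input(name):
    """Load a recipe input as a dataframe (only works inside DSS)."""
    import dataiku  # assumption: running inside a DSS Python recipe

    return dataiku.Dataset(require_dataset_name(name)).get_dataframe()
```

Inside the recipe you would then call `df = load_input("train")`, assuming "train" is declared as an input of the recipe.
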