Hey,

I recently upgraded Dataiku DSS to 4.2.0, and scoring my ML models now fails with the following error:

[07:52:46] [INFO] [dku.utils]  - 2018-10-30 07:52:46,007 INFO Will do preparation, output schema: {u'userModified': False, u'columns': [{u'timestampNoTzAsDate': False, u'type': u'string', u'name': u'Domain', u'maxLength': -1}, {u'timestampNoTzAsDate': False, u'type': u'string', u'name': u'DescriptionProcessed', u'maxLength': -1}, {u'timestampNoTzAsDate': False, u'type': u'double', u'name': u'AgTech & New Food', u'maxLength': -1}]}
[07:52:46] [INFO] [dku.utils]  - Traceback (most recent call last):
[07:52:46] [INFO] [dku.utils]  -   File "/usr/lib64/python2.7/runpy.py", line 174, in _run_module_as_main
[07:52:46] [INFO] [dku.utils]  -     "__main__", fname, loader, pkg_name)
[07:52:46] [INFO] [dku.utils]  -   File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
[07:52:46] [INFO] [dku.utils]  -     exec code in run_globals
[07:52:46] [INFO] [dku.utils]  -   File "/home/dataiku/dataiku-dss-4.2.0/python/dataiku/doctor/prediction/reg_evaluation_recipe.py", line 302, in <module>
[07:52:46] [INFO] [dku.utils]  -     dkujson.load_from_filepath(sys.argv[8]))
[07:52:46] [INFO] [dku.utils]  -   File "/home/dataiku/dataiku-dss-4.2.0/python/dataiku/doctor/prediction/reg_evaluation_recipe.py", line 45, in main
[07:52:46] [INFO] [dku.utils]  -     pipeline = preprocessing_handler.build_preprocessing_pipeline(with_target=True)
[07:52:46] [INFO] [dku.utils]  -   File "/home/dataiku/dataiku-dss-4.2.0/python/dataiku/doctor/preprocessing_handler.py", line 170, in build_preprocessing_pipeline
[07:52:46] [INFO] [dku.utils]  -     pipeline = PreprocessingPipeline(steps=list(self.preprocessing_steps(*args, **kwargs)))
[07:52:46] [INFO] [dku.utils]  -   File "/home/dataiku/dataiku-dss-4.2.0/python/dataiku/doctor/preprocessing_handler.py", line 728, in preprocessing_steps
[07:52:46] [INFO] [dku.utils]  -     if with_target and self.sample_weight_variable is not None:
[07:52:46] [INFO] [dku.utils]  -   File "/home/dataiku/dataiku-dss-4.2.0/python/dataiku/doctor/preprocessing_handler.py", line 711, in sample_weight_variable
[07:52:46] [INFO] [dku.utils]  -     return self.core_params["weight"].get("sampleWeightVariable", None)
[07:52:46] [INFO] [dku.utils]  - KeyError: 'weight'
[07:52:46] [INFO] [dku.flow.activity] - Run thread failed for activity evaluate_on_CB_descriptions_AgTechFood_NP
com.dataiku.dip.exceptions.ProcessDiedException: The Python process failed (exit code: 1). More info might be available in the logs.
	at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.throwSubprocessError(AbstractCodeBasedActivityRunner.java:373)
	at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.execute(AbstractCodeBasedActivityRunner.java:363)
	at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.execute(AbstractCodeBasedActivityRunner.java:276)
	at com.dataiku.dip.dataflow.exec.AbstractPythonRecipeRunner.executeModule(AbstractPythonRecipeRunner.java:40)
	at com.dataiku.dip.analysis.ml.prediction.flow.EvaluationRecipeRunner.run(EvaluationRecipeRunner.java:174)
	at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:352)
[07:52:46] [INFO] [dku.flow.activity] running evaluate_on_CB_descriptions_AgTechFood_NP - activity is finished
[07:52:46] [ERROR] [dku.flow.activity] running evaluate_on_CB_descriptions_AgTechFood_NP - Activity failed
com.dataiku.dip.exceptions.ProcessDiedException: The Python process failed (exit code: 1). More info might be available in the logs.
	at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.throwSubprocessError(AbstractCodeBasedActivityRunner.java:373)
	at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.execute(AbstractCodeBasedActivityRunner.java:363)
	at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.execute(AbstractCodeBasedActivityRunner.java:276)
	at com.dataiku.dip.dataflow.exec.AbstractPythonRecipeRunner.executeModule(AbstractPythonRecipeRunner.java:40)
	at com.dataiku.dip.analysis.ml.prediction.flow.EvaluationRecipeRunner.run(EvaluationRecipeRunner.java:174)
	at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:352)
[07:52:46] [INFO] [dku.flow.activity] running evaluate_on_CB_descriptions_AgTechFood_NP - Executing default post-activity lifecycle hook
[07:52:46] [INFO] [dku.flow.activity] running evaluate_on_CB_descriptions_AgTechFood_NP - Removing samples for CLUSTERREPORTCLUSTERINGNEW.CB_desc_AgTechFood_m
[07:52:46] [INFO] [dku.flow.activity] running evaluate_on_CB_descriptions_AgTechFood_NP - Removing samples for CLUSTERREPORTCLUSTERINGNEW.CB_desc_AgTechFood_s
[07:52:46] [INFO] [dku.flow.activity] running evaluate_on_CB_descriptions_AgTechFood_NP - Done post-activity tasks

Does anyone know what the exact issue is?

Cheers,

Matthew

1 Answer

Hi Matt,

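Dataiku 4.2 is a major release, so models trained with earlier versions of DSS should be retrained after upgrading to 4.2. The KeyError: 'weight' in your traceback is consistent with this: the parameters saved with your model have no "weight" entry, which the 4.2 code expects to find.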

You can find more details in the release notes: https://doc.dataiku.com/dss/latest/release_notes/4.2.html#limitations-and-warnings

Retraining your deployed model in the flow should fix your issue :)
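
To illustrate why the traceback ends in KeyError: 'weight', here is a minimal sketch in plain Python. It is not DSS code: the two dictionaries are hypothetical stand-ins for the parameters saved with a model, and their contents (including the "prediction_type" key and the shape of the "weight" entry) are assumptions based only on the line shown in the traceback. It is meant to explain the error, not to patch it; retraining is still the fix.

```python
# Hypothetical stand-ins for saved model parameters (assumed shapes, not real DSS files).
old_core_params = {"prediction_type": "REGRESSION"}  # saved before the upgrade: no "weight" key
new_core_params = {
    "prediction_type": "REGRESSION",
    "weight": {"sampleWeightVariable": None},  # assumed shape after retraining on 4.2
}


def sample_weight_variable(core_params):
    # Same access pattern as the failing line in the traceback:
    # core_params["weight"] raises KeyError when the key is absent.
    return core_params["weight"].get("sampleWeightVariable", None)


def describe(core_params):
    # Report either the looked-up value or the KeyError that was raised.
    try:
        return "ok: %r" % (sample_weight_variable(core_params),)
    except KeyError as err:
        return "KeyError: %s" % err


print(describe(old_core_params))  # parameters without "weight" reproduce the error
print(describe(new_core_params))  # parameters with "weight" are read without error
```

The first call fails the same way as your job does, and the second succeeds, which is why retraining (and so re-saving the model parameters with 4.2) should make the error go away.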

Hope it helps,

Alex