Hi,
from time to time, scoring with a prediction model fails.
The only thing that seems to resolve it is to train a new model with a completely different algorithm and deploy that instead.
Here is my log:
[09:23:38] [INFO] [dku.utils] - 2017-09-19 09:23:38,188 INFO Reading with dtypes: {u'Domain': None, u'DescriptionProcessed': 'str'}
[09:23:38] [INFO] [dku.utils] - 2017-09-19 09:23:38,188 INFO Column 0 = Domain (dtype=None)
[09:23:38] [INFO] [dku.utils] - 2017-09-19 09:23:38,188 INFO Column 1 = Source (dtype=None)
[09:23:38] [INFO] [dku.utils] - 2017-09-19 09:23:38,188 INFO Column 2 = DescriptionProcessed (dtype=str)
[09:23:38] [INFO] [dku.utils] - 2017-09-19 09:23:38,283 INFO Starting dataframes iterator
[09:23:41] [INFO] [dku.utils] - 2017-09-19 09:23:41,688 INFO Got a dataframe : (100000, 3)
[09:23:41] [INFO] [dku.utils] - 2017-09-19 09:23:41,688 INFO Coercion done
[09:23:41] [INFO] [dku.utils] - 2017-09-19 09:23:41,688 INFO NORMALIZED: Domain -> object
[09:23:41] [INFO] [dku.utils] - 2017-09-19 09:23:41,688 INFO NORMALIZED: Source -> object
[09:23:41] [INFO] [dku.utils] - 2017-09-19 09:23:41,688 INFO NORMALIZED: DescriptionProcessed -> object
[09:23:41] [INFO] [dku.utils] - 2017-09-19 09:23:41,689 INFO Processing it
[09:23:41] [INFO] [dku.utils] - 2017-09-19 09:23:41,697 INFO Set MF index len 100000
[09:23:41] [INFO] [dku.utils] - 2017-09-19 09:23:41,697 DEBUG PROCESS WITH Step:MultipleImputeMissingFromInput
[09:23:41] [INFO] [dku.utils] - 2017-09-19 09:23:41,697 DEBUG MIMIFI: Imputing with map {}
[09:23:41] [INFO] [dku.utils] - 2017-09-19 09:23:41,698 DEBUG PROCESS WITH Step:FlushDFBuilder(num_flagonly)
[09:23:41] [INFO] [dku.utils] - 2017-09-19 09:23:41,698 DEBUG PROCESS WITH Step:MultipleImputeMissingFromInput
[09:23:41] [INFO] [dku.utils] - 2017-09-19 09:23:41,698 DEBUG MIMIFI: Imputing with map {}
[09:23:41] [INFO] [dku.utils] - 2017-09-19 09:23:41,698 DEBUG PROCESS WITH Step:FlushDFBuilder(cat_flagpresence)
[09:23:41] [INFO] [dku.utils] - 2017-09-19 09:23:41,698 DEBUG PROCESS WITH <class 'dataiku.doctor.preprocessing.dataframe_preprocessing.TextTFIDFVectorizerProcessor'> (DescriptionProcessed)
[09:24:02] [INFO] [dku.utils] - 2017-09-19 09:24:02,616 DEBUG PROCESS WITH Step:FlushDFBuilder(interaction)
[09:24:02] [INFO] [dku.utils] - 2017-09-19 09:24:02,616 DEBUG PROCESS WITH Step:DumpPipelineState
[09:24:02] [INFO] [dku.utils] - 2017-09-19 09:24:02,616 INFO ********* Pipieline state (Before feature selection)
[09:24:02] [INFO] [dku.utils] - 2017-09-19 09:24:02,616 INFO input_df= (100000, 3)
[09:24:02] [INFO] [dku.utils] - 2017-09-19 09:24:02,616 INFO current_mf=(100000, 26558)
[09:24:02] [INFO] [dku.utils] - 2017-09-19 09:24:02,616 INFO PPR:
[09:24:02] [INFO] [dku.utils] - 2017-09-19 09:24:02,616 DEBUG PROCESS WITH Step:EmitCurrentMFAsResult
[09:24:02] [INFO] [dku.utils] - 2017-09-19 09:24:02,625 INFO Set MF index len 100000
[09:24:02] [INFO] [dku.utils] - 2017-09-19 09:24:02,625 DEBUG PROCESS WITH Step:DumpPipelineState
[09:24:02] [INFO] [dku.utils] - 2017-09-19 09:24:02,625 INFO ********* Pipieline state (At end)
[09:24:02] [INFO] [dku.utils] - 2017-09-19 09:24:02,625 INFO input_df= (100000, 3)
[09:24:02] [INFO] [dku.utils] - 2017-09-19 09:24:02,625 INFO current_mf=(0, 0)
[09:24:02] [INFO] [dku.utils] - 2017-09-19 09:24:02,625 INFO PPR:
[09:24:02] [INFO] [dku.utils] - 2017-09-19 09:24:02,625 INFO TRAIN = <class 'dataiku.doctor.multiframe.MultiFrame'> ((100000, 26558))
[09:24:02] [INFO] [dku.utils] - 2017-09-19 09:24:02,625 INFO Predicting it
[09:24:02] [INFO] [dku.utils] - 2017-09-19 09:24:02,625 INFO Prepare to predict ...
[09:24:02] [INFO] [com.dataiku.dip.dataflow.streaming.DatasetWritingService] - Init write session: tcenfKZdgi
[09:24:02] [DEBUG] [dku.jobs] - Command /tintercom/datasets/init-write-session processed in 14ms
[09:24:02] [INFO] [dku.utils] - 2017-09-19 09:24:02,679 INFO Initializing write data stream (tcenfKZdgi)
[09:24:02] [INFO] [dku.jobs] - Connects using API ticket
[09:24:02] [INFO] [dku.utils] - 2017-09-19 09:24:02,682 INFO Waiting for data to send ...
[09:24:02] [DEBUG] [dku.jobs] - Received command : /tintercom/datasets/wait-write-session
[09:24:02] [INFO] [dku.utils] - 2017-09-19 09:24:02,682 INFO Got end mark, ending send
[09:24:02] [INFO] [com.dataiku.dip.dataflow.streaming.DatasetWriter] - Creating output writer
[09:24:02] [INFO] [com.dataiku.dip.dataflow.streaming.DatasetWriter] - Initializing output writer
[09:24:02] [INFO] [dku.connections.sql.provider] - Connecting to jdbc:postgresql://dataiku.ct2brvwy8za8.us-east-1.rds.amazonaws.com:5432/Dataiku with props: {}
[09:24:02] [DEBUG] [dku.connections.sql.provider] - Driver version 9.0
[09:24:02] [INFO] [dku.connections.sql.provider] - Driver PostgreSQL Native Driver (JDBC 4.0) PostgreSQL 9.0 JDBC4 (build 801) (9.0)
[09:24:02] [INFO] [dku.connections.sql.provider] - Database PostgreSQL 9.5.4 (9.5) rowSize=1073741824 stmts=0
[09:24:02] [DEBUG] [dku.connections.sql.provider] - Set autocommit=false on conn=Dataiku_DB
[09:24:02] [INFO] [dku.sql.generic] - Dropping table
[09:24:02] [INFO] [dku.dataset.sql] - Executing statement:
[09:24:02] [INFO] [dku.dataset.sql] - DROP TABLE "CLUSTERREPORTCLUSTERINGNEW_cb_descriptions_consumerelectronics_classified"
[09:24:02] [INFO] [dku.dataset.sql] - Statement done
[09:24:02] [INFO] [dku.sql.generic] - Creating table
[09:24:02] [INFO] [dku.dataset.sql] - Executing statement:
[09:24:02] [INFO] [dku.dataset.sql] - CREATE TABLE "CLUSTERREPORTCLUSTERINGNEW_cb_descriptions_consumerelectronics_classified" (
"Domain" text,
"Source" text,
"DescriptionProcessed" text,
"proba_0.0" double precision,
"proba_1.0" double precision,
"prediction" double precision
)
[09:24:02] [INFO] [dku.dataset.sql] - Statement done
[09:24:02] [INFO] [com.dataiku.dip.dataflow.streaming.DatasetWriter] - Done initializing output writer
[09:24:02] [INFO] [dku.output.sql.pglike] - Copy done, copied 0 records
[09:24:02] [INFO] [dku.connections.sql.provider] - Commit conn=Dataiku_DB
[09:24:02] [DEBUG] [dku.connections.sql.provider] - Close conn=Dataiku_DB
[09:24:02] [INFO] [dku.output.sql.pglike] - Transaction done, copied 0 records
[09:24:02] [INFO] [com.dataiku.dip.dataflow.streaming.DatasetWritingService] - Pushed data to write session tcenfKZdgi : 0 rows
[09:24:02] [DEBUG] [dku.jobs] - Command /tintercom/datasets/push-data processed in 144ms
[09:24:02] [INFO] [com.dataiku.dip.dataflow.streaming.DatasetWritingService] - Finished write session: tcenfKZdgi
[09:24:02] [DEBUG] [dku.jobs] - Command /tintercom/datasets/wait-write-session processed in 146ms
[09:24:02] [INFO] [dku.utils] - 0 rows successfully written (tcenfKZdgi)
[09:24:02] [INFO] [dku.utils] - Traceback (most recent call last):
[09:24:02] [INFO] [dku.utils] - File "/home/dataiku/dss/condaenv/lib/python2.7/runpy.py", line 174, in _run_module_as_main
[09:24:02] [INFO] [dku.utils] - "__main__", fname, loader, pkg_name)
[09:24:02] [INFO] [dku.utils] - File "/home/dataiku/dss/condaenv/lib/python2.7/runpy.py", line 72, in _run_code
[09:24:02] [INFO] [dku.utils] - exec code in run_globals
[09:24:02] [INFO] [dku.utils] - File "/home/dataiku/dataiku-dss-4.0.8/python/dataiku/doctor/prediction/reg_scoring_recipe.py", line 146, in <module>
[09:24:02] [INFO] [dku.utils] - json.load_from_filepath(sys.argv[7]))
[09:24:02] [INFO] [dku.utils] - File "/home/dataiku/dataiku-dss-4.0.8/python/dataiku/doctor/prediction/reg_scoring_recipe.py", line 133, in main
[09:24:02] [INFO] [dku.utils] - for output_df in output_generator():
[09:24:02] [INFO] [dku.utils] - File "/home/dataiku/dataiku-dss-4.0.8/python/dataiku/doctor/prediction/reg_scoring_recipe.py", line 78, in output_generator
[09:24:02] [INFO] [dku.utils] - output_probas=recipe_desc["outputProbabilities"])
[09:24:02] [INFO] [dku.utils] - File "/home/dataiku/dataiku-dss-4.0.8/python/dataiku/doctor/prediction/classification_scoring.py", line 206, in binary_classification_predict
[09:24:02] [INFO] [dku.utils] - (pred_df, proba_df) = binary_classification_predict_ex(clf, modeling_params, target_map, threshold, transformed, output_probas)
[09:24:02] [INFO] [dku.utils] - File "/home/dataiku/dataiku-dss-4.0.8/python/dataiku/doctor/prediction/classification_scoring.py", line 157, in binary_classification_predict_ex
[09:24:02] [INFO] [dku.utils] - features_X_df = features_X.as_dataframe()
[09:24:02] [INFO] [dku.utils] - File "/home/dataiku/dataiku-dss-4.0.8/python/dataiku/doctor/multiframe.py", line 269, in as_dataframe
[09:24:02] [INFO] [dku.utils] - blkdf = pd.DataFrame(blk.matrix.toarray(), columns=blk.names)
[09:24:02] [INFO] [dku.utils] - File "/home/dataiku/dss/condaenv/lib/python2.7/site-packages/scipy/sparse/compressed.py", line 920, in toarray
[09:24:02] [INFO] [dku.utils] - return self.tocoo(copy=False).toarray(order=order, out=out)
[09:24:02] [INFO] [dku.utils] - File "/home/dataiku/dss/condaenv/lib/python2.7/site-packages/scipy/sparse/coo.py", line 252, in toarray
[09:24:02] [INFO] [dku.utils] - B = self._process_toarray_args(order, out)
[09:24:02] [INFO] [dku.utils] - File "/home/dataiku/dss/condaenv/lib/python2.7/site-packages/scipy/sparse/base.py", line 1009, in _process_toarray_args
[09:24:02] [INFO] [dku.utils] - return np.zeros(self.shape, dtype=self.dtype, order=order)
[09:24:02] [INFO] [dku.utils] - MemoryError
[09:24:02] [INFO] [dku.flow.activity] - Run thread failed for activity score_CB_descriptions_final_without_labels_14_NP
com.dataiku.dip.exceptions.ProcessDiedException: The Python process failed (exit code: 1). More info might be available in the logs.
at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.execute(AbstractCodeBasedActivityRunner.java:311)
at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.execute(AbstractCodeBasedActivityRunner.java:231)
at com.dataiku.dip.dataflow.exec.AbstractPythonRecipeRunner.executeModule(AbstractPythonRecipeRunner.java:47)
at com.dataiku.dip.analysis.ml.prediction.flow.PredictionScoringRecipeRunner.runOriginalPython(PredictionScoringRecipeRunner.java:388)
at com.dataiku.dip.analysis.ml.prediction.flow.PredictionScoringRecipeRunner.runWithOriginalEngine(PredictionScoringRecipeRunner.java:294)
at com.dataiku.dip.analysis.ml.prediction.flow.PredictionScoringRecipeRunner.run(PredictionScoringRecipeRunner.java:220)
at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:353)
[09:24:03] [INFO] [dku.flow.activity] running score_CB_descriptions_final_without_labels_14_NP - activity is finished
[09:24:03] [ERROR] [dku.flow.activity] running score_CB_descriptions_final_without_labels_14_NP - Activity failed
com.dataiku.dip.exceptions.ProcessDiedException: The Python process failed (exit code: 1). More info might be available in the logs.
at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.execute(AbstractCodeBasedActivityRunner.java:311)
at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.execute(AbstractCodeBasedActivityRunner.java:231)
at com.dataiku.dip.dataflow.exec.AbstractPythonRecipeRunner.executeModule(AbstractPythonRecipeRunner.java:47)
at com.dataiku.dip.analysis.ml.prediction.flow.PredictionScoringRecipeRunner.runOriginalPython(PredictionScoringRecipeRunner.java:388)
at com.dataiku.dip.analysis.ml.prediction.flow.PredictionScoringRecipeRunner.runWithOriginalEngine(PredictionScoringRecipeRunner.java:294)
at com.dataiku.dip.analysis.ml.prediction.flow.PredictionScoringRecipeRunner.run(PredictionScoringRecipeRunner.java:220)
at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:353)
I have wasted hours creating and re-creating models.
Has anyone had a similar experience?
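For what it's worth, the traceback ends in `np.zeros(self.shape, ...)` inside `toarray()`, i.e. the sparse TF-IDF matrix is being converted to a dense array at scoring time. A rough back-of-the-envelope sketch (my own arithmetic from the shapes in the log, not Dataiku internals) suggests the dense array alone would be huge:

```python
import numpy as np
from scipy.sparse import random as sparse_random

# Shapes from the log above: 100000 rows x 26558 TF-IDF features
rows, cols = 100000, 26558

# A dense float64 array of that shape needs rows * cols * 8 bytes,
# which is what the MemoryError in the traceback is allocating.
dense_gb = rows * cols * 8 / 1024 ** 3
print("Dense array size: %.1f GB" % dense_gb)  # roughly 19.8 GB

# The sparse representation only stores the nonzeros, so it is tiny
# by comparison (illustrative matrix, 0.1% density):
sparse_demo = sparse_random(1000, cols, density=0.001, format="csr")
print("Sparse data storage: %.2f MB" % (sparse_demo.data.nbytes / 1024 ** 2))
```

So the failure may simply be that the scoring machine does not have ~20 GB of free RAM for the densified feature matrix, which would also explain why it fails intermittently and only for some algorithms.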