Memory Error during Score recipe

UserBird · 07-17-2017

I have a dataset (250k rows) that I need to predict labels for. However, whenever I run the Score recipe, it fails with a MemoryError. Is there a way to score a dataset in batches?
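
For reference, here is the general pattern behind batch scoring, sketched in plain Python. It assumes the model exposes a scikit-learn-style `predict` that accepts a list of rows and applies its own preprocessing. `batches`, `score_in_batches`, `chunk_size` and `ThresholdModel` are hypothetical names used for illustration; they are not part of the DSS API.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batches(rows: Iterable, chunk_size: int) -> Iterator[List]:
    """Yield successive lists of at most chunk_size rows from any iterable."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def score_in_batches(model, rows: Iterable, chunk_size: int = 10_000) -> Iterator:
    """Call model.predict on one chunk at a time and yield predictions lazily.

    Only one chunk of input is held here at once, provided `rows` is itself
    lazy and the caller writes predictions out as it consumes them.
    """
    for chunk in batches(rows, chunk_size):
        yield from model.predict(chunk)


class ThresholdModel:
    """Hypothetical stand-in for a fitted model exposing predict(list) -> list."""

    def predict(self, chunk):
        return [1 if x > 0.5 else 0 for x in chunk]


if __name__ == "__main__":
    scores = (i / 10 for i in range(10))  # lazy input, never fully materialized
    print(list(score_in_batches(ThresholdModel(), scores, chunk_size=3)))
```

Batching the input only bounds memory if nothing downstream of `predict` gathers all chunks back into one large object.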

Clément_Stenac · 07-17-2017

Hi,

Scoring is already done in small batches. How much memory does your machine have, and how much of it is free before you run the recipe? How many columns does the dataset have? What kind of preprocessing are you using (for example hashing, count vectorization or TF-IDF)?
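
The preprocessing question matters because count vectorization and TF-IDF create one column per vocabulary term they keep, so the width grows with the vocabulary. Feature hashing instead maps every token into a fixed number of buckets chosen up front. Below is a minimal stdlib sketch of the hashing trick, assuming the tags arrive as a list of strings. `hash_tags` and `n_features` are illustrative names, not DSS settings, and this does not describe DSS's own implementation. It uses `zlib.crc32` because Python's built-in `hash()` for strings is randomized per process.

```python
import zlib
from collections import Counter
from typing import Dict, Iterable


def hash_tags(tags: Iterable[str], n_features: int = 1024) -> Dict[int, int]:
    """Map tag strings to a sparse {column_index: count} dict.

    Every tag lands in one of n_features buckets, so the output width is
    fixed no matter how many distinct tags exist; collisions are possible.
    """
    counts: Counter = Counter()
    for tag in tags:
        column = zlib.crc32(tag.encode("utf-8")) % n_features
        counts[column] += 1
    return dict(counts)


if __name__ == "__main__":
    print(hash_tags(["python", "ml", "python"], n_features=16))
```

scikit-learn's `HashingVectorizer` applies the same idea with a different hash function and extra options.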

Wuser92 · 07-17-2017

I have 8 GB of memory. 1 GB is still free on the machine; almost all of the other 7 GB is used by DSS. The input dataset has 8 columns, but I apply TF-IDF vectorization to a column containing lots of tags. The "Algorithm" tab in the model view says there are 1016 columns after preprocessing. Estimated memory usage is 94 MB (for training only, I guess).
Training works perfectly with 25k rows, but scoring the 250k rows fails.
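
Here is a back-of-the-envelope check, assuming the 1016 preprocessed columns end up as a dense float64 matrix (8 bytes per cell). For 25,000 rows that is 25,000 × 1016 × 8 = 203,200,000 bytes (about 0.19 GiB). For 250,000 rows it is 2,032,000,000 bytes (about 1.89 GiB) for a single copy, already above the 1 GB reported free. The estimate ignores pandas overhead and any temporary copies. `dense_matrix_bytes` is a hypothetical helper.

```python
def dense_matrix_bytes(n_rows: int, n_cols: int, itemsize: int = 8) -> int:
    """Bytes for a dense n_rows x n_cols matrix; itemsize=8 assumes float64."""
    return n_rows * n_cols * itemsize


GIB = 1024 ** 3
N_COLS = 1016  # columns after preprocessing, as reported in the model view

if __name__ == "__main__":
    for n_rows in (25_000, 250_000):
        size = dense_matrix_bytes(n_rows, N_COLS)
        print(f"{n_rows:>7,} rows: {size:,} bytes (~{size / GIB:.2f} GiB)")
```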

Here's the traceback:
[11:57:18] [INFO] [dku.utils] - Traceback (most recent call last):
[11:57:18] [INFO] [dku.utils] - File "/usr/lib64/python2.7/runpy.py", line 174, in _run_module_as_main
[11:57:18] [INFO] [dku.utils] - "__main__", fname, loader, pkg_name)
[11:57:18] [INFO] [dku.utils] - File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
[11:57:18] [INFO] [dku.utils] - exec code in run_globals
[11:57:18] [INFO] [dku.utils] - File "/home/dataiku/dataiku-dss-4.0.3/python/dataiku/doctor/prediction/reg_scoring_recipe.py", line 146, in
[11:57:18] [INFO] [dku.utils] - json.load_from_filepath(sys.argv[7]))
[11:57:18] [INFO] [dku.utils] - File "/home/dataiku/dataiku-dss-4.0.3/python/dataiku/doctor/prediction/reg_scoring_recipe.py", line 133, in main
[11:57:18] [INFO] [dku.utils] - for output_df in output_generator():
[11:57:18] [INFO] [dku.utils] - File "/home/dataiku/dataiku-dss-4.0.3/python/dataiku/doctor/prediction/reg_scoring_recipe.py", line 78, in output_generator
[11:57:18] [INFO] [dku.utils] - output_probas=recipe_desc["outputProbabilities"])
[11:57:18] [INFO] [dku.utils] - File "/home/dataiku/dataiku-dss-4.0.3/python/dataiku/doctor/prediction/classification_scoring.py", line 197, in binary_classification_predict
[11:57:18] [INFO] [dku.utils] - (pred_df, proba_df) = binary_classification_predict_ex(clf, modeling_params, target_map, threshold, transformed, output_probas)
[11:57:18] [INFO] [dku.utils] - File "/home/dataiku/dataiku-dss-4.0.3/python/dataiku/doctor/prediction/classification_scoring.py", line 148, in binary_classification_predict_ex
[11:57:18] [INFO] [dku.utils] - features_X_df = features_X.as_dataframe()
[11:57:18] [INFO] [dku.utils] - File "/home/dataiku/dataiku-dss-4.0.3/python/dataiku/doctor/multiframe.py", line 253, in as_dataframe
[11:57:18] [INFO] [dku.utils] - return pd.concat(blockvals, axis=1)
[11:57:18] [INFO] [dku.utils] - File "/home/dataiku/dss/pyenv/local/lib/python2.7/dist-packages/pandas/tools/merge.py", line 846, in concat
[11:57:18] [INFO] [dku.utils] - return op.get_result()
[11:57:18] [INFO] [dku.utils] - File "/home/dataiku/dss/pyenv/local/lib/python2.7/dist-packages/pandas/tools/merge.py", line 1038, in get_result
[11:57:18] [INFO] [dku.utils] - copy=self.copy)
[11:57:18] [INFO] [dku.utils] - File "/home/dataiku/dss/pyenv/local/lib/python2.7/dist-packages/pandas/core/internals.py", line 4545, in concatenate_block_managers
[11:57:18] [INFO] [dku.utils] - for placement, join_units in concat_plan]
[11:57:18] [INFO] [dku.utils] - File "/home/dataiku/dss/pyenv/local/lib/python2.7/dist-packages/pandas/core/internals.py", line 4648, in concatenate_join_units
[11:57:18] [INFO] [dku.utils] - concat_values = concat_values.copy()
[11:57:18] [INFO] [dku.utils] - MemoryError
[11:57:18] [INFO] [dku.flow.activity] - Run thread failed for activity score_Companies_unlabelled_AI_prepared_NP
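
The traceback ends in `features_X.as_dataframe()`, then `pd.concat(blockvals, axis=1)`, then `concat_values.copy()`. This suggests the preprocessed features, including the TF-IDF columns, are materialized as one dense pandas DataFrame at that point. The rough comparison below contrasts dense storage with compressed sparse row (CSR) storage using 32-bit indices, which scipy.sparse uses when they fit. It assumes all 1016 columns are TF-IDF terms and that each row has about 10 non-zero terms. The 10 is a made-up figure, and `csr_bytes` and `dense_bytes` are hypothetical helpers.

```python
def dense_bytes(n_rows: int, n_cols: int, value_size: int = 8) -> int:
    """Bytes for a dense float64 matrix."""
    return n_rows * n_cols * value_size


def csr_bytes(n_rows: int, nnz: int, value_size: int = 8, index_size: int = 4) -> int:
    """Bytes for CSR: nnz values + nnz column indices + (n_rows + 1) row pointers."""
    return nnz * (value_size + index_size) + (n_rows + 1) * index_size


N_ROWS, N_COLS = 250_000, 1016
AVG_NONZERO_PER_ROW = 10  # hypothetical: roughly 10 distinct tags per row

if __name__ == "__main__":
    nnz = N_ROWS * AVG_NONZERO_PER_ROW
    print(f"dense: {dense_bytes(N_ROWS, N_COLS):,} bytes")
    print(f"CSR:   {csr_bytes(N_ROWS, nnz):,} bytes")
```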

Labels

Machine Learning

Visual recipes
