
It looks like a regression was introduced in 5.0.1 (this worked in 5.0.0): dataiku.core.saved_model.Predictor.predict no longer works because pandas raises the following error:

ValueError: If using all scalar values, you must pass an index

Full error message:

/home/dataiku/dataiku-dss-5.0.1/python/dataiku/core/saved_model.pyc in predict(self, df, with_input_cols, with_prediction, with_probas, with_conditional_outputs, with_proba_percentile)
    591                 column_types[k] = np.object
    592         pred_df = self._get_prediction_dataframe(dates_handled.astype(column_types), with_prediction, with_probas, with_conditional_outputs,
--> 593                                                  with_proba_percentile)
    594         if with_input_cols:
    595             return pd.concat([df, pred_df], axis=1)

/home/dataiku/dataiku-dss-5.0.1/python/dataiku/core/saved_model.pyc in _get_prediction_dataframe(self, input_df, with_prediction, with_probas, with_conditional_outputs, with_proba_percentile)
    456                                   with_conditional_outputs, with_proba_percentile):
    457         if self.params.model_type == "PREDICTION":
--> 458             pred_df = self._prediction_type_dataframe(input_df, with_prediction, with_probas)
    459             self._add_percentiles_and_condoutputs(pred_df, with_proba_percentile, with_conditional_outputs)
    460             return pred_df

/home/dataiku/dataiku-dss-5.0.1/python/dataiku/core/saved_model.pyc in _prediction_type_dataframe(self, input_df, with_prediction, with_probas)
    488         if prediction_type == "REGRESSION":
    489             if with_prediction:
--> 490                 pred_df = pd.DataFrame({"prediction": self._clf.predict(X)[0]})
    491             else:
    492                 raise ValueError("Predicting a regression model with with_prediction=False. Oops.")

/home/dataiku/dss/condaenv/lib/python2.7/site-packages/pandas/core/frame.pyc in __init__(self, data, index, columns, dtype, copy)
    273                                  dtype=dtype, copy=copy)
    274         elif isinstance(data, dict):
--> 275             mgr = self._init_dict(data, index, columns, dtype=dtype)
    276         elif isinstance(data, ma.MaskedArray):
    277             import numpy.ma.mrecords as mrecords

/home/dataiku/dss/condaenv/lib/python2.7/site-packages/pandas/core/frame.pyc in _init_dict(self, data, index, columns, dtype)
    409             arrays = [data[k] for k in keys]
--> 411         return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
    413     def _init_ndarray(self, values, index, columns, dtype=None, copy=False):

/home/dataiku/dss/condaenv/lib/python2.7/site-packages/pandas/core/frame.pyc in _arrays_to_mgr(arrays, arr_names, index, columns, dtype)
   5494     # figure out the index, if necessary
   5495     if index is None:
-> 5496         index = extract_index(arrays)
   5497     else:
   5498         index = _ensure_index(index)

/home/dataiku/dss/condaenv/lib/python2.7/site-packages/pandas/core/frame.pyc in extract_index(data)
   5534         if not indexes and not raw_lengths:
-> 5535             raise ValueError('If using all scalar values, you must pass'
   5536                              ' an index')

ValueError: If using all scalar values, you must pass an index
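The last frame of the traceback is a generic pandas check, independent of Dataiku. When a DataFrame is built from a dict whose values are all scalars and no index is given, pandas has no way to infer the number of rows. A minimal reproduction in plain pandas, with made-up values:

```python
from __future__ import print_function

import pandas as pd

# A dict whose values are all scalars, with no index: pandas cannot infer
# how many rows to create, so the constructor raises ValueError.
try:
    pd.DataFrame({"prediction": 1.5})
except ValueError as exc:
    print("ValueError:", exc)

# Giving pandas a length, either through an explicit index or by passing a
# sequence instead of a scalar, avoids the error.
with_index = pd.DataFrame({"prediction": 1.5}, index=[0])
from_list = pd.DataFrame({"prediction": [1.5, 2.0, 3.25]})
print(with_index.shape)  # (1, 1)
print(from_list.shape)   # (3, 1)
```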
Could you add some details on the code environment you are using (list of libraries with versions)? Thanks!
It's the default Python 2 environment that ships with 5.0.1.
The error log you attached mentions this path: /home/dataiku/dss/condaenv/. Is that the python found on your PATH? If you have shell access to the server, you can check by running "which python". Then, could you run "import pandas" in a Jupyter notebook and check the value of pandas.__version__? You can paste the outputs in this comment thread.
import pandas as pd

print(pd.__version__)  # => 0.20.3
print(pd.__file__)     # => /home/dataiku/dss/condaenv/lib/python2.7/site-packages/pandas/__init__.pyc
Thanks. Were you also able to run 'which python' in a shell on your server? What was the result?
This is the result when I ran "!which python" in the Jupyter notebook

But it looks like the kernel is the Python 2 kernel from Dataiku.
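`!which python` runs in a subshell and reports the first `python` on that shell's PATH, which is not necessarily the interpreter the notebook kernel itself runs on. A minimal sketch for checking the kernel's own interpreter and library versions from a notebook cell, using only the standard library plus pandas and NumPy (the `__future__` import keeps the prints readable on Python 2.7):

```python
from __future__ import print_function

import sys

import numpy as np
import pandas as pd

# Interpreter actually running this kernel (independent of the shell PATH).
print(sys.executable)
print(sys.version.split()[0])

# Versions and install locations of the libraries involved in the traceback.
print("pandas", pd.__version__, pd.__file__)
print("numpy", np.__version__, np.__file__)
```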
OK, thanks for the details. Could you share the code that produced the error? I will try to reproduce it in a similar setup.
This is what I ran:

import dataiku
from dataiku import pandasutils as pdu
import pandas as pd
import numpy as np
from sklearn.metrics import mean_absolute_error

model = dataiku.Model('Prediction_with_lagged', 'MARKETING_FORECAST')
regr = model.get_predictor()
ds_future = dataiku.Dataset("marketing-spend-conversion-date_prepared_shifted_future")
df_future = ds_future.get_dataframe()
regr.predict(df_future)  # <== this last line raised the error
On my side, your code runs fine in Dataiku 5.0.1 with the built-in Python environment. Have you tried scoring with this model using a Score recipe in your Flow? Do you get the same error?
The Score recipe ran without any issue. The error only occurs when I call the predict() method on the predictor returned by get_predictor().
BTW, I noticed that the line where the error is raised changed between 5.0.0 and 5.0.1:

/home/dataiku/dataiku-dss-5.0.1/python/dataiku/core/saved_model.pyc in _prediction_type_dataframe(self, input_df, with_prediction, with_probas)
    488         if prediction_type == "REGRESSION":
    489             if with_prediction:
--> 490                 pred_df = pd.DataFrame({"prediction": self._clf.predict(X)[0]})
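What `[0]` selects depends on what `self._clf.predict(X)` returns, and the Dataiku internals are not shown in this thread. One plausible reading, which is an assumption and not confirmed here: the new line expects a container whose first element is the whole prediction array, but received the 1-D prediction array itself, so `[0]` picked a single scalar prediction. Both hypothetical cases, with made-up values:

```python
from __future__ import print_function

import numpy as np
import pandas as pd

# Hypothetical predictions for three rows (made-up values).
preds = np.array([10.5, 12.0, 9.75])

# Case 1 (hypothetical): predict() returns a tuple whose first element is
# the prediction array; [0] yields the full array, so the DataFrame has 3 rows.
as_tuple = (preds, None)
ok_df = pd.DataFrame({"prediction": as_tuple[0]})
print(len(ok_df))  # 3

# Case 2 (hypothetical): predict() returns the 1-D array itself; [0] yields a
# single NumPy scalar, which triggers the "all scalar values" ValueError.
first = preds[0]
print(np.ndim(first))  # 0
try:
    pd.DataFrame({"prediction": first})
except ValueError as exc:
    print("ValueError:", exc)
```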

1 Answer


Thank you for your feedback!

We were able to reproduce the issue. It will be fixed in the upcoming 5.0.2 release.

Best regards

© Dataiku 2012-2018 - Privacy Policy