Hi,

I get this error message when training a classification model with MLlib:

[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils]  - java.lang.IllegalArgumentException: No rows in train dataframe after target remap. Target empty? Type mismatch?
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils]  -      at com.dataiku.dip.spark.MLLibPredictionJob$$anonfun$prepare$1.apply$mcV$sp(MLLibPredictionJob.scala:216)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils]  -      at com.dataiku.dip.spark.MLLibPredictionJob$$anonfun$prepare$1.apply(MLLibPredictionJob.scala:212)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils]  -      at com.dataiku.dip.spark.MLLibPredictionJob$$anonfun$prepare$1.apply(MLLibPredictionJob.scala:212)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils]  -      at com.dataiku.dip.spark.ProgressListener.push(ProgressListener.scala:46)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils]  -      at com.dataiku.dip.spark.MLLibPredictionJob$class.prepare(MLLibPredictionJob.scala:212)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils]  -      at com.dataiku.dip.spark.MLLibPredictionDoctorJob$.prepare(MLLibPredictionDoctorJob.scala:20)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils]  -      at com.dataiku.dip.spark.MLLibPredictionDoctorJob$delayedInit$body.apply(MLLibPredictionDoctorJob.scala:72)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils]  -      at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils]  -      at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils]  -      at com.dataiku.dip.spark.SuicidalApp$$anonfun$delayedInit$1.apply$mcV$sp(package.scala:402)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils]  -      at scala.App$$anonfun$main$1.apply(App.scala:71)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils]  -      at scala.App$$anonfun$main$1.apply(App.scala:71)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils]  -      at scala.collection.immutable.List.foreach(List.scala:318)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils]  -      at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:32)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils]  -      at scala.App$class.main(App.scala:71)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils]  -      at com.dataiku.dip.spark.MLLibPredictionDoctorJob$.main(MLLibPredictionDoctorJob.scala:20)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils]  -      at com.dataiku.dip.spark.MLLibPredictionDoctorJob.main(MLLibPredictionDoctorJob.scala)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils]  -      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils]  -      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils]  -      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils]  -      at java.lang.reflect.Method.invoke(Method.java:497)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils]  -      at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:710)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils]  -      at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils]  -      at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils]  -      at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils]  -      at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

What is the problem?

1 Answer

Hi,

Assuming that your target column is indeed properly filled, the most probable cause is a "boolean normalization mismatch".

If your target column has the "boolean" storage type (beware: this is the storage type, not the meaning; see https://doc.dataiku.com/dss/4.0/schemas/), then for MLlib to work properly it MUST contain "true" and "false" as values.

In other words, for an MLlib target whose storage type is boolean, values like "0" or "1" are not supported.
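To make the failure mode concrete, here is a minimal sketch of such a target remap in Spark (Scala). This is not DSS's actual code: the remap table, the column name and the values are made up for illustration, and it assumes Spark 2.x. Only the literal strings "true" and "false" receive a label, so a boolean-typed column filled with "0"/"1" loses every row:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("target-remap-demo").getOrCreate()
import spark.implicits._

// Illustrative remap table: only the literal strings "true"/"false" get a label
val remap = Map("true" -> 1.0, "false" -> 0.0)

// A target column stored as boolean but filled with 0/1
val raw = Seq("0", "1", "0", "1").toDF("target")

// Rows whose target value has no entry in the map simply disappear
val remapped = raw.flatMap(row => remap.get(row.getString(0)).toSeq).toDF("label")

println(remapped.count())  // 0 -> "No rows in train dataframe after target remap"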

When reading CSV files, DSS itself accepts more than just "true" and "false": it recognizes values like 0, 1, yes, no, and so on. MLlib, however, does not. You can force DSS to convert all "non-real-boolean" values to "real-boolean" values by checking the "Normalize booleans" checkbox in the dataset format settings.
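If you prefer to normalize the values yourself, for example in a Spark code recipe before training, the hypothetical sketch below (continuing from the snippet above, with an illustrative list of truthy/falsy spellings; DSS's own recognized spellings may differ) folds the common variants down to the literal "true"/"false" that the target remap expects, which is roughly what the checkbox does for you:

import org.apache.spark.sql.functions.{col, lower, when}

// Illustrative lists of accepted spellings, not DSS's authoritative ones
val truthy = Seq("true", "t", "yes", "y", "1")
val falsy  = Seq("false", "f", "no", "n", "0")

val normalized = raw.withColumn("target",
  when(lower(col("target")).isin(truthy: _*), "true")
    .when(lower(col("target")).isin(falsy: _*), "false")
    .otherwise(col("target")))

normalized.select("target").distinct().show()  // only "true" and "false" remain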