No rows in train dataframe after target remap. Target empty? Type mismatch?

UserBird
Dataiker
Hi,



I get this error message when training a classification model with MLlib:
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - java.lang.IllegalArgumentException: No rows in train dataframe after target remap. Target empty? Type mismatch?
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at com.dataiku.dip.spark.MLLibPredictionJob$$anonfun$prepare$1.apply$mcV$sp(MLLibPredictionJob.scala:216)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at com.dataiku.dip.spark.MLLibPredictionJob$$anonfun$prepare$1.apply(MLLibPredictionJob.scala:212)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at com.dataiku.dip.spark.MLLibPredictionJob$$anonfun$prepare$1.apply(MLLibPredictionJob.scala:212)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at com.dataiku.dip.spark.ProgressListener.push(ProgressListener.scala:46)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at com.dataiku.dip.spark.MLLibPredictionJob$class.prepare(MLLibPredictionJob.scala:212)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at com.dataiku.dip.spark.MLLibPredictionDoctorJob$.prepare(MLLibPredictionDoctorJob.scala:20)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at com.dataiku.dip.spark.MLLibPredictionDoctorJob$delayedInit$body.apply(MLLibPredictionDoctorJob.scala:72)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at com.dataiku.dip.spark.SuicidalApp$$anonfun$delayedInit$1.apply$mcV$sp(package.scala:402)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at scala.App$$anonfun$main$1.apply(App.scala:71)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at scala.App$$anonfun$main$1.apply(App.scala:71)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at scala.collection.immutable.List.foreach(List.scala:318)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:32)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at scala.App$class.main(App.scala:71)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at com.dataiku.dip.spark.MLLibPredictionDoctorJob$.main(MLLibPredictionDoctorJob.scala:20)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at com.dataiku.dip.spark.MLLibPredictionDoctorJob.main(MLLibPredictionDoctorJob.scala)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at java.lang.reflect.Method.invoke(Method.java:497)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:710)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
[2017/03/21-09:19:30.848] [Exec-282] [INFO] [dku.utils] - at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)


What is the problem?

Clément_Stenac
Dataiker
Hi,

Assuming that your target column is indeed properly filled, the most probable cause is a "boolean normalization mismatch".

If your target column has the "boolean" storage type (beware: this is the storage type, not the meaning, see https://doc.dataiku.com/dss/4.0/schemas/), then for MLlib to work properly it MUST contain "true" and "false" as values.

In other words, for an MLlib target whose storage type is boolean, values like "0" or "1" are not supported.

When reading CSV files, DSS accepts more than just "true" and "false": it also recognizes values like 0, 1, yes, no, and so on. MLlib does not. You can force DSS to convert all such non-canonical values to real "true"/"false" values by checking the "Normalize booleans" checkbox in the dataset's format settings.
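To illustrate, here is a rough sketch of the kind of mapping "Normalize booleans" performs. This is not DSS code, and the exact set of spellings DSS recognizes is an assumption; the point is just that boolean-like values get canonicalized to the "true"/"false" strings MLlib expects:

```python
# Hypothetical sketch of boolean normalization: map common boolean-like
# spellings to canonical "true"/"false". The accepted spellings below
# are illustrative assumptions, not the exact list DSS uses.

TRUE_VALUES = {"true", "1", "yes", "y", "t"}
FALSE_VALUES = {"false", "0", "no", "n", "f"}

def normalize_boolean(value):
    """Return 'true' or 'false' for recognized inputs, else None."""
    v = str(value).strip().lower()
    if v in TRUE_VALUES:
        return "true"
    if v in FALSE_VALUES:
        return "false"
    return None  # unrecognized value: cannot be used as a boolean target

# A target column like this would trip MLlib without normalization:
raw_target = ["1", "0", "Yes", "no", "true"]
print([normalize_boolean(v) for v in raw_target])
```

After such a pass, every row of the target holds a real boolean spelling, so the target remap no longer drops all rows.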
