Sign up to take part
Registered users can ask their own questions, contribute to discussions, and be part of the Community!
Registered users can ask their own questions, contribute to discussions, and be part of the Community!
we are merging our spark scala code with spark pipeline, however if i run the code step individually it runs fine (in both function mode / free mode) but when it is been merged in spark pipeline it give error
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Driver stacktrace:, caused by: ClassNotFoundException: CustomScalaRecipe$$anonfun$1
scala code:
import org.apache.spark.sql.functions._
// Recipe inputs
val iups_prepared_joined_by_time_cell_joined = inputDatasets("iups_prepared_joined_by_time_cell_joined")
def redaction(data:String, d_cnt: Int): String = {
var data_trim = "" ;
if (data.isEmpty || data =="" || data == null){
return data ;
}else {
data_trim = data.trim() ;
}
if (d_cnt == 1){
var data_length = data_trim.length();
return data_trim.replaceAll(data_trim, "*"*data_length)
}
else if (d_cnt> 5){
return data_trim
}
else {
val redac_length = 6-d_cnt
val length_r = Seq(redac_length, data_trim.length()).min
return data_trim.slice(0,data_trim.length()-length_r) + "*"*(length_r+3)
}
}
val redaction_udf = udf((data: String,d_cnt:Int) => redaction(data,d_cnt));
val iups_pre_anonymized = iups_prepared_joined_by_time_cell_joined.withColumn("price_plan", when(col("price_plan").isNull, lit(null)).otherwise(redaction_udf(col("price_plan"),col("price_plan_distinct"))))
// Recipe outputs
Map("iups_pre_anonymized" -> iups_pre_anonymized)
Hello,
Thanks for reporting this, it seems to be a bug in the way spark pipelines and spark serialization work, I'll check to see if we can fix that in a future DSS version.
In the meantime, you can workaround it by either:
Best
Hello,
Thanks for reporting this, it seems to be a bug in the way spark pipelines and spark serialization work, I'll check to see if we can fix that in a future DSS version.
In the meantime, you can workaround it by either:
Best
Hello Dataiker!
Are any solutions available since 2019? I'm using dataiku 7.0.2 and have that problem of ClassNotFoundException: CustomScalaRecipe$$anonfun$1
Hi @bkostya I can confirm that this original issue was was fixed in version 5.1.5. Do you the same distinction between "it runs fine individually" but "it fails in a pipeline"?
It fails in a recipe (pipeline). The CustomScalaRecipe caught my eye. Please look at that ticket.