I have a Filesystem datasource that contains thousands of folders, and each folder contains a set of comma-separated files. Each file in a directory has a different schema, so I use the file name to create partitioned data sources with the following pattern:
/%{DIR_NAME}/KEY_%{DIR_NAME}.csv
This creates a datasource from all the files whose names start with KEY. That part works as expected. My problem is that I can't run any recipe against that datasource. I tried Python, shell, and sync recipes, and all of them failed with the same error:
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
at com.dataiku.dip.security.process.RegularProcess.start(RegularProcess.java:47)
at com.dataiku.dip.security.process.InsecureProcessesLaunchService.launch(InsecureProcessesLaunchService.java:34)
at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.execute(AbstractCodeBasedActivityRunner.java:263)
at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.execute(AbstractCodeBasedActivityRunner.java:231)
at com.dataiku.dip.dataflow.exec.AbstractPythonRecipeRunner.executeScript(AbstractPythonRecipeRunner.java:37)
at com.dataiku.dip.recipes.code.python.PythonRecipeRunner.run(PythonRecipeRunner.java:49)
at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:353)
Caused by: java.io.IOException: error=7, Argument list too long
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
at java.lang.ProcessImpl.start(ProcessImpl.java:134)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
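For anyone reading the trace: `error=7` is the OS errno returned when the child process is launched, and on Linux errno 7 is `E2BIG` ("Argument list too long"), meaning the combined size of the arguments and environment passed to the new process exceeded the kernel limit. The sketch below is only a diagnostic aid, not anything DSS itself runs. It prints the limit on the current machine and roughly estimates how large a launch payload is. The helper name `exec_payload_bytes` and the sample command are hypothetical, and whether DSS passes the partition list through arguments or environment variables is an assumption the trace does not confirm.

```python
import errno
import os

# error=7 in the Java trace is the errno from exec; on Linux that is E2BIG.
print(errno.E2BIG, errno.errorcode[errno.E2BIG], os.strerror(errno.E2BIG))


def exec_payload_bytes(argv, env):
    """Roughly estimate the bytes exec() has to copy for argv and env.

    Each argument costs its length plus a NUL terminator; each environment
    entry is stored as "KEY=VALUE" plus a NUL terminator. Pointer overhead
    is ignored, so this is a lower bound (hypothetical helper).
    """
    arg_bytes = sum(len(os.fsencode(a)) + 1 for a in argv)
    env_bytes = sum(
        len(os.fsencode(k)) + len(os.fsencode(v)) + 2 for k, v in env.items()
    )
    return arg_bytes + env_bytes


# Limit for argv + environment combined on this machine (Unix only).
arg_max = os.sysconf("SC_ARG_MAX")

# Sample command is illustrative only; substitute the real launch command.
current = exec_payload_bytes(["python", "script.py"], dict(os.environ))
print("ARG_MAX:", arg_max, "estimated payload:", current)
```

If the estimated payload grows with the number of partitions and approaches `ARG_MAX`, that would be consistent with the job failing before the recipe code ever starts.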
My current recipe is in Python, and the code is:
# -*- coding: utf-8 -*-
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu
# Recipe inputs
print("Here")
events_CSV = dataiku.Dataset("KEY_CSV")
events_CSV_df = events_CSV.get_dataframe()
# Recipe outputs
events_ORC = dataiku.Dataset("KEY_ORC")
events_ORC.write_with_schema(events_CSV_df)
The job fails before printing "Here".
These are the DSS instance settings:
{u'dipInstanceId': u'8bu1n1os-203c299d56c99ef078a53a1a81b6ea23-c60f6bab8e57ecd615a8ec240207f819', u'features': {u'TWITTER': {}, u'HADOOP': {}, u'HIVE': {}, u'PIG': {}, u'R': {}, u'SPARK': {}}, u'devInstance': False, u'distribVersion': u'7.3', u'debug': False, u'version': {u'product_commitid': u'', u'conf_version': u'16', u'product_version': u'4.0.5'}, u'distrib': u'redhat'}