"Argument list too long" error that is independent of the recipe

UserBird
Dataiker

I have a Filesystem datasource that contains thousands of folders, and each folder contains a list of comma-separated files. Each file in a directory has a different schema, and the file name is used as the criterion to create partitioned datasources with the following pattern:



/%{DIR_NAME}/KEY_%{DIR_NAME}.csv
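As a rough illustration of which files such a pattern selects, here is a minimal sketch that translates it into a regular expression. This is not how DSS evaluates the pattern internally; the regex, the helper names `partition_of` and `partitions`, and the assumption that both occurrences of DIR_NAME must carry the same value are all hypothetical.

```python
import re

# Hypothetical regex equivalent of /%{DIR_NAME}/KEY_%{DIR_NAME}.csv.
# Assumption: both occurrences of DIR_NAME must hold the same value,
# which the backreference (?P=DIR_NAME) enforces.
PATTERN = re.compile(r"^/(?P<DIR_NAME>[^/]+)/KEY_(?P=DIR_NAME)\.csv$")


def partition_of(path):
    """Return the DIR_NAME value for a matching path, or None otherwise."""
    match = PATTERN.match(path)
    return match.group("DIR_NAME") if match else None


def partitions(paths):
    """Return the sorted set of partition identifiers found in paths."""
    found = {partition_of(p) for p in paths}
    found.discard(None)
    return sorted(found)
```

Running `partitions` over a listing of the datasource root gives a quick count of how many partitions the pattern produces, which is useful to know when thousands of folders are involved.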



That pattern creates a datasource from all the files whose names start with KEY. That part works as expected. My problem is that I can't run any recipe against that datasource. I tried Python, shell and sync recipes, and all of them failed with the same error:




at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
at com.dataiku.dip.security.process.RegularProcess.start(RegularProcess.java:47)
at com.dataiku.dip.security.process.InsecureProcessesLaunchService.launch(InsecureProcessesLaunchService.java:34)
at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.execute(AbstractCodeBasedActivityRunner.java:263)
at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.execute(AbstractCodeBasedActivityRunner.java:231)
at com.dataiku.dip.dataflow.exec.AbstractPythonRecipeRunner.executeScript(AbstractPythonRecipeRunner.java:37)
at com.dataiku.dip.recipes.code.python.PythonRecipeRunner.run(PythonRecipeRunner.java:49)
at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:353)
Caused by: java.io.IOException: error=7, Argument list too long
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
at java.lang.ProcessImpl.start(ProcessImpl.java:134)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
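For context, error=7 in the trace is the operating system's errno 7 (E2BIG), which exec returns when the combined size of the command-line arguments and the environment exceeds the system limit. The sketch below decodes that code and reports the limit on the machine where it runs. The helper names are hypothetical, and it does not show what DSS actually passes to the child process.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, message) for an OS error number."""
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


def arg_max_bytes():
    """Return the limit, in bytes, on arguments plus environment for exec."""
    return os.sysconf("SC_ARG_MAX")


def environment_bytes(env=None):
    """Approximate the bytes an environment occupies as NAME=value\\0 entries."""
    env = os.environ if env is None else env
    # +2 accounts for the '=' separator and the terminating NUL byte.
    return sum(len(os.fsencode(k)) + len(os.fsencode(v)) + 2 for k, v in env.items())


if __name__ == "__main__":
    print(describe_errno(7))
    print("ARG_MAX:", arg_max_bytes())
    print("current environment:", environment_bytes())
```

Comparing the environment size of the DSS process with the limit is one way to see how much room is left for arguments. Whether the environment or the argument list is the oversized part here is not established by the trace.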


My current recipe is in Python and the code is:



# -*- coding: utf-8 -*-
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu

# Recipe inputs
print("Here")

events_CSV = dataiku.Dataset("KEY_CSV")
events_CSV_df = events_CSV.get_dataframe()

# Recipe outputs
events_ORC = dataiku.Dataset("KEY_ORC")
events_ORC.write_with_schema(events_CSV_df)



The job fails before printing "Here".
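That behaviour is consistent with the trace: the exception is thrown from ProcessBuilder.start, so the child process is never created and no line of the recipe can run. The following minimal sketch reproduces the same class of failure outside DSS by launching a Python child with a deliberately oversized argument list. The function name and the sizes are hypothetical, chosen to exceed typical Linux limits, and this is only an illustration of the mechanism, not of what DSS does.

```python
import errno
import subprocess
import sys


def launch_with_oversized_args(total_bytes=8 * 1024 * 1024, chunk=100000):
    """Try to start a child with roughly total_bytes of padding arguments.

    Returns (errno, output). errno is None if the child actually started.
    """
    # Many medium-sized arguments rather than one huge one, because the
    # kernel also limits the length of each individual argument.
    padding = ["x" * chunk] * (total_bytes // chunk + 1)
    # The child would print "Here" if it were ever started.
    cmd = [sys.executable, "-c", "print('Here')"] + padding
    try:
        return None, subprocess.check_output(cmd)
    except OSError as exc:
        # The exec call failed, so no child code ran at all.
        return exc.errno, b""


if __name__ == "__main__":
    code, output = launch_with_oversized_args()
    print("errno:", code, "is E2BIG:", code == errno.E2BIG)
    print("child output:", output)
```

Because the failure happens at exec time, changing the recipe body or the recipe type cannot help. The fix has to reduce what the launcher passes to the process.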



These are the DSS instance settings:




{u'dipInstanceId': u'8bu1n1os-203c299d56c99ef078a53a1a81b6ea23-c60f6bab8e57ecd615a8ec240207f819', u'features': {u'TWITTER': {}, u'HADOOP': {}, u'HIVE': {}, u'PIG': {}, u'R': {}, u'SPARK': {}}, u'devInstance': False, u'distribVersion': u'7.3', u'debug': False, u'version': {u'product_commitid': u'', u'conf_version': u'16', u'product_version': u'4.0.5'}, u'distrib': u'redhat'}


 

andrewjobel
Level 1

Long Path Tool is software that lets you easily delete, copy, or rename files with long paths.
