
I have a filesystem datasource that contains thousands of folders, each of which holds a set of comma-separated (CSV) files. Each file has a different schema, and the file name is used to create partitioned datasources with the following pattern:

/%{DIR_NAME}/KEY_%{DIR_NAME}.csv
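For concreteness, here is a rough sketch of how that pattern maps paths to partitions (the directory names are made up, and Python's re module stands in for DSS's own matcher):

import re

# %{DIR_NAME} appears twice, so the directory name and the file name must
# carry the same partition value; the backreference \1 mimics that.
pattern = re.compile(r"^/([^/]+)/KEY_\1\.csv$")

for path in ["/2017-01/KEY_2017-01.csv",    # partition "2017-01"
             "/2017-01/OTHER_2017-01.csv",  # excluded: wrong file prefix
             "/2017-02/KEY_2017-01.csv"]:   # excluded: dir/file mismatch
    m = pattern.match(path)
    print(path, "->", m.group(1) if m else "excluded")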

That pattern creates a datasource from all the files whose names start with KEY, and that part works as expected. My problem is that I can't run any recipe against that datasource: I tried Python, shell, and sync recipes, and all of them failed with the same error:

	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
	at com.dataiku.dip.security.process.RegularProcess.start(RegularProcess.java:47)
	at com.dataiku.dip.security.process.InsecureProcessesLaunchService.launch(InsecureProcessesLaunchService.java:34)
	at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.execute(AbstractCodeBasedActivityRunner.java:263)
	at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.execute(AbstractCodeBasedActivityRunner.java:231)
	at com.dataiku.dip.dataflow.exec.AbstractPythonRecipeRunner.executeScript(AbstractPythonRecipeRunner.java:37)
	at com.dataiku.dip.recipes.code.python.PythonRecipeRunner.run(PythonRecipeRunner.java:49)
	at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:353)
Caused by: java.io.IOException: error=7, Argument list too long
	at java.lang.UNIXProcess.forkAndExec(Native Method)
	at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
	at java.lang.ProcessImpl.start(ProcessImpl.java:134)
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
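For what it's worth, error=7 is the kernel's E2BIG: the combined size of the argument list and environment handed to execve() exceeds ARG_MAX, so the child process is rejected before it ever runs. A minimal sketch (run outside DSS) that provokes the same failure by handing a child process an oversized argument list:

import os
import subprocess

print("ARG_MAX:", os.sysconf("SC_ARG_MAX"))  # kernel limit on argv + environ

# ~20 MB of arguments, far past the default ARG_MAX on a typical Linux box
huge_args = ["x" * 2048] * 10000

try:
    subprocess.call(["/bin/true"] + huge_args)
except OSError as e:
    print(e)  # [Errno 7] Argument list too long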

My current recipe is written in Python; the code is:

# -*- coding: utf-8 -*-
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu

print("Here")  # debug marker: never printed, see below

# Recipe inputs
events_CSV = dataiku.Dataset("KEY_CSV")
events_CSV_df = events_CSV.get_dataframe()

# Recipe outputs
events_ORC = dataiku.Dataset("KEY_ORC")
events_ORC.write_with_schema(events_CSV_df)

The job fails before "Here" is ever printed, so the Python process itself never starts.
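Since the failure happens in ProcessBuilder.start(), nothing inside the recipe body can fix it: whatever DSS puts on the child's command line or environment is already too long before Python is launched. With thousands of partitions, my guess is that the culprit is the partition list itself. A back-of-the-envelope sketch (all counts and lengths are assumptions) of how quickly such a list hits the kernel's limits:

# Hypothetical sizing of a comma-separated partition identifier list.
n_partitions = 5000   # assumption: "thousands of folders", one partition each
avg_name_len = 30     # assumption: average directory-name length
list_bytes = n_partitions * (avg_name_len + 1)  # +1 per separator
print("partition list: ~%d KB" % (list_bytes // 1024))  # ~151 KB
# Linux caps a single argv/environment string at MAX_ARG_STRLEN
# (32 pages, i.e. 128 KB with 4 KB pages) and the whole argv + environ
# at ARG_MAX, so a list of this size can already trip error=7.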

These are the DSS instance settings:

{
  "dipInstanceId": "8bu1n1os-203c299d56c99ef078a53a1a81b6ea23-c60f6bab8e57ecd615a8ec240207f819",
  "features": {"TWITTER": {}, "HADOOP": {}, "HIVE": {}, "PIG": {}, "R": {}, "SPARK": {}},
  "devInstance": false,
  "distribVersion": "7.3",
  "debug": false,
  "version": {"product_commitid": "", "conf_version": "16", "product_version": "4.0.5"},
  "distrib": "redhat"
}


asked by serqql
