0 votes

I have some code where I need to run an HDFS command in Python to check if a file is present.  See below for an example:


import subprocess

command = 'hdfs dfs -ls /sandbox'
ssh = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE).communicate()


When I run this in a Jupyter notebook in Dataiku, the command completes without any problems.  However, when I run the notebook as a Python recipe, I get the following error message multiple times:

java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)];

It looks as if there is a problem with Kerberos when I run the notebook code as a recipe. What is the reason for this? Is there a Dataiku setting I can change to make sure the Kerberos ticket is generated properly?


2 Answers

0 votes

Your company is running Dataiku in Multi-User Security mode. In this mode, Dataiku performs a complex interaction with Kerberos to ensure that each activity runs as the end user, even though Dataiku itself holds only a single credential.

As a result of this interaction, a plain Python recipe does not receive impersonated Kerberos credentials, which is why the HDFS command fails with the "Failed to find any Kerberos tgt" error. That said, a PySpark recipe (rather than a vanilla Python one) may well work; you don't need to actually do anything Spark-y in a PySpark recipe.
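
As a rough sketch of what the body of such a PySpark recipe could look like, the file check from the question can be written with `hdfs dfs -test -e`, which exits with status 0 when the path exists. The helper name `hdfs_path_exists` and its injectable `run` parameter are illustrative assumptions, not part of any Dataiku API, and the sketch assumes the `hdfs` client is on the PATH of the recipe's process.

```python
import subprocess


def hdfs_path_exists(path, run=subprocess.run):
    """Return True if `path` exists in HDFS.

    `run` defaults to subprocess.run; it is a parameter only so the
    helper can be exercised without a cluster (an assumption of this sketch).
    """
    # `hdfs dfs -test -e <path>` exits with status 0 if the path exists.
    result = run(
        ["hdfs", "dfs", "-test", "-e", path],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    return result.returncode == 0
```

Calling `hdfs_path_exists('/sandbox')` then gives a boolean instead of a listing to parse. Note that any failure, including a Kerberos one, also yields a non-zero exit status, so it is worth inspecting `stderr` if the answer is unexpectedly False.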
0 votes
I can confirm that using PySpark is the solution in these cases. Dataiku handles the impersonation for you here, so you don't have to worry about keytabs, run kinit before the command, or set up a cron job to run kinit for the specific user.
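
For comparison, the manual route that PySpark spares you would look roughly like this in a plain Python recipe: check for a ticket with `klist -s`, obtain one from a keytab with `kinit -kt` if needed, then run the HDFS command. The keytab path and principal below are hypothetical placeholders, and whether the recipe's user can read a keytab at all depends on how your administrator has set up the instance.

```python
import subprocess

# Hypothetical placeholders: replace with values from your administrator.
KEYTAB = "/path/to/user.keytab"
PRINCIPAL = "user@EXAMPLE.COM"


def kinit_command(keytab, principal):
    """Build the kinit invocation that obtains a ticket from a keytab."""
    return ["kinit", "-kt", keytab, principal]


def has_valid_ticket(run=subprocess.run):
    """Return True if the credential cache holds an unexpired ticket.

    `klist -s` prints nothing and reports the result via its exit status.
    """
    return run(["klist", "-s"]).returncode == 0


def list_with_ticket(path, run=subprocess.run):
    """Obtain a ticket if needed, then return the output of `hdfs dfs -ls`."""
    if not has_valid_ticket(run):
        run(kinit_command(KEYTAB, PRINCIPAL), check=True)
    result = run(["hdfs", "dfs", "-ls", path], stdout=subprocess.PIPE, check=True)
    return result.stdout.decode()
```

With a PySpark recipe none of this bookkeeping should be necessary, which is the point of the advice above.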