Hi there,

I am running Dataiku 5.1.0, and the 5.1.0 release notes say:


  • It is now possible to use datasets in a Python or R recipe, even if they are not declared as inputs or outputs


I tried this today in a Python 3.6 environment. In the Jupyter notebook it works just fine, but as soon as I saved the notebook back to the recipe and ran the recipe itself, it immediately failed with this error:

Job failed: Error in Python process: At line 21: <class 'Exception'>: Dataset jira_projects cannot be used : declare it as input or output of your recipe


Apparently it isn't functioning as expected.

1 Answer

Best answer

You need to pass ignore_flow=True to the constructor of the Dataset() class.
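
A minimal sketch of what this could look like in a recipe, assuming DSS 5.1 or later. The dataset name jira_projects comes from the error message in the question; the helper name read_without_flow_check is hypothetical. The dataiku import sits inside the function only so that the snippet can be loaded outside of DSS.

```python
def read_without_flow_check(dataset_name):
    """Return a dataset as a pandas DataFrame, skipping the input/output check."""
    # The dataiku package is only available inside DSS.
    import dataiku

    # ignore_flow=True tells DSS not to require that the dataset is declared
    # as an input or output of the recipe (option added in DSS 5.1).
    dataset = dataiku.Dataset(dataset_name, ignore_flow=True)
    return dataset.get_dataframe()
```

In the recipe, this would be called as df = read_without_flow_check("jira_projects").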
Thanks, this solved it.

Where could I have found this information?
You can find reference documentation on our Dataiku API Dataset class here: https://doc.dataiku.com/dss/latest/python-api/datasets.html#dataiku.Dataset
How is this solved if you are running an earlier DSS version (5.0.2)?

We are getting the same Python exception. Specifically, a dataset that I created in Project-A is being used in a recipe in Project-B ... this dataset IS INCLUDED as an input to the Python recipe, but we are getting:

Job failed: Error in Python process: At line 21: <class 'Exception'>: Dataset <my_dataset> cannot be used : declare it as input or output of your recipe
This option was added in version 5.1, so you will need to upgrade to use it.

My question is -- what workarounds did users find before this feature was added?

Our IT will eventually upgrade DSS, but I am looking for an immediate workaround.

Can anyone confirm that there is no other (pre-5.1.0) workaround?

The error says "add the dataset as an input" ... which would seem to be a workaround -- but that is not working. Has anyone successfully "added dataset as input"?

Is the fact that my dataset is in another project a factor? If so, the brute-force workaround is to copy the dataset between projects, and then use the "local project copy" as an input ... but I don't want to give this advice if it simply won't work.

Is the dataiku.dataset module source code available? If so, maybe I could create a local code snippet until our DSS is upgraded.

The recommended way to use datasets from other projects is the "Share" feature: https://doc.dataiku.com/dss/latest/security/exposed-objects.html#exposing-objects-between-projects. Once you have shared the datasets from project A to project B, you can add them to your Python recipes in project B as you would for a dataset within project B. Only the syntax is slightly different: Dataset("<PROJECT_A_KEY>.dataset_name"). This is preferred to copying the datasets across projects because:
1. you avoid duplicating the data
2. shared datasets point to the same location, so they are always in sync
3. you maintain the full lineage of data across projects (which you would lose if you did not declare a dataset as input in the recipe)
This applies to 5.1 and before.
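
A short sketch of the shared-dataset syntax described above. It assumes the dataset has already been shared from project A to project B and added as an input of the recipe in project B. The helper names and the example project key PROJECT_A are hypothetical; only the "<PROJECT_A_KEY>.dataset_name" format comes from the answer.

```python
def shared_dataset_name(source_project_key, dataset_name):
    """Build the "<PROJECT_KEY>.dataset_name" reference for a shared dataset."""
    return "{}.{}".format(source_project_key, dataset_name)


def read_shared_dataset(source_project_key, dataset_name):
    """Read a dataset shared from another project as a pandas DataFrame."""
    # The dataiku package is only available inside DSS.
    import dataiku

    # The shared dataset must still be declared as an input of this recipe.
    dataset = dataiku.Dataset(shared_dataset_name(source_project_key, dataset_name))
    return dataset.get_dataframe()
```

For example, shared_dataset_name("PROJECT_A", "my_dataset") gives "PROJECT_A.my_dataset", which is the name to pass to Dataset() in project B.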

©Dataiku 2012-2018 - Privacy Policy