+3 votes
I'd like to write a recipe where one of the inputs is also an output. The loop is intentional: the dataset should get enriched every time the recipe is run, and should eventually converge.

2 Answers

0 votes
Best answer

It is not possible to use the same dataset as both the input and the output of a recipe. Supporting this would require the user to specify a convergence criterion, as well as a way to write to a dataset while reading from it.

But there is hope!

  • If the goal is to enrich a dataset, use two datasets: foo as the input and foo_enriched as the output.
  • If the goal is to iterate until convergence, do the whole loop inside one recipe (for instance a Python recipe) and write the converged dataset as the recipe's output.
  • If the goal is to update a dataset on a regular basis (e.g. daily), then partitioning might be the solution.
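
To illustrate the second option, here is a minimal sketch of a convergence loop that could live inside a single Python recipe. Everything in it is an assumption made for illustration: the row layout (`id`, `parent`, `tags`), the `enrich_once` step, and the `max_iterations` guard are hypothetical, and reading the input dataset and writing the output dataset are left out so that the snippet runs on its own.

```python
def enrich_once(rows):
    """Hypothetical enrichment step: each row inherits its parent's tags."""
    by_id = {row["id"]: row for row in rows}
    enriched = []
    for row in rows:
        tags = set(row["tags"])
        parent = by_id.get(row["parent"])
        if parent is not None:
            tags |= set(parent["tags"])
        # Sort so that two passes producing the same tags compare equal.
        enriched.append({**row, "tags": sorted(tags)})
    return enriched


def enrich_until_converged(rows, max_iterations=100):
    """Apply enrich_once until a pass changes nothing (a fixed point).

    Returns the converged rows and the number of passes that were run.
    """
    current = rows
    for iteration in range(1, max_iterations + 1):
        updated = enrich_once(current)
        if updated == current:  # convergence criterion: no change
            return current, iteration
        current = updated
    raise RuntimeError("no convergence within max_iterations")


if __name__ == "__main__":
    # Stand-in for the rows read from the input dataset (foo).
    foo = [
        {"id": 1, "parent": None, "tags": ["a"]},
        {"id": 2, "parent": 1, "tags": ["b"]},
        {"id": 3, "parent": 2, "tags": ["c"]},
    ]
    foo_enriched, passes = enrich_until_converged(foo)
    print(passes, foo_enriched)
```

The recipe then has foo as its only input and foo_enriched as its only output, so the flow itself contains no loop; the iteration happens entirely in memory.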
0 votes

The answer given by jrouquie is correct. However, there are also some (unofficial) workarounds:

  • Notebooks
  • Writing to files (I personally do this to cache API calls)
  • SQL
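
As an illustration of the file-based workaround, here is a rough sketch of a cache that persists between runs. It is not an official mechanism: the function names, the JSON file format, and the `fetch` callable standing in for the API call are all assumptions, and keys are assumed to be strings so that they survive the JSON round trip.

```python
import json
from pathlib import Path


def load_cache(cache_path):
    """Return the cache stored at cache_path, or an empty dict if none exists."""
    path = Path(cache_path)
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {}


def cached_lookup(keys, fetch, cache_path):
    """Look up each key, calling fetch(key) only for keys not cached yet.

    `fetch` is a hypothetical stand-in for the expensive API call.
    The cache file is read at the start and rewritten at the end, so the
    same file acts as both an input and an output of the run.
    """
    cache = load_cache(cache_path)
    for key in keys:
        if key not in cache:
            cache[key] = fetch(key)
    Path(cache_path).write_text(json.dumps(cache), encoding="utf-8")
    return {key: cache[key] for key in keys}
```

Each run only pays for the keys it has not seen before, which is the "enriched on every run" behaviour from the question, achieved outside the dataset flow.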

©Dataiku 2012-2018 - Privacy Policy