Can I have a dataset both as input and as output of a recipe (a kind of “Recursive recipe”)?

Solved!
UserBird
Dataiker
I'd like to write a recipe where one of the inputs is also an output. The loop is intentional: the dataset should be enriched every time the recipe runs, and should eventually converge.
1 Solution
jrouquie
Dataiker Alumni

It is not possible to have a dataset both as input and as output of a recipe. This would require the user to specify a convergence criterion, and it would require a way to write to a dataset while reading from it.



But there is hope!




  • If the goal is to enrich a dataset, use two datasets: foo as input and foo_enriched as output.

  • If the goal is to iterate until convergence, do the whole iteration inside one recipe (for instance a Python recipe) and write the converged dataset as the recipe's output.

  • If the goal is to update a dataset on a regular basis (e.g. daily), then partitioning might be the solution.
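
To illustrate the second option, here is a minimal sketch of iterating to a fixed point inside a single Python recipe. It is plain Python so it can be run anywhere; the row layout, the `propagate_country` enrichment rule and the `max_iterations` cap are all assumptions made up for the example, not anything specific to Dataiku. In a real recipe the rows would typically come from something like `dataiku.Dataset("foo").get_dataframe()` and the result would be written once to `foo_enriched`.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[object]]


def iterate_until_converged(
    rows: List[Row],
    step: Callable[[List[Row]], List[Row]],
    max_iterations: int = 100,
) -> List[Row]:
    """Apply `step` repeatedly until the rows stop changing (a fixed point)."""
    current = rows
    for _ in range(max_iterations):
        enriched = step(current)
        if enriched == current:  # convergence criterion: nothing changed
            return enriched
        current = enriched
    raise RuntimeError(f"No convergence after {max_iterations} iterations")


def propagate_country(rows: List[Row]) -> List[Row]:
    """Hypothetical enrichment: a row without a country inherits its parent's."""
    country_by_id = {row["id"]: row["country"] for row in rows}
    return [
        {**row, "country": row["country"] or country_by_id.get(row["parent"])}
        for row in rows
    ]


# Stand-in for the content of the input dataset "foo".
rows = [
    {"id": 1, "parent": None, "country": "FR"},
    {"id": 2, "parent": 1, "country": None},
    {"id": 3, "parent": 2, "country": None},
]

# The loop lives entirely inside the recipe; only the converged result
# would be written to the output dataset "foo_enriched".
result = iterate_until_converged(rows, propagate_country)
print(result)
```

The input dataset is never written to, so the Flow stays acyclic: the recursion happens in memory and only the final state is stored.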

jereze
Community Manager

The answer given by jrouquie is correct, but there are also some (unofficial) workarounds:




  • Notebooks

  • Writing to files (I personally do this to cache API calls)

  • SQL
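
As a rough idea of what the file-based workaround can look like, here is a minimal sketch of caching API calls in a JSON file that persists between runs. The names `cached_fetch`, `fake_api` and `api_cache.json` are hypothetical, and a temporary directory stands in for wherever the file would really live (in Dataiku that might be a managed folder, which is an assumption on my part).

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, List


def load_cache(cache_path: Path) -> Dict[str, dict]:
    """Read the cache file if it exists, otherwise start with an empty cache."""
    if cache_path.exists():
        return json.loads(cache_path.read_text())
    return {}


def cached_fetch(key: str, fetch: Callable[[str], dict], cache_path: Path) -> dict:
    """Return the cached answer for `key`, calling `fetch` only on a miss."""
    cache = load_cache(cache_path)
    if key not in cache:
        cache[key] = fetch(key)
        cache_path.write_text(json.dumps(cache))  # persist for later runs
    return cache[key]


calls: List[str] = []


def fake_api(key: str) -> dict:
    """Stand-in for a slow or rate-limited API call (hypothetical)."""
    calls.append(key)
    return {"query": key, "length": len(key)}


cache_file = Path(tempfile.mkdtemp()) / "api_cache.json"
first = cached_fetch("paris", fake_api, cache_file)   # miss: calls the API
second = cached_fetch("paris", fake_api, cache_file)  # hit: read from the file
print(first, second, calls)
```

Because the file is not a dataset in the Flow, the recipe can read and write it in the same run without creating a loop.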

Jeremy, Product Manager at Dataiku