
What is the best way / workflow to read and write to the same dataset without running into any recursion issues?

For example, if you read from and write to the same dataset somewhere in your flow, you can no longer build the final dataset recursively.

Example Use Case: We want to read a dataset with URLs, scrape the URLs or make an API call for this URL and then flag the URL to avoid calling the same URL again in the future (as we already have the results).
I am not sure whether this is possible directly. When I tried it, the easiest workarounds seemed to be either creating a new dataset or modifying the parent recipe to do all the steps in one place. I have also done this via notebooks, but that is not part of the recursive flow build. Is one of these workarounds really necessary?

1 Answer

An easy way to achieve this is to create two datasets (effectively, two JSON metadata documents) that point to the same storage location (a file on disk, a table in a database): one for reading, one for writing. DSS only looks at the metadata (the JSON file) to decide what the dependencies are, not at the actual data, so this breaks the circular dependency.
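To illustrate the pattern, here is a minimal sketch in plain Python that simulates the "two handles, one storage location" idea with a CSV file instead of DSS datasets. The `process_urls` function, the `urls.csv` file, and the `scraped` flag column are illustrative assumptions, not part of the DSS API; in DSS the read and write steps would go through the two dataset definitions instead of `open()`.

```python
import csv
import os
import tempfile

def process_urls(path):
    """Read the URL table, 'scrape' any unflagged URL, flag it, write back.

    The read below plays the role of the read-only dataset; the write at
    the end plays the role of the second dataset that points at the same
    storage location.
    """
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))

    for row in rows:
        if row["scraped"] == "0":
            # ... call the URL or API here ...
            row["scraped"] = "1"  # flag it so we skip it on the next run

    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["url", "scraped"])
        writer.writeheader()
        writer.writerows(rows)
    return rows

# Seed the table with one unscraped URL.
tmp = os.path.join(tempfile.mkdtemp(), "urls.csv")
with open(tmp, "w", newline="") as f:
    f.write("url,scraped\nhttps://example.com,0\n")

first = process_urls(tmp)   # flags the URL
second = process_urls(tmp)  # already flagged, so nothing to scrape
```

Because the function never reads from a dataset it is declared to write (and vice versa), the same single-table logic runs without any circular dependency, which is exactly what the two-dataset trick gives you in the DSS flow.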