Can a Recipe in a Flow be used to help 'backup' the data to assist with reruns?
I'm wondering if there is a way to use something like a Sync or Export recipe to provide a kind of rollback/undo facility for an output dataset. Although this function applies to the data at the end of a flow, it would need to run first rather than last, i.e.:
1. Step 1 - take a copy of the output data from the last time the flow ran.
2. Step 2 - run the flow to add/replace the output data.
3. Optional step 3 - manually copy the backup from step 1 back into place if the job needs to be rerun.
I suppose this is more of a database feature, but I wanted to see whether it could be achieved in the Flow somehow (or in another Flow?).
One option is to use Sync recipes: one to copy the input and another at the end to copy the output. You can also use a Sync recipe for step 2 (set it to replace or append). When you rerun, you can decide what to build (one option is Force Build, which executes the whole flow).
Another option is to use scenarios. With them you can set up your three steps:
1. A Python code step that duplicates the output dataset (you can also do this manually by copying the dataset in the user interface).
2. A Build/Train step that builds the flow (you can also use Python code here if you want to do more complex checks, or use a normal Python code recipe in the flow).
3. An Export step that "copies and pastes" the backup dataset (generated in step 1) over the output dataset (again, you can do this with Python if you want it to happen automatically).
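The backup/build/rollback logic of these scenario steps can be sketched in plain Python. The sketch below is a generic, standalone illustration that treats datasets as CSV files on disk; inside DSS you would instead read and write datasets through the `dataiku` package in a scenario Python step. All paths, file names, and the `build_flow` stand-in are illustrative assumptions, not DSS APIs:

```python
import shutil
from pathlib import Path

def backup_output(output_path: str, backup_path: str) -> None:
    """Step 1: keep a copy of the current output before the flow rebuilds it."""
    shutil.copyfile(output_path, backup_path)

def build_flow(output_path: str) -> None:
    """Step 2: stand-in for the flow build that replaces the output.
    In DSS this would be a Build/Train scenario step, not Python."""
    Path(output_path).write_text("new,data\n1,2\n")

def rollback(output_path: str, backup_path: str) -> None:
    """Optional step 3: restore the backup over the output if the new run is bad."""
    shutil.copyfile(backup_path, output_path)
```

The key design point is simply ordering: the backup step must run before the build step in the same scenario, so the scenario always snapshots the previous output before overwriting it.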
If you need more information about something let me know :)
Thanks for this, I've created my first scenario with two steps: the first creates the "backup" of the old output, the second builds the new output. I can "roll back" if required by exporting the backup to overwrite the output. This does what I need - simple!
Scenarios have got a lot of potential!
©Dataiku 2012-2018