Hi

I'm wondering if there is a way to use something like a Sync or Export recipe to provide a kind of roll back / undo facility for an output dataset. Although this function applies to data at the end of a flow, it would need to run first rather than last, i.e.:

1. Step 1 - take a copy of the output data from the last time the flow ran.

2. Step 2 - run the flow to add / replace the output data.

3. Optional step 3 - manually copy the backup from step 1 back into place if the job needs to be rerun.

I suppose this is more of a database feature, but I was trying to see if this could be achieved in the Flow somehow (or in another Flow?).

Thanks

1 Answer

Hi Darren,

One option is to use Sync recipes: one at the start to copy the input, and another at the end to make a copy of the output. You can also use a Sync recipe for step 2, since Sync can be set to either replace or append. If you rerun, you can decide what you want to build (one option is Force Build, which re-executes the whole flow).

Another option is to use scenarios. With them you can set your three steps:

1. A Python code step that duplicates the output dataset (you can also do this manually by copying the dataset in the user interface).

2. A Build/Train step that builds the flow (you can also use Python code here if you want to run more complex checks, or a regular Python code recipe in the flow itself).

3. Use the Export option to "copy and paste" the backup dataset (generated in step 1) over another dataset (again, you can do this with Python to automate it).
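The three scenario steps above can be sketched as plain Python. The file paths and helper names below are hypothetical stand-ins (inside a Dataiku scenario step you would read and write actual datasets instead of files); this is just a minimal sketch of the backup / build / rollback pattern, not the Dataiku API:

```python
import shutil
from pathlib import Path


def backup_output(output_path: str, backup_path: str) -> None:
    # Step 1: copy the previous output aside BEFORE rebuilding.
    if Path(output_path).exists():
        shutil.copyfile(output_path, backup_path)


def build_output(output_path: str, new_data: str) -> None:
    # Step 2: stand-in for the flow build that replaces the output.
    Path(output_path).write_text(new_data)


def rollback(output_path: str, backup_path: str) -> None:
    # Step 3 (optional): restore the backup over the output.
    shutil.copyfile(backup_path, output_path)
```

Note that the step order matters: `backup_output` must run before the build, because once the build has overwritten the output there is nothing left to back up.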

If you need more information about something let me know :)
Hi Alan

Thanks for this, I've created my first scenario with two steps: the first creates a "backup" of the old output, the second builds the new output. I can "roll back" if required by exporting the backup to overwrite the output. This does what I need - simple!
Perfect :D
Scenarios have got a lot of potential!

©Dataiku 2012-2018