

I am using HDFS datasets in my workflow that are updated daily, and I would like to know whether DSS can track these daily changes and save them in a separate "delta" file through a scenario or some other automation capability.


1 Answer


This needs some work, but it can be achieved using scenarios and partitioning.

You would have your "stock" dataset (not partitioned) and a changes dataset partitioned by day. You will need to create a code recipe that takes the changes dataset as both input and output, with a partition dependency that says:

"to compute day N of the changes dataset, I use the stock dataset and day N-1 of the changes dataset" (use the "Time range" dependency).

Then your recipe does the actual computation
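As an illustration of what that computation step could look like, here is a minimal pandas sketch that derives a day-N delta as the stock rows not present in the previous day's partition. The dataframes, column names (`id`, `value`), and the definition of "changed" are all hypothetical; inside an actual DSS code recipe you would read these dataframes from the stock dataset and the day N-1 partition of the changes dataset, and your real diff logic depends on what counts as a change in your data.

```python
import pandas as pd

# Hypothetical current "stock" dataset (in DSS this would come from
# the non-partitioned stock dataset).
stock = pd.DataFrame({"id": [1, 2, 3, 4],
                      "value": ["a", "b", "c2", "d"]})

# Hypothetical day N-1 partition of the changes dataset.
prev_delta = pd.DataFrame({"id": [1, 2, 3],
                           "value": ["a", "b", "c"]})

# Day N delta: stock rows that do not appear (same id and value)
# in the previous day's partition -- an anti-join on all columns.
merged = stock.merge(prev_delta, how="left",
                     on=["id", "value"], indicator=True)
delta = merged[merged["_merge"] == "left_only"].drop(columns="_merge")

# delta now holds the rows that changed or appeared since day N-1;
# in DSS you would write it to the day N partition of the changes dataset.
```

Here row 3 (whose value changed from "c" to "c2") and row 4 (newly added) end up in the delta.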

An important point is that you should not run this recipe in "recursive" mode, because it would recurse all the way back to the big bang (to compute day N-1 you need day N-2, which needs day N-3, and so on).

Then this can be automated using a time-based trigger, since you expect your files to change daily (note that this requires a professional version of DSS)