How to sync a partitioned dataset only for partitions not in the output

Alex_Combessie

Dataiker Alumni

‎05-13-2016 12:21 AM


I have a sync recipe with one partitioned dataset in input and one partitioned dataset in output. Partitioning is by hour.

The input dataset receives new data continuously. Today I build the output dataset manually by selecting the new dates, using the append option instead of overwrite.

This is obviously not optimal, as it involves manual intervention.

What would be a way to sync only the partitions of the input that are not yet in the output (other than job scheduling, which could be too costly)?
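To make the goal concrete, the selection I have in mind amounts to a set difference over partition identifiers. Below is a minimal sketch in plain Python, not a Dataiku feature: the function name `missing_partitions` and the sample identifiers are hypothetical, and the hourly `YYYY-MM-DD-HH` format is an assumption. Inside Dataiku, the two lists could presumably come from the Python API (for example `Dataset.list_partitions()`), but that part is not shown here.

```python
def missing_partitions(input_partitions, output_partitions):
    """Return the input partition ids that are absent from the output, sorted."""
    already_synced = set(output_partitions)
    # Deduplicate the input, drop what is already in the output, then sort.
    # With zero-padded YYYY-MM-DD-HH ids, lexicographic order is chronological.
    return sorted(p for p in set(input_partitions) if p not in already_synced)


# Hypothetical hourly partition ids (assumed format: YYYY-MM-DD-HH)
input_ids = ["2016-05-12-22", "2016-05-12-23", "2016-05-13-00", "2016-05-13-01"]
output_ids = ["2016-05-12-22", "2016-05-12-23"]

to_build = missing_partitions(input_ids, output_ids)
print(to_build)
```

The resulting list would then be the set of partitions to request when building the output, so that partitions already present are never rebuilt.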
