Joining Dataset

Solved!
Torito

Good Day

How can I join two datasets on a primary key, matching each row to the related row with the closest date?

For example, one dataset has maintenance information for a piece of equipment, such as filter replacements.

The second dataset has the readings of hours, strokes, etc., but this data is not updated daily; it is updated at random intervals. So for the joined data, I want each maintenance row matched to the reading whose date is closest.

1 Solution
tgb417

@Torito 

Welcome to the Dataiku community.

One idea that came to mind as I was reading your post is to use the interpolation features of Dataiku, which are usually used with time series data to regularize data that comes in sporadically. This may be way more than your use case requires, but it is something that DSS can do, and it might help you create a dataset to better understand what is going on with your data. The idea would be to calculate a daily value for one dataset and then join it to the other. That way you would not need to pick the nearest date; you would have an estimate of the values on any given date.


https://doc.dataiku.com/dss/latest/time-series/time-series-preparation/resampling.html
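
For anyone who wants to see the idea outside the visual recipe, here is a minimal pandas sketch of "resample to daily, interpolate, then join". It is not the Dataiku resampling recipe itself; the column names (`equipment_id`, `date`, `hours`, `task`) and the sample values are made up for illustration, and linear interpolation is just one possible choice.

```python
import pandas as pd

# Hypothetical sample data; column names and values are assumptions.
readings = pd.DataFrame({
    "equipment_id": ["A", "A", "A"],
    "date": pd.to_datetime(["2023-01-01", "2023-01-05", "2023-01-11"]),
    "hours": [100.0, 140.0, 200.0],
})
maintenance = pd.DataFrame({
    "equipment_id": ["A", "A"],
    "date": pd.to_datetime(["2023-01-03", "2023-01-08"]),
    "task": ["filter replacement", "oil change"],
})

def daily_interpolate(group: pd.DataFrame) -> pd.DataFrame:
    """Resample one equipment's readings to daily rows and fill gaps linearly."""
    daily = group.set_index("date")[["hours"]].resample("D").mean()
    return daily.interpolate(method="linear")

# Build a daily series per equipment, then stack them back together.
daily_parts = []
for equipment_id, group in readings.groupby("equipment_id"):
    part = daily_interpolate(group).reset_index()
    part["equipment_id"] = equipment_id
    daily_parts.append(part)
daily_readings = pd.concat(daily_parts, ignore_index=True)

# With one row per day, an exact join on key + date is enough.
joined = maintenance.merge(daily_readings, on=["equipment_id", "date"], how="left")
print(joined)
```

The `hours` values in `joined` are estimates for the maintenance dates, not actual readings, so this only makes sense for quantities that change smoothly between readings.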

Just an idea.  

--Tom

Torito

So you can add a nearest-date option in the recipe; it seems to be working fine right now.
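
For comparison, nearest-date matching can also be sketched in a Python recipe with pandas `merge_asof`. This is not the recipe option mentioned above, only an equivalent idea in code; the column names (`equipment_id`, `date`, `reading_date`, `hours`, `task`) and the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical sample data; column names and values are assumptions.
maintenance = pd.DataFrame({
    "equipment_id": ["A", "A", "B"],
    "date": pd.to_datetime(["2023-01-04", "2023-01-09", "2023-01-04"]),
    "task": ["filter replacement", "oil change", "belt check"],
})
readings = pd.DataFrame({
    "equipment_id": ["A", "A", "A", "B", "B"],
    "reading_date": pd.to_datetime(
        ["2023-01-01", "2023-01-05", "2023-01-11", "2023-01-02", "2023-01-10"]
    ),
    "hours": [100.0, 140.0, 200.0, 50.0, 90.0],
})

# merge_asof requires both frames to be sorted by their date key.
maintenance = maintenance.sort_values("date")
readings = readings.sort_values("reading_date")

# For each maintenance row, pick the reading of the same equipment
# whose reading_date is closest (earlier or later) to the maintenance date.
nearest = pd.merge_asof(
    maintenance,
    readings,
    left_on="date",
    right_on="reading_date",
    by="equipment_id",
    direction="nearest",
)
print(nearest)
```

Keeping `reading_date` as a separate column makes it easy to check how far apart the matched dates are; `merge_asof` also accepts a `tolerance` argument if matches that are too far away should be dropped.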

Turribeach

Use a Window recipe to partition and filter the data. In the Window definitions, set the partition columns to your primary key, and the order columns to the primary key followed by your date/time column (when the row was last updated), descending. Then in Aggregations, enable Row number. Finally, in Post-filter, set a condition to filter by rownumber == 1 so that only the latest row is kept.
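
The same logic can be sketched in pandas, in case it helps to see what the Window recipe steps are doing. This is a rough equivalent, not what DSS runs internally; the column names (`equipment_id`, `reading_date`, `hours`) and the data are hypothetical. Note that it keeps the latest reading per key, which is not necessarily the reading closest to a given maintenance date.

```python
import pandas as pd

# Hypothetical sample data; column names and values are assumptions.
readings = pd.DataFrame({
    "equipment_id": ["A", "A", "A", "B", "B"],
    "reading_date": pd.to_datetime(
        ["2023-01-01", "2023-01-05", "2023-01-11", "2023-01-02", "2023-01-10"]
    ),
    "hours": [100.0, 140.0, 200.0, 50.0, 90.0],
})

# Order by key, then by date descending (the "order columns" step).
ordered = readings.sort_values(
    ["equipment_id", "reading_date"], ascending=[True, False]
)

# Row number within each key partition, starting at 1 (the "Row number" aggregation).
ordered["rownumber"] = ordered.groupby("equipment_id").cumcount() + 1

# Post-filter: keep only the first row of each partition, i.e. the latest reading.
latest = ordered[ordered["rownumber"] == 1]
print(latest)
```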

Torito

Thanks, I will give both options a try to learn more for future projects.