Survey banner
Switching to Dataiku - a new area to help users who are transitioning from other tools and diving into Dataiku! CHECK IT OUT

Keep only rows that has timestamp between start and stop columns from another dataset

Solved!
Fragan
Level 1
Keep only rows that has timestamp between start and stop columns from another dataset

Hi, im kinda new to DSS.



I have a huge Dataset (with 20M+ rows), this dataset has a timestamp column, and 20 other columns.



I have another Dataset with like 15 rows, this one has 3 columns start, stop and flag.



Foreach row in my first Dataset i want to check if the timestamp is between start and stop of a row in the second dataset, if it's i want to enrich the first dataset with the flag from the second. And if a row in my first Dataset isn't between any of the start and stop, i don't want to keep it.



I've done this with a Python recipe using Pandas, but im trying to do it only with DSS recipes.



I tried to do it with a Join recipe with this SQL code : 




$left.timestamp>=$right.start AND $left.timestamp<=$right.stop;


but it's not working..



I found one way but it takes too much disk space and time:  Do a cross join between my two Datasets and then do a filter with the value of start and stop



 



Can you guys suggest me a good way to do this ?

0 Kudos
1 Solution
Liev
Dataiker Alumni

Hi Fragan, 



Please use the following syntax on your join clause




"Test_dataset"."timestamp">="testa"."start" AND "Test_dataset"."timestamp"<="testa"."stop"


mind the quotes around the field names.



 

View solution in original post

0 Kudos
1 Reply
Liev
Dataiker Alumni

Hi Fragan, 



Please use the following syntax on your join clause




"Test_dataset"."timestamp">="testa"."start" AND "Test_dataset"."timestamp"<="testa"."stop"


mind the quotes around the field names.



 

0 Kudos

Labels

?
Labels (2)
A banner prompting to get Dataiku