Survey banner
Switching to Dataiku - a new area to help users who are transitioning from other tools and diving into Dataiku! CHECK IT OUT

Compare two datasets column by column without using code recipe

Solved!
yesitsmeoffical
Level 3
Compare two datasets column by column without using code recipe

How to create dataset 3 (key = 'ID', join method = 'left') to compare datasets 1 and 2 WITHOUT using code recipe, i.e low code no code only. Thanks.

Dataset 1:

IDCol ACol BCol C
1AA11Z1
2BB100X2
3CC33Y5

 

Dataset 2:

IDCol ACol BCol C
1AA11Z1
2BB98X2
3CC33YY555
4DD44T6

 

Dataset 3 (string columns return true/false, numeric columns return ratio):

IDCol ACol BCol C
1TRUE100%TRUE
2TRUE98%TRUE
3TRUE100%FALSE

Operating system used: win 11

0 Kudos
1 Solution
SarinaS
Dataiker

Hi @yesitsmeoffical,

You can utilize the "pre-join computed columns" step within the join recipe to calculate your computed columns. From your description it sounds like you want to return TRUE if dataset 1 column A == dataset 2 column A and if dataset 1 column C == dataset 2 column C. And it looks like for column B you want to return dataset 2 col B / dataset 1 column B.  

I created the following output dataset using your sample inputs and three computed columns:

Screen Shot 2023-12-27 at 3.51.29 PM.png

The string computed columns have the formulas:

if(val("d1__Col A") == val("d2__Col A"), 'TRUE', 'FALSE')
if(val('d1__Col C') == val('d2__Col C'), 'TRUE', 'FALSE')


The numeric computed column has the formula:

100 * (val('d2__Col B') / val('d1__Col B'))


I'm attaching an export of the project so that you can test it out by importing into your DSS instance. 

Thanks! 
Sarina

View solution in original post

1 Reply
SarinaS
Dataiker

Hi @yesitsmeoffical,

You can utilize the "pre-join computed columns" step within the join recipe to calculate your computed columns. From your description it sounds like you want to return TRUE if dataset 1 column A == dataset 2 column A and if dataset 1 column C == dataset 2 column C. And it looks like for column B you want to return dataset 2 col B / dataset 1 column B.  

I created the following output dataset using your sample inputs and three computed columns:

Screen Shot 2023-12-27 at 3.51.29 PM.png

The string computed columns have the formulas:

if(val("d1__Col A") == val("d2__Col A"), 'TRUE', 'FALSE')
if(val('d1__Col C') == val('d2__Col C'), 'TRUE', 'FALSE')


The numeric computed column has the formula:

100 * (val('d2__Col B') / val('d1__Col B'))


I'm attaching an export of the project so that you can test it out by importing into your DSS instance. 

Thanks! 
Sarina