Sign up to take part
Registered users can ask their own questions, contribute to discussions, and be part of the Community!
Registered users can ask their own questions, contribute to discussions, and be part of the Community!
Hi DSS,
I want to input 2 different datasets: dataset A of about 19.000 rows and 18 columns; and dataset B of about 4.000.000 rows and same columns as dataset A, that contains all row info from column A plus lots of additional info. All columns of A and B are the same, with column 1 containing patient codes.
I want to assign 1 in a new column (in the B dataset) for all column 1 cells of A that exist in column 1 of B; and assign 0 in the same new column for all cells that exist only in column 1 B.
Is it possible to do this with a visual recipe?
Leo
Operating system used: Windows
in reading your note here, I’m wondering if the following approach might work for you.
Note this approach does have some risk of duplicating rows overall in your final set. If you have more than 1 row in the smaller set that might match a record in larger set. If this is a possibility you might first use a distinct recipe on the smaller set to make sure that the items you are matching on are unique. This should resolve this issue.
Have fun with your project. Let us all know how you are getting on with your analysis.
in reading your note here, I’m wondering if the following approach might work for you.
Note this approach does have some risk of duplicating rows overall in your final set. If you have more than 1 row in the smaller set that might match a record in larger set. If this is a possibility you might first use a distinct recipe on the smaller set to make sure that the items you are matching on are unique. This should resolve this issue.
Have fun with your project. Let us all know how you are getting on with your analysis.
When working with dates and times in DSS you need to parse the dates and times to a standard format. This is not done automatically like MS Excel might do. You can do this date and time parsing in a visual prepare recipe before you do your join.
this means you will need two of these prepare recipes. One for the large dataset. One fir the small dataset.
give it a try. Let the community know how you get on with this, so that others finding this thread might learn as well.
@tgb417 ,
Thank you Tom. Parsing the dates worked fine. I will certainly post my final processing view on the topic to help others.
I have one last issue though. I did the left join but, I don't see my columns from the second dataset being joined. I am not sure why. I show the process in the recipe and the columns of my first dataset on the left (there are no columns from the second dataset on the right).
Leo
Just to reply to myself, I noticed that I did not ask to generate all columns for the right dataset. After editing it, it worked.
Thats great news! Congratulations.
And thank you for sharing with other community members.