Hi Team,

I have a dataset with 10,000 rows and a month column whose values range from JAN to DEC. I want to split this dataset by month.

I can do this with the visual Split recipe, but then I have to create 12 different output datasets.

My question: how can I make DSS create the datasets automatically, one per distinct value of the month column?




1 Answer


This is a good use case for partitioning: https://doc.dataiku.com/dss/latest/partitions/index.html

Instead of creating 12 datasets with a Split recipe, you can use a Sync recipe with a partitioned output dataset. In the Settings > Connection menu of the output dataset, configure partitioning on the column that contains your month. Use the discrete partition type if your months are encoded like "1" to "12", or the time range partition type if they are encoded as dates ("YYYY-MM").
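To make the choice between the two partition types concrete, here is a small Python sketch. The helper `suggest_partition_type` is hypothetical and is not part of the DSS API. It assumes only what the answer states: values shaped like "YYYY-MM" suit the time range type with month granularity, and anything else (such as "1" to "12" or "JAN" to "DEC") is treated as a plain discrete value.

```python
import re

# Hypothetical helper, not a DSS API: suggests a partition type for a month value.
# Matches a four-digit year, a dash, and a month from 01 to 12 (e.g. "2018-03").
_YEAR_MONTH = re.compile(r"^\d{4}-(0[1-9]|1[0-2])$")


def suggest_partition_type(value: str) -> str:
    """Return 'time range (month)' for YYYY-MM values, otherwise 'discrete'."""
    if _YEAR_MONTH.match(value.strip()):
        return "time range (month)"
    # Any other encoding ("1".."12", "JAN".."DEC") is used as a discrete value as-is.
    return "discrete"


for v in ["1", "12", "JAN", "2018-03"]:
    print(v, "->", suggest_partition_type(v))
```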

In the settings of your Sync recipe, make sure you check "Redispatch partitioning according to input columns". You will then be able to build the partitions you select.
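Conceptually, redispatching reads every input row and sends it to the output partition named by that row's value in the partitioning column, so a single recipe run fills all twelve partitions instead of writing twelve separate datasets. The plain-Python sketch below models only that grouping step. The function `redispatch`, the column name `month`, and the sample rows are hypothetical, and in DSS the recipe performs the actual write itself.

```python
from collections import defaultdict


def redispatch(rows, column):
    """Group rows into partitions keyed by the string value of `column`."""
    partitions = defaultdict(list)
    for row in rows:
        # Each row goes to the partition whose identifier equals its column value.
        partitions[str(row[column])].append(row)
    return dict(partitions)


# Hypothetical sample input with a month column encoded as "JAN".."DEC".
rows = [
    {"id": 1, "month": "JAN"},
    {"id": 2, "month": "FEB"},
    {"id": 3, "month": "JAN"},
]
parts = redispatch(rows, "month")
print(sorted(parts))      # ['FEB', 'JAN']
print(len(parts["JAN"]))  # 2
```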
I don't see "Redispatch partitioning according to input columns" in DSS 5.1. Any update on this?
Hi, for that option to appear on a Sync recipe, you first need to partition your output dataset by the dimension you want.

©Dataiku 2012-2018