Data Preparation Split

fornanthu
Level 2
Hi Team,

I have a dataset with 10,000 rows. It has a month column with values ranging from JAN to DEC, and I want to split the dataset by month.

I can do this with the "Split" visual recipe, but then I have to create 12 separate datasets.

My question: how can I get DSS to create the datasets automatically from the distinct values of the month column?



Regards,

Nantha.
5 Replies
Alex_Combessie
Dataiker Alumni
Hi,

This is a good use case for partitioning: https://doc.dataiku.com/dss/latest/partitions/index.html

Instead of creating 12 datasets using a split recipe, you can use the sync recipe with a partitioned output dataset. Then, in the Settings > Connection menu of the output dataset, configure the partitioning column containing your month. Try the discrete partition type if your months are encoded like "1" to "12", or the time range partition type if they are encoded like dates ("YYYY-MM").

In the settings of your sync recipe, make sure you click on "Redispatch partitioning according to input columns". Then you will be able to build your selected partitions.
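
To make the idea concrete, here is a rough plain-Python sketch of what redispatching by a column amounts to conceptually: each row is routed to the bucket named after its month value, so you get one group per distinct value without listing the values yourself. This is only an illustration, not how DSS implements it and not DSS API code; the function name `redispatch_by_column` and the sample rows are made up for the example.

```python
from collections import defaultdict


def redispatch_by_column(rows, column):
    """Group row dicts into one bucket per distinct value of `column`.

    Hypothetical helper for illustration only; it is not part of DSS.
    """
    partitions = defaultdict(list)
    for row in rows:
        # The column value acts as the partition identifier.
        partitions[row[column]].append(row)
    return dict(partitions)


# Made-up sample data standing in for the 10,000-row dataset.
rows = [
    {"month": "JAN", "amount": 10},
    {"month": "FEB", "amount": 7},
    {"month": "JAN", "amount": 3},
]

partitions = redispatch_by_column(rows, "month")
for month, part in partitions.items():
    print(month, len(part))
```

The same logic applies to any discrete column, so nothing has to be hard-coded for the 12 months.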
n0thing233
Level 3
I don't see "Redispatch partitioning according to input columns" in DSS 5.1. Any update on this?
Alex_Combessie
Dataiker Alumni
Hi, for that option to appear on a Sync recipe, you first need to partition your output dataset by the dimension you want.
moranbuying
Level 1
I have a similar need, but I'd like to partition on the value of a text column. I have tens of thousands of records and about 50 record categories that I'd like to use as partitions. In the field for adding a partitioning pattern, inside the dataset settings, I don't see a way to point at a single column or to break things up by discrete text values. Can you offer some advice?
Alex_Combessie
Dataiker Alumni
Hi, I suggest you try this tutorial: https://www.dataiku.com/learn/guide/other/partitioning/partitioning-redispatch.html