Partition - Discrete dimension example

mpalangetic
Level 1
Hello, could you please provide an example of partitioning a dataset based on the discrete values in one column? Everything I can find on the web uses partitioning based on the time dimension.
8 Replies
AdrienL
Dataiker
You may want to partition a dataset by country for instance, or by customer.
mpalangetic
Level 1
Author
I have multiple devices, and I want to partition the dataset by Device_ID, which is one column in my dataset. But when I go to Settings -> Partitioning -> Add discrete dimension and put Device_ID in the name field, no partitions are extracted. So my question is: how do I provide the correct pattern in that case?
AdrienL
Dataiker
It depends on the type of your dataset: most file-based datasets are partitioned by folder, while SQL datasets are partitioned by the value of a certain column. See https://doc.dataiku.com/dss/latest/partitions/index.html for more information.
If you have a file dataset with a column that you want to partition on, you can use a Sync recipe to write to a new partitioned dataset, with the "redispatch partition according to column" option enabled.
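To make the idea of redispatching concrete, here is a minimal Python sketch of what it amounts to conceptually: rows are grouped by the value of one column and each group is written to its own folder. This is not Dataiku's implementation and uses no Dataiku API; the function name `redispatch_by_column`, the file name `data.csv`, and the sample rows are all hypothetical.

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path


def redispatch_by_column(rows, column, out_root):
    """Write rows into one sub-folder per distinct value of `column`.

    Illustration only: produces a one-folder-per-value layout, which is
    the kind of layout a folder-partitioned file dataset relies on.
    Returns the sorted list of distinct values (the "partitions").
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row)

    out_root = Path(out_root)
    for value, group in groups.items():
        part_dir = out_root / str(value)
        part_dir.mkdir(parents=True, exist_ok=True)
        with open(part_dir / "data.csv", "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(group[0].keys()))
            writer.writeheader()
            writer.writerows(group)
    return sorted(groups)


# Hypothetical sample data with a Device_ID column.
sample_rows = [
    {"Device_ID": "dev_a", "reading": "1.0"},
    {"Device_ID": "dev_b", "reading": "2.5"},
    {"Device_ID": "dev_a", "reading": "3.2"},
]

with tempfile.TemporaryDirectory() as tmp:
    partitions = redispatch_by_column(sample_rows, "Device_ID", tmp)
    print(partitions)  # ['dev_a', 'dev_b']
```

In DSS itself the Sync recipe does this work for you; the sketch only shows why the output ends up with one folder per Device_ID value.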
mpalangetic
Level 1
Author
Unfortunately, that is not helpful, because it requires running a sync on an existing dataset. I want to create my partitioning for the first time. And yes, my dataset is a file-based one.
AdrienL
Dataiker
That is still the solution. Here is a step-by-step guide: https://www.dataiku.com/learn/guide/other/partitioning/partitioning-redispatch.html
Instead of the Year time dimension, add a discrete dimension. Don't forget to insert it in the partitioning pattern. E.g. if the dimension is called "device", the partitioning pattern for the output dataset should look like "%{device}/.*".
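As a rough illustration of how a pattern such as "%{device}/.*" can be read, the Python sketch below turns each `%{name}` placeholder into a named regex group and matches it against file paths relative to the dataset root. This is an assumption-laden toy, not how DSS actually parses patterns; the helper names `pattern_to_regex` and `partition_of` and the example paths are hypothetical.

```python
import re


def pattern_to_regex(pattern):
    """Compile a pattern like "%{device}/.*" into a regex with named groups.

    Assumption for this sketch: each %{name} placeholder stands for exactly
    one path segment, and the rest of the pattern is already regex syntax.
    """
    # re.split with one capture group alternates: literal, name, literal, ...
    parts = re.split(r"%\{(\w+)\}", pattern)
    regex = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += part  # literal text, e.g. "/.*"
        else:
            regex += f"(?P<{part}>[^/]+)"  # one folder name per dimension
    return re.compile(regex)


def partition_of(path, pattern):
    """Return {dimension: value} for a relative path, or None if no match."""
    match = pattern_to_regex(pattern).fullmatch(path)
    return match.groupdict() if match else None


# Hypothetical paths, relative to the dataset root.
print(partition_of("dev_a/data.csv", "%{device}/.*"))  # {'device': 'dev_a'}
print(partition_of("data.csv", "%{device}/.*"))        # None
```

The second call shows why no partitions are found when the files are not organised into one folder per value: a file sitting directly at the root has no folder segment for the dimension to match.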
n0thing233
Level 3
Did you get your issue resolved? I think I have the same issue.
AdrienL
Dataiker
I'm not sure what issue you're referring to. Did you try using redispatch on a sync recipe, as suggested above?
n0thing233
Level 3
Problem solved. Thank you. It was just that I have to do the "redispatch" first, before I can list partitions.
Thanks.
