Partition - Discrete dimension example

mpalangetic · 04-26-2018

Hello, could you please provide an example of partitioning a dataset based on the discrete values in one column? Everything I can find on the web uses partitioning based on the time dimension.

AdrienL · 04-26-2018

For instance, you may want to partition a dataset by country or by customer.

mpalangetic · 04-26-2018

I have multiple devices, and I want to partition the dataset by Device_ID, which is one column in my dataset. But when I go to Settings -> Partitioning -> Add discrete dimension and put Device_ID in the name field, no partitions are extracted. So my question is: how do I provide the correct pattern in that case?

AdrienL · 04-26-2018

It depends on the type of your dataset: most file-based datasets partition by folder, while SQL datasets partition by the value of a certain column. See https://doc.dataiku.com/dss/latest/partitions/index.html for more information.
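To illustrate the SQL case, here is a minimal sketch using Python's built-in sqlite3 module (not Dataiku code): with a discrete dimension on a column, each distinct value of that column can be thought of as one partition, and reading a partition amounts to filtering on that value. The table and column names and the helpers `sql_partitions` and `read_partition` are hypothetical.

```python
import sqlite3


def sql_partitions(conn, table, column):
    """Return the distinct values of `column`; each one acts as a partition id."""
    # Identifiers cannot be bound as query parameters, so they are quoted
    # into the SQL text here; only pass trusted table/column names.
    query = f'SELECT DISTINCT "{column}" FROM "{table}" ORDER BY 1'
    return [row[0] for row in conn.execute(query)]


def read_partition(conn, table, column, value):
    """Return the rows of one partition: a plain WHERE filter on the column."""
    query = f'SELECT * FROM "{table}" WHERE "{column}" = ?'
    return conn.execute(query, (value,)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (Device_ID TEXT, reading REAL)")
    conn.executemany(
        "INSERT INTO readings VALUES (?, ?)",
        [("dev-a", 1.5), ("dev-b", 2.0), ("dev-a", 3.1)],
    )
    print(sql_partitions(conn, "readings", "Device_ID"))  # ['dev-a', 'dev-b']
    print(read_partition(conn, "readings", "Device_ID", "dev-b"))  # [('dev-b', 2.0)]
```

For a file-based dataset there is no query engine to filter on, which is why the rows first have to be physically split into one folder per value.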
If you have a file dataset with a column by which you want to partition, you can use a Sync recipe to a new partitioned dataset with the "redispatch partition according to column" option enabled.
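Conceptually, redispatching reads every row of the unpartitioned input and writes it into the output partition named after its value in the partitioning column. Below is a minimal stdlib sketch of that idea, not Dataiku's implementation; the helper `redispatch_by_column`, the `Device_ID` column, and the `<value>/data.csv` output layout are assumptions for illustration.

```python
import csv
import io
import os
import tempfile
from collections import defaultdict


def redispatch_by_column(rows, column, output_root):
    """Group dict rows by `column` and write one CSV per distinct value.

    Hypothetical helper: each distinct value becomes a folder
    (output_root/<value>/data.csv), a layout that a partitioning pattern
    such as "%{device}/.*" could match. Values are used as folder names
    as-is, so values containing "/" or empty values would need escaping.
    Returns the sorted list of partition ids that were written.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row)

    for value, group in groups.items():
        folder = os.path.join(output_root, value)
        os.makedirs(folder, exist_ok=True)
        with open(os.path.join(folder, "data.csv"), "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(group[0].keys()))
            writer.writeheader()
            writer.writerows(group)
    return sorted(groups)


if __name__ == "__main__":
    source = io.StringIO(
        "Device_ID,reading\n"
        "dev-a,1.5\n"
        "dev-b,2.0\n"
        "dev-a,3.1\n"
    )
    rows = list(csv.DictReader(source))
    with tempfile.TemporaryDirectory() as root:
        print(redispatch_by_column(rows, "Device_ID", root))  # ['dev-a', 'dev-b']
```

Rows keep their original relative order inside each partition, since they are appended to their group as they are read.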

mpalangetic · 04-26-2018

Unfortunately, that is not helpful, because you need an existing dataset to sync from. I want to create my partitioning for the first time. And yes, my dataset is a file-based one.

AdrienL · 04-26-2018

That is still the solution. Here is a step-by-step guide: https://www.dataiku.com/learn/guide/other/partitioning/partitioning-redispatch.html
Instead of the Year time dimension, add a discrete dimension. Don't forget to insert it in the partitioning pattern. E.g. if the dimension is called "device", the partitioning pattern for the output dataset should look like "%{device}/.*".
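One way to picture why the pattern matters: the partitioning pattern is matched against file paths, and each `%{name}` placeholder captures the dimension value from the path. Below is a hypothetical sketch of such matching, assuming each placeholder captures one path segment and the rest of the pattern is regular-expression syntax; it is not Dataiku's actual code, and `pattern_to_regex`, `partition_of`, and `list_partitions` are made-up names. Under these assumptions, a single flat file with no per-device folders matches nothing, so no partitions are listed.

```python
import re


def pattern_to_regex(pattern):
    """Turn "%{dim}" placeholders into named regex groups.

    Assumption: each placeholder matches exactly one path segment (no "/"),
    and the remaining characters are already regex syntax (e.g. ".*").
    """
    return re.compile(re.sub(r"%\{(\w+)\}", r"(?P<\1>[^/]+)", pattern))


def partition_of(path, compiled):
    """Return {dimension: value} for `path`, or None if it does not match."""
    match = compiled.fullmatch(path.lstrip("/"))
    return None if match is None else match.groupdict()


def list_partitions(paths, pattern):
    """Collect the distinct partitions found among `paths`."""
    compiled = pattern_to_regex(pattern)
    ids = set()
    for path in paths:
        dims = partition_of(path, compiled)
        if dims is not None:
            # One tuple of dimension values per partition, in pattern order.
            ids.add(tuple(dims.values()))
    return sorted(ids)


if __name__ == "__main__":
    paths = ["dev-a/data.csv", "dev-b/data.csv", "dev-a/more.csv", "all_devices.csv"]
    print(list_partitions(paths, "%{device}/.*"))  # [('dev-a',), ('dev-b',)]
    print(list_partitions(["all_devices.csv"], "%{device}/.*"))  # []
```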

n0thing233 · 07-03-2019

Did you get your issue resolved? I think I have the same issue.

AdrienL · 07-03-2019

I'm not sure what issue you're referring to. Did you try using redispatch on a sync recipe, as suggested above?

n0thing233 · 07-04-2019

Problem solved, thank you. I just had to run the "redispatch" first, before trying to list partitions.
