Hello, could you please provide an example of partitioning a dataset based on the discrete values in one column? Everything I can find on the web uses partitioning based on the time dimension.

1 Answer

You may want to partition a dataset by country, for instance, or by customer.
I have multiple devices, and I want to partition my dataset by Device_ID, which is one column in my dataset. But when I go to Settings -> Partitioning -> Add discrete dimension and put Device_ID in the name field, no partitions are extracted. So my question is: how do I provide the correct pattern in that case?
It depends on the type of your dataset: most file-based datasets are partitioned by folder, while SQL datasets are partitioned by the value of a certain column. See https://doc.dataiku.com/dss/latest/partitions/index.html for more information.
If you have a file-based dataset with a column that you want to partition on, you can use a Sync recipe to write to a new partitioned dataset with the "redispatch partition according to column" option enabled.
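To make the idea of redispatching concrete, here is a minimal Python sketch of what splitting a file by the values of one column into one folder per value could look like. It only illustrates the concept and is not how DSS implements the Sync recipe; the function name `redispatch_by_column`, the output file name `data.csv`, and the sample rows are all hypothetical.

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path


def redispatch_by_column(src_csv, out_root, column):
    """Split one CSV file into one folder per distinct value of `column`.

    Hypothetical helper for illustration only; not a DSS API.
    Returns the sorted list of partition values that were written.
    """
    groups = defaultdict(list)
    with open(src_csv, newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        for row in reader:
            # Rows sharing the same column value end up in the same partition.
            groups[row[column]].append(row)

    for value, rows in groups.items():
        # One folder per partition value, e.g. out/A/data.csv
        part_dir = Path(out_root) / value
        part_dir.mkdir(parents=True, exist_ok=True)
        with open(part_dir / "data.csv", "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
    return sorted(groups)


def demo():
    """Run the split on a tiny made-up file and return the partition values."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "input.csv"
        src.write_text("Device_ID,reading\nA,1\nB,2\nA,3\n")
        return redispatch_by_column(src, Path(tmp) / "out", "Device_ID")


if __name__ == "__main__":
    print(demo())
```

The point of the sketch is that the unpartitioned input is read once and each row lands in the folder named after its Device_ID value, which is the folder layout a file-based partitioned dataset expects.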
Unfortunately, that is not helpful, because it requires syncing from an existing dataset. I want to set up partitioning for the first time. And yes, my dataset is file-based.
That is still the solution. Here is a step-by-step guide: https://www.dataiku.com/learn/guide/other/partitioning/partitioning-redispatch.html
Instead of the Year time dimension, add a discrete dimension. Don't forget to insert it in the partitioning pattern. E.g. if the dimension is called "device", the partitioning pattern for the output dataset should look like "%{device}/.*".
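As a rough illustration of how a pattern such as "%{device}/.*" relates file paths to partition values, here is a small Python sketch that turns the pattern into a regular expression and extracts the dimension value from a path. The translation rule (each `%{name}` matches one path segment, and the rest of the pattern is treated as a regex) is an assumption made for this example, not the matcher DSS actually uses, and the helper names and sample paths are hypothetical.

```python
import re


def pattern_to_regex(pattern):
    """Convert a pattern like '%{device}/.*' into a compiled regex.

    Assumption for illustration: each %{name} placeholder matches exactly
    one path segment, and the remaining text is already regex-like.
    """
    # re.split with a capture group alternates: literal, name, literal, ...
    parts = re.split(r"%\{(\w+)\}", pattern)
    pieces = []
    for i, part in enumerate(parts):
        if i % 2 == 1:
            # Placeholder: capture one path segment under the dimension name.
            pieces.append(f"(?P<{part}>[^/]+)")
        else:
            # Literal text from the pattern, kept as-is (e.g. '/.*').
            pieces.append(part)
    return re.compile("^" + "".join(pieces) + "$")


def partition_of(path, pattern):
    """Return the dimension values for `path`, or None if it does not match."""
    match = pattern_to_regex(pattern).match(path)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(partition_of("dev42/part-0.csv", "%{device}/.*"))
    print(partition_of("readme.txt", "%{device}/.*"))
```

Under these assumptions, a file stored under the folder `dev42` belongs to the partition whose "device" value is `dev42`, while a file that sits outside any device folder matches no partition. This is also why a dimension that is declared but missing from the pattern yields no partitions.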