
Hello, could you please provide an example in which a dataset is partitioned by the discrete values of one column? Everything I can find on the web partitions along the time dimension.

1 Answer

You may want to partition a dataset by country for instance, or by customer.
I have multiple devices, and I want to partition the dataset by Device_ID, which is one column in my dataset. But when I go to Settings -> Partitioning -> Add discrete dimension and put Device_ID in the name field, no partitions are extracted. So my question is: how do I provide the correct pattern in that case?
It depends on the type of your dataset: most file-based datasets are partitioned by folder, while SQL datasets are partitioned by the value of a certain column. See the documentation for more information.
If you have a file-based dataset with a column that you want to partition by, you can use a Sync recipe to a new partitioned dataset with the "redispatch partition according to column" option enabled.
Unfortunately that is not helpful, because it requires running a sync on an existing dataset. I want to create my partitioning for the first time. And yes, my dataset is file-based.
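To make the idea of redispatching more concrete, here is a minimal Python sketch of what it amounts to conceptually: rows are grouped by the value of one column, and each group becomes one partition. This is only an illustration, not how the Sync recipe is implemented; the `redispatch` function and the sample data are hypothetical.

```python
import csv
import io
from collections import defaultdict

def redispatch(rows, column):
    """Group rows by the value of `column`; each key stands for one partition."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[column]].append(row)
    return dict(partitions)

# Hypothetical sample data with a Device_ID column.
raw = "Device_ID,temperature\nA1,20.5\nB2,19.0\nA1,21.0\n"
rows = list(csv.DictReader(io.StringIO(raw)))

partitions = redispatch(rows, "Device_ID")
for device, part in sorted(partitions.items()):
    # In a file-based dataset, each group would end up in its own folder.
    print(f"{device}/ -> {len(part)} row(s)")
```

In a file-based partitioned dataset, each of these groups would be written under its own folder, which is why the partitions only exist once the redispatch has run.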
That is still the solution. Here is a step-by-step guide:
Instead of the Year time dimension, add a discrete dimension. Don't forget to insert it in the partitioning pattern. E.g. if the dimension is called "device", the partitioning pattern for the output dataset should look like "%{device}/.*".
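As a rough illustration of how a pattern like "%{device}/.*" relates file paths to partition values, the Python sketch below turns the placeholder into a named regex group and applies it to a few paths. The helper names, the sample paths, and the exact matching rules (relative paths, one folder per value) are assumptions made for the example, not the product's actual behavior.

```python
import re

def pattern_to_regex(pattern):
    """Turn a pattern such as '%{device}/.*' into a regex with one named group per dimension."""
    # Each %{name} placeholder becomes a named group matching a single path segment.
    body = re.sub(r"%\{(\w+)\}", r"(?P<\1>[^/]+)", pattern)
    return re.compile("^" + body + "$")

def partition_of(path, pattern):
    """Return the dimension values found in `path`, or None if the path does not match."""
    match = pattern_to_regex(pattern).match(path)
    return match.groupdict() if match else None

# Hypothetical file layout of the output dataset.
paths = ["A1/part-0.csv", "B2/part-0.csv", "part-0.csv"]
for path in paths:
    print(path, "->", partition_of(path, "%{device}/.*"))
```

A file sitting directly at the root, with no device folder, matches nothing here, which mirrors the situation where no partitions are listed because the files have not yet been laid out by device.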
Did you get your issue resolved? I think I have the same issue.
I'm not sure what issue you're referring to. Did you try using redispatch on a sync recipe, as suggested above?
Problem solved, thank you. I simply had to run the "redispatch" first, before trying to list partitions.

©Dataiku 2012-2018 - Privacy Policy