Hi,
I would like to use the partition value (the one used in the recipe) in the filter.
But I get
Hi @tomas. What exactly is your use case?
If you are using partitions, when running a recipe over a partitioned dataset, the data included in the recipe (to get distinct rows in your case) will have already been 'filtered' to only keep the date of the partition that is being run.
Or in other words, when you run a recipe over a partitioned dataset, the process will always include a filter 'date_partition = ${DKU_DST_DATE}' for all the dates included in your partition selection.
So I wonder what exactly the use case is that you want to solve.
Yes, I know, but imagine a dataset that is NOT partitioned: a string column (YYYY-MM-DD format) containing date values is part of the data, and the data sits in one or more Parquet files with no partition structure in folders. The visual recipe does a group-by and aggregation into a partitioned table. So you want to process every single partition value (DAY) in such a way that Dataiku takes only the given day, aggregates it, and stores the result in the corresponding partition.
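To make the desired behaviour concrete, here is a minimal sketch in plain Python. It does not use Dataiku's API; the column names `day`, `key`, and `amount` and the function `aggregate_for_partition` are hypothetical, chosen only for illustration. In a real recipe the day would presumably come from the output partition being built (the `DKU_DST_DATE` value mentioned above) rather than from a hard-coded list.

```python
from collections import defaultdict


def aggregate_for_partition(rows, partition_day):
    """Keep only rows whose `day` string equals partition_day, then sum `amount` per `key`.

    `rows` stands in for the unpartitioned input; `partition_day` stands in
    for the value of the output partition (an assumption, not Dataiku's API).
    """
    totals = defaultdict(float)
    for row in rows:
        # The filter that the visual recipe would need to apply per partition.
        if row["day"] == partition_day:
            totals[row["key"]] += row["amount"]
    return dict(totals)


# Unpartitioned input: the date is just a string column in the data.
rows = [
    {"day": "2020-01-01", "key": "a", "amount": 1.0},
    {"day": "2020-01-01", "key": "a", "amount": 2.0},
    {"day": "2020-01-02", "key": "a", "amount": 5.0},
    {"day": "2020-01-01", "key": "b", "amount": 4.0},
]

# One aggregation per output partition (one per day).
result = {
    day: aggregate_for_partition(rows, day)
    for day in ("2020-01-01", "2020-01-02")
}
```

Each entry of `result` corresponds to what would be written into one daily partition of the output table.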
Then I was in fact making a wrong assumption, as I thought the input dataset was also partitioned.
However, even if the source data is not partitioned into folders, you can create the dataset and manually set the partitioning in Dataiku. Maybe that could help?
Cheers!
Hey,
I have the same problem as Tomas: my input is a Hive database that is not partitioned, and I would like to use the partition value (the one used in the recipe) in the filter.
Is there any update on this topic?
PS: My input table is large and I cannot partition it.
PS2: In an Impala recipe, it works.
Update, it's working.
Thank you for sharing this update with the rest of the Community, @Paw!