Is there a way to keep the column descriptions along the pipeline? If I add column descriptions at the beginning of the pipeline, it seems like I need to add them again for every output along the pipeline. Is this the case or am I doing something wrong?

1 Answer

When you add a column description at the beginning of a pipeline, that's a change to the schema of that dataset. That schema change needs to be propagated to all datasets along the pipeline: https://answers.dataiku.com/1237/is-there-a-way-to-propagate-schema-changes-in-a-whole-flow
Thanks Alex, I've tried propagating it before, but all the schema checks say everything is already propagated. Even dropping and deleting the schema of the output datasets doesn't help. The only way I can get it to work is to replace every recipe along the flow with a new one and manually add all the steps to the new recipes. It seems like the column descriptions somehow don't make it into the existing recipes: even copying an existing recipe removes the column descriptions.
After I propagate the schema changes, I do a smart reconstruction build of the final dataset in the pipeline, and then I see the descriptions that were added in the first dataset.
Even a smart or forced rebuild doesn't work in my case. I'm adding the descriptions in the middle of the flow. Do the descriptions need to be at the very beginning of the flow?

Here are some screenshots:
soep_selected input dataset with column descriptions (right after adding them in the visual recipe): https://snag.gy/I5RFHV.jpg
Flow with all schemas propagated: https://snag.gy/w95LRC.jpg
soep_cleaned output dataset missing the descriptions: https://snag.gy/WZxuiH.jpg

I'm using Dataiku Version 4.2.0.
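
Since propagation and rebuilds did not carry the descriptions downstream in the case above, one possible workaround is to copy them by column name from the upstream schema into the downstream one. The sketch below is plain Python and only illustrates that merge. It assumes a schema is a dict with a "columns" list and that each column stores its description under a "comment" key. Both the dict shape and the key name are assumptions here and have not been checked against DSS 4.2.0. The function name `carry_descriptions` and the example columns are made up for illustration.

```python
import copy


def carry_descriptions(source_schema, target_schema, key="comment"):
    """Return a copy of target_schema with descriptions filled in from source_schema.

    Columns are matched by name. Descriptions already present in the
    target are kept, and columns without a match are left unchanged.
    The "columns" / "comment" layout is an assumption, not a confirmed format.
    """
    # Map column name -> description for every described source column.
    descriptions = {
        col["name"]: col[key]
        for col in source_schema.get("columns", [])
        if col.get(key)
    }

    # Work on a deep copy so the caller's target schema is not modified.
    updated = copy.deepcopy(target_schema)
    for col in updated.get("columns", []):
        if not col.get(key) and col["name"] in descriptions:
            col[key] = descriptions[col["name"]]
    return updated


# Made-up schemas standing in for an input and an output dataset.
upstream = {
    "columns": [
        {"name": "pid", "type": "bigint", "comment": "Person identifier"},
        {"name": "syear", "type": "int", "comment": "Survey year"},
    ]
}
downstream = {
    "columns": [
        {"name": "pid", "type": "bigint"},
        {"name": "age", "type": "int"},
    ]
}

result = carry_descriptions(upstream, downstream)
for column in result["columns"]:
    print(column["name"], "->", column.get("comment"))
```

If the schemas can be read and written as dicts of this shape from a script or notebook, the merged result would then be saved back as the schema of the output dataset. How to do that depends on the DSS version and API in use, so it is left out of the sketch.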