Properly implement support for Building Flow Zones in Scenarios and the Dataiku API

Dataiku v12.0.0 added a feature that lets users build Flow Zones from the Flow UI:

https://knowledge.dataiku.com/latest/data-preparation/pipelines/tutorial-build-modes.html#build-a-fl...

This works well; however, this capability was never properly added to Scenarios or to the Dataiku API. In 12.1.0, Dataiku added the following feature:

  • Added “stop at flow zone boundary” option when building multiple datasets at once.

This is how the Scenario Build step looks in v12.6.0:

 

Screenshot 2024-05-02 162720.png

The new "Stop at zone boundary" option restricts the Flow build to a single zone. It is similar to the option shown when using the Build button on a Flow Zone:

Screenshot 2024-05-02 163135.png

However, there is one major pitfall in the scenario functionality. When you build a Flow Zone from the Build button in the Flow, you don't have to specify any datasets: the whole Flow Zone is built. In a scenario, by contrast, you must list your "end" (output) datasets in a Build step and select the "Stop at zone boundary" option. This means that if you forget an end dataset, or later add a new output dataset to the Flow Zone, the scenario step has to be updated. It also forces the user to select every single output dataset in the scenario Build step.
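To illustrate what the scenario step currently forces users to work out by hand, here is a minimal sketch that computes a zone's "end" datasets from a simplified Flow model: datasets produced inside the zone that no recipe in the same zone consumes. The `Recipe` class and `zone_end_datasets` function are hypothetical and are not part of the Dataiku API; treating a recipe as belonging to the zone that contains its outputs is a simplifying assumption.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Recipe:
    """Hypothetical Flow node: a recipe with named input and output datasets."""
    name: str
    inputs: list[str]
    outputs: list[str]


def zone_end_datasets(zone_datasets: set[str], recipes: list[Recipe]) -> list[str]:
    """Return the datasets a zone-restricted Build step would have to list.

    Assumption: a recipe is part of the zone if any of its outputs is in the zone.
    """
    zone_recipes = [r for r in recipes if any(o in zone_datasets for o in r.outputs)]
    # Datasets read by a recipe inside this zone are intermediate, not "end" datasets
    consumed = {i for r in zone_recipes for i in r.inputs}
    # Only datasets actually produced by a recipe in this zone can be built
    produced = {o for r in zone_recipes for o in r.outputs if o in zone_datasets}
    return sorted(produced - consumed)
```

In this model, a dataset consumed only by a recipe in another zone still counts as an end dataset of its own zone, and any dataset added to the zone later changes the result, which is exactly why a hard-coded dataset list in a Build step goes stale.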

This idea therefore proposes enhancing Dataiku to properly support building Flow Zones in Scenarios and the Dataiku API. For the avoidance of any doubt, this is what the idea expects:

  1. A new Scenario step option to build a Zone. The only inputs for this step should be the Zone name and the build mode; there should be no need to specify any datasets. The step should always build all the datasets in the Zone according to the build mode.
  2. A new Dataiku Python API call to build a Zone. The only inputs for this call should be the Zone name and the build mode. The API call should always build all the datasets in the Zone according to the build mode.
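The requested API could take a shape like the following minimal sketch. The `build_zone` function, the `zones` lookup and the keys of the returned request dict are all hypothetical; the `BuildMode` values are assumed to mirror the job types accepted by `JobDefinitionBuilder.with_type` in the `dataikuapi` package. What it shows is the contract the idea asks for: only a zone name and a build mode go in, and the zone's datasets are resolved when the call is made.

```python
from __future__ import annotations

from enum import Enum


class BuildMode(Enum):
    # Assumed to mirror the job types of dataikuapi's JobDefinitionBuilder.with_type
    RECURSIVE_BUILD = "RECURSIVE_BUILD"
    RECURSIVE_FORCED_BUILD = "RECURSIVE_FORCED_BUILD"
    RECURSIVE_MISSING_ONLY_BUILD = "RECURSIVE_MISSING_ONLY_BUILD"
    NON_RECURSIVE_FORCED_BUILD = "NON_RECURSIVE_FORCED_BUILD"


def build_zone(zone_name: str, build_mode: BuildMode,
               zones: dict[str, list[str]]) -> dict:
    """Hypothetical zone build: only a zone name and a build mode are needed.

    `zones` stands in for the project's zone lookup (zone name -> datasets).
    Returns a description of the build request instead of starting a real job.
    """
    if not isinstance(build_mode, BuildMode):
        raise TypeError("build_mode must be a BuildMode")
    if zone_name not in zones:
        raise KeyError(f"unknown Flow Zone: {zone_name}")
    # Datasets are looked up at call time and never stored by the caller,
    # so outputs added to the zone later are included automatically
    return {
        "zone": zone_name,
        "mode": build_mode.value,
        "datasets": sorted(zones[zone_name]),
        "stop_at_zone_boundary": True,
    }
```

Because the caller never holds a dataset list, a scenario step or script built on such a call would not need editing when the zone's contents change.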