
Creating a folder structure in Managed folder



I am working on a Python recipe whose output is a managed folder connected to Azure Data Lake Storage. I want the recipe to write its output file into a date-based folder structure every time it runs. For example, if I run it today, the output Parquet file should be stored under 2021>06>29; if I run it tomorrow, it should be stored under 2021>06>30.

Is there a way in Dataiku to save the output file in such a dynamic folder structure?



Typically, for this kind of use case you should partition the output folder (by day). The Python recipe then gets the "partition to build" (in this case, a day) as a variable that you can use however you see fit in the code. To create a folder structure, you simply pass the subpath inside the folder to the upload_xxx() calls, for example (here with CSV):

import dataiku

# Read recipe inputs
kaggle_titanic_train = dataiku.Dataset("kaggle_titanic_train")
df = kaggle_titanic_train.get_dataframe()
data = df.to_csv().encode("utf8")

# Write recipe outputs
output_folder = dataiku.Folder("H5s2NLcx")
partition = dataiku.dku_flow_variables["DKU_DST_DATE"]
partition_root_path = output_folder.get_partition_folder(partition)
output_folder.upload_data(partition_root_path + "/data.csv", data)
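If the year/month/day subpath needs to be built by hand rather than taken from get_partition_folder(), a minimal sketch could look like the following. It assumes the day identifier has the form YYYY-MM-DD (for example 2021-06-29); dated_subpath is a hypothetical helper name, not part of the dataiku API.

```python
from datetime import datetime


def dated_subpath(day, filename):
    """Build a subpath such as '/2021/06/29/data.parquet'.

    Assumes `day` is a string of the form YYYY-MM-DD.
    `dated_subpath` is a hypothetical helper, not a dataiku function.
    """
    parsed = datetime.strptime(day, "%Y-%m-%d").date()
    return "/{:04d}/{:02d}/{:02d}/{}".format(
        parsed.year, parsed.month, parsed.day, filename
    )


# Inside a recipe, the result could be passed to upload_data, e.g.:
#   output_folder.upload_data(dated_subpath(partition, "data.parquet"), data)
print(dated_subpath("2021-06-29", "data.parquet"))
```

The helper only builds the path string; the upload itself is still done by the folder's upload_data() call shown above.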

Note that if you want to write Parquet files to Azure, and the target is a StorageV2 account that you can use ABFS on, it's probably simpler to create an Azure dataset in DSS pointing to the desired location and write to that dataset instead of writing to a managed folder.



I tried your code, but I am unable to access dku_flow_variables, even though I am not running it inside a notebook; I am building the recipe.

It gives me this error:

"Error in Python process: At line 35: <class 'KeyError'>: DKU_DST_DATE"



flow_variables is indeed recipe-only. My example also relies on the output folder being partitioned, which is why DKU_DST_DATE is missing in your case; the settings look like:

Screenshot 2021-06-29 at 11.51.48.png
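For an output folder that is not partitioned, one possible way to avoid the KeyError above is to fall back to today's date when DKU_DST_DATE is absent. This is only a sketch: resolve_day is a hypothetical helper, and it assumes the flow variables behave like a plain dict.

```python
from datetime import date


def resolve_day(flow_variables, today=None):
    """Return the day to write to, as a YYYY-MM-DD string.

    Uses DKU_DST_DATE when the recipe output is partitioned by day,
    otherwise falls back to today's date. `resolve_day` is a hypothetical
    helper; `flow_variables` is assumed to be a dict-like mapping.
    """
    fallback = (today or date.today()).isoformat()
    return flow_variables.get("DKU_DST_DATE", fallback)


# Inside a recipe this could be called as:
#   day = resolve_day(dataiku.dku_flow_variables)
print(resolve_day({"DKU_DST_DATE": "2021-06-29"}))
```

With the fallback, every run writes to the folder for the current day, but the output is no longer tied to a partition, so rebuilding a past day from the flow is not possible this way.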

