
Hi,

I am new to using the Dataiku API. I have tried some simple examples, such as creating and deleting datasets. I also found one of your examples that creates a grouping recipe and sets its inputs and outputs:

from dataikuapi.dss.recipe import GroupingRecipeCreator

# "project" is an existing DSSProject handle, e.g. project = client.get_project("MYPROJECT")
builder = GroupingRecipeCreator('test_group', project)
builder = builder.with_input("input_dataset_name")
builder = builder.with_new_output("output_dataset_name", "hdfs_managed", format_option_id="PARQUET_HIVE")
builder = builder.with_group_key("quantity")  # the recipe is created with one grouping key
recipe = builder.build()

Basically, the builder object helps create a recipe. But is there any way to run a recipe through the API, or can it only be run manually in Dataiku?

Hi Povilas, indeed the core concept in Dataiku is building a dataset or a model, which may require running several recipes. This is what the Build concept covers, and it is a key component of Scenarios. That is why running a recipe by itself is not part of our API. Could you detail the context and the goal you want to achieve? That way we can advise on the general approach. Thanks, Alex
Hi Alex. Actually, I don't have a use case or an exact goal; I have just started exploring Dataiku and its capabilities. I was wondering whether it is possible to perform all the actions available in the Dataiku user interface through API commands, so running a recipe is just an example. Is it possible to run recipes/datasets, or the project itself, somehow? Through scenarios or something else? Or is the idea of the API different?

1 Answer


Hi Povilas,


The philosophy of running a flow of datasets, recipes and models in Dataiku revolves around the concepts of Jobs and Scenarios. In the API, you do not run a recipe directly; instead, you build its output, either with a Job or a Scenario.
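
For example, building the output of your grouping recipe could look like the following minimal sketch. It reuses "output_dataset_name" and the project handle from your snippet, and assumes your DSS version exposes start_job_and_wait (otherwise, start_job returns a job handle you can poll):

# Definition of a job that force-builds the recipe's output dataset
definition = {
    "type": "NON_RECURSIVE_FORCED_BUILD",
    "outputs": [{"id": "output_dataset_name", "type": "DATASET", "partition": "NP"}]  # "NP" = not partitioned
}

# Start the job and block until it finishes
project.start_job_and_wait(definition)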

If you plan on using different elements of the API to create a Dataiku project, test it and automate it, I would advise:

1. Create the datasets, recipes and models using https://doc.dataiku.com/dss/latest/publicapi/client-python/datasets.html, https://doc.dataiku.com/dss/latest/publicapi/client-python/recipes.html and https://doc.dataiku.com/dss/latest/publicapi/client-python/ml.html

2. Build/train some datasets/models by launching Jobs that build the output(s) of the recipes (as sketched above): https://doc.dataiku.com/dss/latest/publicapi/client-python/jobs.html


3. Create a scenario to automate the update of datasets and models (see the sketch after this list): https://doc.dataiku.com/dss/latest/publicapi/client-python/scenarios.html
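
Once a scenario exists, triggering it from the API could look like this minimal sketch ("build_all" is a hypothetical scenario id):

scenario = project.get_scenario("build_all")  # "build_all" is a hypothetical scenario id
scenario.run_and_wait()  # starts a scenario run and blocks until it completes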

In general, it may be faster to use the interface to initialize a "template project", including its scenarios, and then copy this template several times with some programmatic changes through the API.
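
Copying a template project could look like the following sketch, which exports the template to an archive and re-imports it. Here "client" is a DSSClient, "TEMPLATE" and "COPY_1" are hypothetical project keys, and whether the import settings accept targetProjectKey may depend on your DSS version:

# Export the template project to a zip archive on disk
template = client.get_project("TEMPLATE")  # "TEMPLATE" is a hypothetical project key
template.export_to_file("template_export.zip")

# Re-import the archive as a new project
with open("template_export.zip", "rb") as f:
    importer = client.prepare_project_import(f)
    importer.execute(settings={"targetProjectKey": "COPY_1"})  # hypothetical target key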

Hope it helps,

Alex
