Hi,

Say I have a Dataiku pipeline with 3 nodes running in parallel. Each node performs actions that require a GPU, and each node runs as a container job submitted to a Kubernetes cluster.

Now, while the nodes are initializing, one of them gets a FailedScheduling event from the Kubernetes cluster due to a lack of GPU resources. Can I configure Dataiku so that the pipeline doesn't fail? Instead, I would want the job for this node to be queued, and for the user to be able to see that this is happening.
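For context, this is roughly the situation being described: each containerized recipe requests a GPU in its pod spec, and when no node has a free GPU the pod stays Pending with a FailedScheduling event. A minimal, hypothetical pod spec (names and image are placeholders, not what DSS actually generates):

```yaml
# Hypothetical sketch of a container job requesting one GPU.
# If no node has a free nvidia.com/gpu, the pod stays Pending and the
# scheduler emits a FailedScheduling event until capacity frees up.
apiVersion: v1
kind: Pod
metadata:
  name: dss-gpu-recipe            # illustrative name
spec:
  containers:
    - name: recipe
      image: my-container-image   # placeholder
      resources:
        limits:
          nvidia.com/gpu: 1       # one GPU per job
  restartPolicy: Never
```

You can inspect the FailedScheduling event with `kubectl describe pod <pod-name>`.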

Thank you.

Regards,

Jax

1 Answer


Hi, 

There is a way to add limits on parallel jobs in DSS.

Go to Administration > Settings > Flow build:

Under the additional limits, you can add a limit for tag/gpu, set to 2 for instance.
Then, in the flow, add the gpu tag to the recipes that use GPUs.

With that in place, no more than 2 recipes tagged gpu can run at the same time in DSS.
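Conceptually, this kind of per-tag limit behaves like a counting semaphore: jobs beyond the limit wait for a slot instead of failing. A minimal model of that behavior (illustrative only, not the DSS implementation):

```python
import threading
import time

GPU_LIMIT = 2                     # like "limit for tag/gpu = 2" in DSS settings
slots = threading.Semaphore(GPU_LIMIT)
active = 0                        # recipes currently "running"
peak = 0                          # highest concurrency observed
lock = threading.Lock()

def run_recipe(name):
    """Simulated GPU recipe: waits for a free slot, then does work."""
    global active, peak
    with slots:                   # blocks here while GPU_LIMIT recipes hold slots
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.1)           # simulated GPU work
        with lock:
            active -= 1

# Three "parallel" recipes, but only two slots: the third waits its turn.
threads = [threading.Thread(target=run_recipe, args=(f"recipe_{i}",))
           for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# peak never exceeds GPU_LIMIT; all three recipes still complete.
```

The third recipe is not rejected; it simply starts later, which is the behavior the limit setting gives you.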

Matt

Hi,

Is there a way to start the job and have it queued? Or will the user need to retry until a GPU becomes available?

Thank you.

Regards,
Kah Siong
We don't have an "add to queue" feature, but if you set these additional limits in the settings and start the job, it will be in a waiting state and will start only once it's ready, meaning fewer than X recipes with that tag are currently running.

A second option is to add several steps to a scenario to build each dataset sequentially. Scenario steps run one after another, so you can control the order in which the full pipeline is updated.
