Hi,

Say I have a Dataiku pipeline with 3 nodes running in parallel. Each node performs actions that require a GPU, and each node runs as a container job submitted to a Kubernetes cluster.

Now, while the nodes are initializing, one of them gets a FailedScheduling event from the Kubernetes cluster due to a lack of GPU resources. Can I configure Dataiku so that the pipeline doesn't fail? Instead, I would want the job for this node to be queued, and for the user to be able to see that this is happening.
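For context, this is roughly the situation being described: each containerized recipe requests a GPU in its pod spec, and when no node has a free GPU the pod stays Pending with a FailedScheduling event. A minimal, hypothetical pod spec (names and image are placeholders, not what DSS actually generates):

```yaml
# Hypothetical sketch of a container job requesting one GPU.
# If no node has a free nvidia.com/gpu, the pod stays Pending and the
# scheduler emits a FailedScheduling event until capacity frees up.
apiVersion: v1
kind: Pod
metadata:
  name: dss-gpu-recipe            # illustrative name
spec:
  containers:
    - name: recipe
      image: my-container-image   # placeholder
      resources:
        limits:
          nvidia.com/gpu: 1       # one GPU per job
  restartPolicy: Never
```

You can inspect the FailedScheduling event with `kubectl describe pod <pod-name>`.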

Thank you.

Regards,

Jax

1 Answer


Hi, 

There is a way to add limits on parallel jobs in DSS.

Go to Administration > Settings > Flow build:

Under the additional limits, you can add a limit for tag/gpu, set to 2 for instance.
Then, in the flow, add the gpu tag to the recipes that use GPUs.

With that in place, no more than 2 recipes tagged gpu can run at the same time in DSS.
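Conceptually, this kind of per-tag limit behaves like a counting semaphore: jobs beyond the limit wait for a slot instead of failing. A minimal model of that behavior (illustrative only, not the DSS implementation):

```python
import threading
import time

GPU_LIMIT = 2                     # like "limit for tag/gpu = 2" in DSS settings
slots = threading.Semaphore(GPU_LIMIT)
active = 0                        # recipes currently "running"
peak = 0                          # highest concurrency observed
lock = threading.Lock()

def run_recipe(name):
    """Simulated GPU recipe: waits for a free slot, then does work."""
    global active, peak
    with slots:                   # blocks here while GPU_LIMIT recipes hold slots
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.1)           # simulated GPU work
        with lock:
            active -= 1

# Three "parallel" recipes, but only two slots: the third waits its turn.
threads = [threading.Thread(target=run_recipe, args=(f"recipe_{i}",))
           for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# peak never exceeds GPU_LIMIT; all three recipes still complete.
```

The third recipe is not rejected; it simply starts later, which is the behavior the limit setting gives you.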

Matt

Hi,

Is there a way to start the job and have it queued? Or will the user need to retry until a GPU becomes available?

Thank you.

Regards,
Kah Siong
We don't have an "add to queue" feature, but if you set these additional limits in the settings and start the job, it will be in a waiting state and will start only once it's ready, meaning fewer than X recipes with that tag are currently running.

A second option is to add several steps to a scenario to build each dataset sequentially. Scenario steps run one after another, so you can control the order in which the full pipeline is updated.
