Spark Performance warning: Input dataset is read in a non-distributed way

Alwaleed
Level 1
I am setting up Spark on Kubernetes (K8s), but I am getting the performance warning "Input dataset is read in a non-distributed way", and this makes the queries take a very long time.

I have attached a screenshot. I need your help to avoid this problem. image (5).png

AlexT
Dataiker

Hi @Alwaleed ,
The warning simply means that Spark can't read the SQL dataset directly. Instead, it has to go through DSS to read the data, which is slower because the read is not distributed.
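
To illustrate what "distributed" means here, below is a minimal sketch of how a direct read can be split across parallel readers. It is loosely modeled on the range-splitting idea behind the `partitionColumn`, `lowerBound`, `upperBound` and `numPartitions` options of Spark's JDBC data source. It is not the DSS or Spark implementation, and the function name `partition_predicates` and the column `id` are hypothetical, chosen only for this example.

```python
def partition_predicates(column, lower, upper, num_partitions):
    """Build one WHERE clause per parallel reader (illustrative only).

    With num_partitions == 1 a single reader scans the whole table,
    which corresponds to the non-distributed case in the warning.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be >= 1")
    if num_partitions == 1:
        return ["1 = 1"]  # one reader, no split

    stride = (upper - lower) // num_partitions
    if stride == 0:
        raise ValueError("range is too small for this many partitions")

    predicates = []
    for i in range(num_partitions):
        lo = lower + i * stride
        hi = lo + stride
        if i == 0:
            # First slice also catches NULLs and values below the lower bound.
            predicates.append(f"{column} < {hi} OR {column} IS NULL")
        elif i == num_partitions - 1:
            # Last slice also catches values above the upper bound.
            predicates.append(f"{column} >= {lo}")
        else:
            predicates.append(f"{column} >= {lo} AND {column} < {hi}")
    return predicates


if __name__ == "__main__":
    for clause in partition_predicates("id", 0, 100, 4):
        print(clause)
```

Each clause can be run by a separate executor, so the table is read in parallel rather than through a single stream.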

If you want to enable direct Spark reads for a compatible database, you need to enable the option at the connection level:

Screenshot 2024-05-02 at 12.37.26 PM.png

Then you will also need to enable it at the dataset level: Screenshot 2024-05-02 at 12.39.21 PM.png
