Spark tasks in a single Spark job
When I run a single Spark recipe, the number of tasks follows the HDFS block size rule, i.e. one task per 128 MB block.
But when I run the same job as a Spark pipeline, it runs only 8 or 9 tasks (never more), no matter how big a cluster I choose. This is what the Spark UI shows: we have a 20-node cluster, but the Spark pipeline uses only 2 of those nodes, whereas running the same job without the pipeline uses the whole cluster.
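For reference, this is roughly how I check the partition count that drives the task count. It is a minimal PySpark sketch; the dataset path and session setup are placeholders, not my actual job:

```python
from pyspark.sql import SparkSession

# Minimal sketch: inspect how many partitions (and hence first-stage tasks)
# a read produces. Path and app name are hypothetical placeholders.
spark = SparkSession.builder.appName("partition-check").getOrCreate()

df = spark.read.parquet("hdfs:///data/my_dataset")  # hypothetical path

# For HDFS-backed files, one partition per ~128 MB block is expected,
# and each partition generally maps to one task in the first stage.
print(df.rdd.getNumPartitions())
```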
[Image: Spark pipeline (Spark UI)]
[Image: Spark single recipe (Spark UI)]
As the images above show, recipes run in a Spark pipeline use only 8 or 9 tasks, while a normal Spark recipe parallelizes across the whole cluster according to the data size and block size.
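One workaround I am considering (not confirmed to fix the pipeline behaviour) is forcing a wider fan-out explicitly inside the job. The path and the figure of 160 partitions (e.g. 20 nodes x 8 cores) below are illustrative assumptions, not values from my setup:

```python
from pyspark.sql import SparkSession

# Hedged sketch of a possible workaround: repartition explicitly and raise
# shuffle parallelism so the pipelined job does not collapse to 8-9 tasks.
# 160 is an illustrative target (20 nodes x 8 cores), not a measured value.
spark = (
    SparkSession.builder
    .appName("pipeline-fanout")
    .config("spark.sql.shuffle.partitions", "160")  # parallelism of shuffled stages
    .getOrCreate()
)

df = spark.read.parquet("hdfs:///data/my_dataset")  # hypothetical path
df = df.repartition(160)  # explicitly spread work across the cluster
```

Is there a setting on the Spark pipeline side that controls this, or is repartitioning like this the expected approach?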