andream

Spark: I see more executors than available cluster's cores

I'm working with Spark and YARN on an Azure HDInsight cluster, and I'm having trouble understanding the relationship between the workers' resources, executors, and containers.

My cluster has 10 D13 v2 workers (8 cores and 56 GB of memory each), so I should have 80 cores available for Spark applications. However, when I start an application with the following parameters

"executorMemory": "6G",
"executorCores": 5,
"numExecutors": 20,

I see 100 cores in the YARN UI (that is, 20 more than I should have). I ran a heavy query, and on the executors page of the YARN UI I see all 20 executors working, each with 4 or 5 active tasks in parallel. I also tried pushing numExecutors to 25, and all 25 were working, again with several tasks in parallel per executor.
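The mismatch described above is easy to quantify with the numbers from the question (a quick sanity check, nothing more):

```python
# Totals implied by the submission parameters and cluster size
# (all values taken from the question above).
num_executors = 20
executor_cores = 5
workers = 10
cores_per_worker = 8

requested_cores = num_executors * executor_cores   # cores YARN reports as allocated
physical_cores = workers * cores_per_worker        # cores the hardware actually has

print(requested_cores, physical_cores)  # 100 80
```

So YARN is granting 100 vCores on 80 physical cores, an oversubscription of 20.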

It was my understanding that 1 executor core = 1 cluster core, but this is not compatible with what I observe. The official Microsoft documentation (for instance here) is not really helpful. It states:

An Executor runs on the worker node and is responsible for the tasks for the application. The number of worker nodes and worker node size determines the number of executors, and executor sizes.

but it does not say what the relation actually is. I suspect YARN is only bound by memory limits (i.e. I can run as many executors as I want, as long as there is enough memory), but I don't understand how this works in relation to the CPUs available in the cluster.
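One plausible explanation for this suspicion (an assumption about the cluster's configuration, not something confirmed in the question): YARN's CapacityScheduler uses `DefaultResourceCalculator` by default, which schedules containers by memory alone and ignores vCore requests, so core counts above the physical total are still granted. This can be checked, and changed, in `capacity-scheduler.xml`:

```xml
<!-- capacity-scheduler.xml: a sketch, assuming the default CapacityScheduler.
     With DefaultResourceCalculator (the default), only memory is enforced.
     Setting DominantResourceCalculator makes YARN account for CPU as well,
     so vCore requests beyond the cluster total would no longer be granted. -->
<property>
  <name>yarn.scheduler.capacity.resource-calculator</name>
  <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>
```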

Do you know what I am missing?

apache-spark

pyspark

hadoop-yarn

azure-hdinsight

executor
