ExecutorRunnable

ExecutorRunnable starts a YARN container with CoarseGrainedExecutorBackend. If the external shuffle service is used, it is set in the ContainerLaunchContext as a service under the name spark_shuffle.

Note	Despite the name, `ExecutorRunnable` is no longer a java.lang.Runnable since SPARK-12447.

Tip	Enable the `INFO` logging level for the `org.apache.spark.deploy.yarn.ExecutorRunnable` logger to see what happens inside. Add the following line to `conf/log4j.properties`: `log4j.logger.org.apache.spark.deploy.yarn.ExecutorRunnable=INFO`. Refer to Logging.

`prepareEnvironment` Method

Caution	FIXME

Creating `ExecutorRunnable` Instance

ExecutorRunnable(
  container: Container,
  conf: Configuration,
  sparkConf: SparkConf,
  masterAddress: String,
  slaveId: String,
  hostname: String,
  executorMemory: Int,
  executorCores: Int,
  appId: String,
  securityMgr: SecurityManager,
  localResources: Map[String, LocalResource])

YarnAllocator creates an instance of ExecutorRunnable when launching Spark executors in allocated YARN containers.

Each ExecutorRunnable is created with the single YARN container in which a Spark executor is to run.

The input conf (Hadoop’s Configuration), sparkConf, and masterAddress correspond directly to the constructor arguments of YarnAllocator.

The input slaveId comes from the internal counter in YarnAllocator.

The input hostname is the host of the YARN container.

The input executorMemory and executorCores are passed in by YarnAllocator and ultimately come from the spark.executor.memory and spark.executor.cores configuration settings.

The input appId, securityMgr, and localResources are the same ones that YarnAllocator itself was created with.

Running `ExecutorRunnable` — `run` Method

When run is called, you should see the following INFO message in the logs:

INFO ExecutorRunnable: Starting Executor Container

It creates a YARN NMClient, initializes it with yarnConf, and starts it.

It ultimately starts CoarseGrainedExecutorBackend in the container.

Starting `CoarseGrainedExecutorBackend` in Container — `startContainer` Method

startContainer(): java.util.Map[String, ByteBuffer]

startContainer uses the NMClient API to start a CoarseGrainedExecutorBackend in a YARN container.

When startContainer is executed, you should see the following INFO message in the logs:

INFO ExecutorRunnable: Setting up ContainerLaunchContext

It then creates a YARN ContainerLaunchContext (which holds all the information the YARN NodeManager needs to launch a container). The local resources are the localResources passed in when the ExecutorRunnable was created, and the environment is the one prepared by prepareEnvironment. It also sets security tokens.

It prepares the command to launch CoarseGrainedExecutorBackend using the details provided when the ExecutorRunnable was created.

You should see the following INFO message in the logs:

INFO ExecutorRunnable:
===============================================================================
YARN executor launch context:
  env:
    [key] -> [value]
    ...

  command:
    [commands]
===============================================================================

The command is then set on the just-created ContainerLaunchContext.

It sets application ACLs using YarnSparkHadoopUtil.getApplicationAclsForYarn.

If the external shuffle service is used, the executor registers with the YARN shuffle service already started on the NodeManager. The external shuffle service is set in the ContainerLaunchContext as service data under the spark_shuffle name.

Ultimately, startContainer requests the YARN NodeManager to start the YARN container for a Spark executor (the container passed in when the ExecutorRunnable was created) with the ContainerLaunchContext.
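The service-data step can be sketched with the standard library alone. This is a minimal illustration, not Spark's code: the object and method names are hypothetical, and the payload (a UTF-8 encoded secret, or an empty buffer when there is none) is an assumption. Only the spark_shuffle name comes from the text above.

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets
import java.util.Collections

// Hypothetical helper: builds the service-data map for a container launch context.
object ShuffleServiceDataSketch {
  // Service name under which the external shuffle service is registered.
  val ShuffleServiceName = "spark_shuffle"

  /** Returns the service-data map, or an empty map when the
    * external shuffle service is disabled.
    */
  def serviceData(
      shuffleServiceEnabled: Boolean,
      secret: Option[String]): java.util.Map[String, ByteBuffer] = {
    if (!shuffleServiceEnabled) {
      Collections.emptyMap[String, ByteBuffer]()
    } else {
      // Assumption: the payload is the UTF-8 encoded secret, or empty when there is none.
      val payload = secret
        .map(s => ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8)))
        .getOrElse(ByteBuffer.allocate(0))
      Collections.singletonMap(ShuffleServiceName, payload)
    }
  }
}
```

With the YARN client library on the classpath, a map of this shape is what ContainerLaunchContext.setServiceData accepts.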

If any exception is thrown while starting the container, it is rethrown as a SparkException with the following message:

Exception while starting container [containerId] on host [hostname]
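The wrapping of failures into a single exception with the message above can be sketched as follows. This is an illustration under assumptions: SketchSparkException is a hypothetical stand-in for SparkException (so the block needs no Spark dependency), and startOrFail is an invented helper name rather than anything in ExecutorRunnable.

```scala
object StartContainerFailureSketch {
  // Hypothetical stand-in for org.apache.spark.SparkException.
  final class SketchSparkException(message: String, cause: Throwable)
    extends Exception(message, cause)

  /** Runs `start` and rethrows any exception with the container and host
    * in the message, keeping the original exception as the cause.
    */
  def startOrFail[T](containerId: String, hostname: String)(start: => T): T =
    try start
    catch {
      case e: Exception =>
        throw new SketchSparkException(
          s"Exception while starting container $containerId on host $hostname", e)
    }
}
```

Keeping the original exception as the cause preserves the NodeManager-side stack trace while the message identifies which container and host failed.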

Note	`startContainer` is exclusively called as a part of running `ExecutorRunnable`.

Preparing Command to Launch `CoarseGrainedExecutorBackend` — `prepareCommand` Method

prepareCommand(
  masterAddress: String,
  slaveId: String,
  hostname: String,
  executorMemory: Int,
  executorCores: Int,
  appId: String): List[String]

prepareCommand is a private method that prepares the command used to start the org.apache.spark.executor.CoarseGrainedExecutorBackend application in a YARN container. All the input parameters of prepareCommand except executorMemory become command-line arguments of the CoarseGrainedExecutorBackend application.

The input executorMemory is in megabytes (hence the m suffix) and becomes the -Xmx JVM option.

It uses the optional spark.executor.extraJavaOptions for the JVM options.

If the optional SPARK_JAVA_OPTS environment variable is defined, it is added to the JVM options.
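How the JVM options described above might be collected is sketched below. The names are hypothetical, the ordering simply follows the order of the description, and the whitespace-based splitting is a simplifying assumption (it does not handle quoted options containing spaces).

```scala
object JvmOptionsSketch {
  /** Builds the JVM options for the executor process.
    *
    * @param executorMemoryMb executor heap size in megabytes
    * @param extraJavaOptions value of spark.executor.extraJavaOptions, if set
    * @param sparkJavaOpts    value of the SPARK_JAVA_OPTS environment variable, if set
    */
  def jvmOptions(
      executorMemoryMb: Int,
      extraJavaOptions: Option[String],
      sparkJavaOpts: Option[String]): List[String] = {
    // The heap size is expressed in megabytes, hence the "m" suffix.
    val heap = List(s"-Xmx${executorMemoryMb}m")
    // Assumption: options are separated by whitespace and contain no quoted spaces.
    def split(opts: Option[String]): List[String] =
      opts.toList.flatMap(_.trim.split("\\s+").toList).filter(_.nonEmpty)
    heap ++ split(extraJavaOptions) ++ split(sparkJavaOpts)
  }
}
```

For example, jvmOptions(1024, None, None) yields only the heap setting, while both optional sources are appended after it when present.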

It uses the optional spark.executor.extraLibraryPath to set prefixEnv (using Client.getClusterPath).

Caution	FIXME Client.getClusterPath?

It sets -Dspark.yarn.app.container.log.dir=<LOG_DIR>.

It sets the user classpath (using Client.getUserClasspath).

Caution	FIXME Client.getUserClasspath?

Finally, it creates the entire command to start org.apache.spark.executor.CoarseGrainedExecutorBackend with the following arguments:

--driver-url being the input masterAddress
--executor-id being the input slaveId
--hostname being the input hostname
--cores being the input executorCores
--app-id being the input appId
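The final assembly of the command line from the arguments listed above can be sketched like this. It is a simplified model, not the actual prepareCommand: the object and method names are hypothetical, the launcher is assumed to be a plain java binary, and details such as the library path prefix, log redirection, and user classpath entries are left out.

```scala
object LaunchCommandSketch {
  // Fully qualified name of the executor backend, as given in the text above.
  val BackendClass = "org.apache.spark.executor.CoarseGrainedExecutorBackend"

  /** Assembles the launch command: launcher, JVM options, main class,
    * then the backend's command-line arguments.
    */
  def launchCommand(
      javaOpts: Seq[String],
      masterAddress: String,
      slaveId: String,
      hostname: String,
      executorCores: Int,
      appId: String): List[String] = {
    // Assumption: a plain "java" binary; the real launcher path may differ.
    val launcher = List("java") ++ javaOpts
    val backendArgs = List(
      "--driver-url", masterAddress,
      "--executor-id", slaveId,
      "--hostname", hostname,
      "--cores", executorCores.toString,
      "--app-id", appId)
    launcher ++ List(BackendClass) ++ backendArgs
  }
}
```

Note that executorMemory does not appear among the backend arguments; it reaches the process only through the JVM options passed in as javaOpts.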

Internal Registries and Counters

Table 1. Internal Registries and Counters
Name	Description
`yarnConf`	An instance of YARN’s YarnConfiguration. Created when `ExecutorRunnable` is created.
