YarnAllocator — Container Allocator

YarnAllocator allocates resource containers from YARN ResourceManager to run Spark executors on and releases them when the Spark application no longer needs them.

It talks directly to YARN ResourceManager through the amClient reference (of YARN’s AMRMClient[ContainerRequest] type) that it gets when created (from YarnRMClient when it registers the ApplicationMaster for a Spark application).

Caution

FIXME Image for YarnAllocator uses amClient Reference to YARN ResourceManager

YarnAllocator is a part of the internal state of ApplicationMaster (via the internal allocator reference).

Figure 1. ApplicationMaster uses YarnAllocator (via allocator attribute)

When YarnAllocator is created, it requires driverUrl, driverRef, Hadoop’s Configuration, a Spark configuration, YARN’s AMRMClient (amClient), YARN’s ApplicationAttemptId, a SecurityManager, and a map of Hadoop’s LocalResources keyed by name. The parameters are later used for launching Spark executors in allocated YARN containers.

Caution

FIXME An image with YarnAllocator and multiple ExecutorRunnables.

Tip	Enable `INFO` or `DEBUG` logging level for the `org.apache.spark.deploy.yarn.YarnAllocator` logger to see what happens inside.

Add the following line to `conf/log4j.properties`:

log4j.logger.org.apache.spark.deploy.yarn.YarnAllocator=DEBUG

Refer to Logging.

Creating YarnAllocator Instance

When YarnRMClient registers ApplicationMaster for a Spark application (with YARN ResourceManager) it creates a new YarnAllocator instance.

Figure 2. Creating YarnAllocator

All the input parameters for YarnAllocator (except appAttemptId and amClient) are passed directly from the input parameters of YarnRMClient.

YarnAllocator(
  driverUrl: String,
  driverRef: RpcEndpointRef,
  conf: Configuration,
  sparkConf: SparkConf,
  amClient: AMRMClient[ContainerRequest],
  appAttemptId: ApplicationAttemptId,
  securityMgr: SecurityManager,
  localResources: Map[String, LocalResource])

The input amClient parameter is created in and owned by YarnRMClient.

When YarnAllocator is created, it sets the org.apache.hadoop.yarn.util.RackResolver logger to WARN (unless set to some log level already).

It creates the following empty registries:

releasedContainers
allocatedHostToContainersMap
allocatedContainerToHostMap
pendingLossReasonRequests
releasedExecutorLossReasons
executorIdToContainer
containerIdToExecutorId
hostToLocalTaskCounts

It sets the following internal counters:

numExecutorsRunning to 0
executorIdCounter to the last allocated executor id (this seems to be quite an expensive operation since it uses an RPC system)
numUnexpectedContainerRelease to 0L
numLocalityAwareTasks to 0
targetNumExecutors to the initial number of executors

It creates an empty queue of failed executors.

It sets the internal executorFailuresValidityInterval to spark.yarn.executor.failuresValidityInterval.

It sets the internal executorMemory to spark.executor.memory.

It sets the internal memoryOverhead to spark.yarn.executor.memoryOverhead. If that setting is not defined, memoryOverhead is the larger of 10% of executorMemory and 384 MB.

It sets the internal executorCores to spark.executor.cores.

It creates the internal resource as a Hadoop YARN Resource with executorMemory + memoryOverhead memory and executorCores CPU cores.
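The overhead default and the combined container size described above can be sketched as follows. This is a minimal illustration and not the Spark source: ContainerResource is a hypothetical stand-in for YARN’s Resource, and the object and method names are assumptions made for the example. All memory values are in MB.

```scala
object ExecutorResourceSketch {
  // Hypothetical stand-in for YARN's Resource (memory in MB plus virtual cores).
  final case class ContainerResource(memoryMb: Int, vCores: Int)

  val OverheadFactor = 0.10 // 10% of executor memory
  val OverheadMinMb = 384   // lower bound for the overhead, in MB

  // Use the configured overhead when present, else max(10% of executor memory, 384).
  def memoryOverhead(executorMemoryMb: Int, configured: Option[Int]): Int =
    configured.getOrElse(
      math.max((OverheadFactor * executorMemoryMb).toInt, OverheadMinMb))

  // Container size requested per executor: executor memory plus overhead, and the cores.
  def resource(
      executorMemoryMb: Int,
      executorCores: Int,
      configured: Option[Int] = None): ContainerResource =
    ContainerResource(
      executorMemoryMb + memoryOverhead(executorMemoryMb, configured),
      executorCores)
}
```

With these assumptions, a 1024 MB executor falls back to the 384 MB minimum, while an 8192 MB executor gets roughly 10% on top.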

It creates the internal launcherPool, a thread pool called ContainerLauncher with a maximum of spark.yarn.containerLauncherMaxThreads threads.
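A named, bounded pool of daemon threads can be built with plain java.util.concurrent, as in the sketch below. It only illustrates the idea of a capped pool whose threads carry the ContainerLauncher name. The helper newDaemonPool is hypothetical, and a fixed-size pool is used here as a simplification, which may differ from how Spark builds launcherPool.

```scala
import java.util.concurrent.{ExecutorService, Executors, ThreadFactory}
import java.util.concurrent.atomic.AtomicInteger

object LauncherPoolSketch {
  // Builds a pool of at most maxThreads daemon threads named "<prefix>-<n>".
  def newDaemonPool(prefix: String, maxThreads: Int): ExecutorService = {
    val counter = new AtomicInteger(0)
    val factory = new ThreadFactory {
      override def newThread(r: Runnable): Thread = {
        val t = new Thread(r, s"$prefix-${counter.getAndIncrement()}")
        t.setDaemon(true) // daemon threads do not keep the JVM alive
        t
      }
    }
    Executors.newFixedThreadPool(maxThreads, factory)
  }
}
```

Daemon threads are chosen in the sketch so that a pool that is never shut down cannot block JVM exit.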

It sets the internal launchContainers to spark.yarn.launchContainers.

It sets the internal labelExpression to spark.yarn.executor.nodeLabelExpression.

It sets the internal nodeLabelConstructor to…FIXME

Caution

FIXME nodeLabelConstructor?

It sets the internal containerPlacementStrategy to…FIXME

Caution

FIXME LocalityPreferredContainerPlacementStrategy?

Requesting Executors with Locality Preferences (requestTotalExecutorsWithPreferredLocalities method)

requestTotalExecutorsWithPreferredLocalities(
  requestedTotal: Int,
  localityAwareTasks: Int,
  hostToLocalTaskCount: Map[String, Int]): Boolean

requestTotalExecutorsWithPreferredLocalities returns true if the current desired total number of executors is different from the input requestedTotal.

Note	`requestTotalExecutorsWithPreferredLocalities` should instead have been called `shouldRequestTotalExecutorsWithPreferredLocalities` since it answers the question whether to request total executors or not.

requestTotalExecutorsWithPreferredLocalities sets the internal numLocalityAwareTasks and hostToLocalTaskCounts attributes to the input localityAwareTasks and hostToLocalTaskCount arguments, respectively.

If the input requestedTotal is different from the internal targetNumExecutors attribute, you should see the following INFO message in the logs:

INFO YarnAllocator: Driver requested a total number of [requestedTotal] executor(s).

In that case, it sets the internal targetNumExecutors attribute to the input requestedTotal and returns true. Otherwise (when the two are equal), it returns false.

Note	`requestTotalExecutorsWithPreferredLocalities` is executed in response to `RequestExecutors` message to `ApplicationMaster`.
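The behaviour described in this section can be sketched as follows. AllocatorState is a hypothetical, simplified holder for the three attributes the method touches. It is not the real YarnAllocator, and it ignores thread-safety concerns. println stands in for the INFO log line.

```scala
object RequestTotalSketch {
  // Hypothetical holder for the attributes updated by the method.
  final class AllocatorState(initialTarget: Int) {
    var targetNumExecutors: Int = initialTarget
    var numLocalityAwareTasks: Int = 0
    var hostToLocalTaskCounts: Map[String, Int] = Map.empty

    def requestTotalExecutorsWithPreferredLocalities(
        requestedTotal: Int,
        localityAwareTasks: Int,
        hostToLocalTaskCount: Map[String, Int]): Boolean = {
      // Locality hints are recorded whether or not the target changes.
      numLocalityAwareTasks = localityAwareTasks
      hostToLocalTaskCounts = hostToLocalTaskCount
      if (requestedTotal != targetNumExecutors) {
        println(s"Driver requested a total number of $requestedTotal executor(s).")
        targetNumExecutors = requestedTotal
        true
      } else {
        false
      }
    }
  }
}
```

Calling the method twice with the same requestedTotal returns true the first time and false the second, while the locality attributes are overwritten on both calls.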

numLocalityAwareTasks Internal Counter

numLocalityAwareTasks: Int = 0

It tracks the number of locality-aware tasks to be used as a container placement hint when YarnAllocator is requested for executors given locality preferences.

It is used as an input to containerPlacementStrategy.localityOfRequestedContainers when YarnAllocator updates YARN container allocation requests.

Adding or Removing Executor Container Requests (updateResourceRequests method)

updateResourceRequests(): Unit

updateResourceRequests requests new executor containers from the YARN ResourceManager or cancels outstanding container requests.

Note	In YARN, you have to request containers for resources first (using AMRMClient.addContainerRequest) before calling AMRMClient.allocate.

It gets the list of outstanding YARN ContainerRequests (using the amClient passed to the constructor) and aligns their number with the current workload.

updateResourceRequests consists of two main branches:

missing executors, i.e. when the number of executors already allocated or pending is lower than what Spark needs, so more executors have to be requested.
executors to cancel, i.e. when the number of pending executor allocations is positive, but the total number of executors (allocated and pending) is more than Spark needs.
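One way to read the two branches is as a single comparison of the target against the executors that are already running or pending. The sketch below encodes that reading. The Decision type, the decide function, and the cap on the number of cancellations are assumptions made for illustration and are not taken from the text above.

```scala
object UpdateRequestsSketch {
  sealed trait Decision
  final case class RequestMore(count: Int) extends Decision
  final case class CancelPending(count: Int) extends Decision
  case object NoChange extends Decision

  // target:  desired total number of executors
  // pending: container requests not yet fulfilled
  // running: executors already running
  def decide(target: Int, pending: Int, running: Int): Decision = {
    val missing = target - pending - running
    if (missing > 0) {
      RequestMore(missing) // case 1: missing executors
    } else if (pending > 0 && missing < 0) {
      // case 2: only pending requests can be cancelled, so cap at `pending`
      CancelPending(math.min(pending, -missing))
    } else {
      NoChange
    }
  }
}
```

For example, a target of 10 with 2 pending and 3 running yields a request for 5 more containers, while a target of 4 with 3 pending and 3 running yields 2 cancellations.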

Case 1. Missing Executors

You should see the following INFO message in the logs:

INFO YarnAllocator: Will request [count] executor containers, each with [vCores] cores and [memory] MB memory including [memoryOverhead] MB overhead

It then splits pending container allocation requests per locality preference of pending tasks (in the internal hostToLocalTaskCounts registry).

Caution

FIXME Review splitPendingAllocationsByLocality

It removes stale container allocation requests (using YARN’s AMRMClient.removeContainerRequest).

Caution

FIXME Stale?

You should see the following INFO message in the logs:

INFO YarnAllocator: Canceled [cancelledContainers] container requests (locality no longer needed)

It computes locality of requested containers (based on the internal numLocalityAwareTasks, hostToLocalTaskCounts and allocatedHostToContainersMap lookup table).

Caution

FIXME Review containerPlacementStrategy.localityOfRequestedContainers + the code that follows.

For any new container needed updateResourceRequests adds a container request (using YARN’s AMRMClient.addContainerRequest).

You should see the following INFO message in the logs:

INFO YarnAllocator: Submitted container request (host: [host], capability: [resource])

Case 2. Cancelling Pending Executor Allocations

When there are executors to cancel (case 2.), you should see the following INFO message in the logs:

INFO Canceling requests for [numToCancel] executor container(s) to have a new desired total [targetNumExecutors] executors.

It checks whether there are pending allocation requests and removes the excess (using YARN’s AMRMClient.removeContainerRequest). If there are no pending allocation requests, you should see the following WARN message in the logs:

WARN Expected to find pending requests, but found none.
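The cancellation step can be sketched with an in-memory list of pending requests. PendingRequest is a hypothetical stand-in for YARN’s ContainerRequest, the remove callback stands in for AMRMClient.removeContainerRequest, and cancelling the first numToCancel entries is an assumption about which requests get dropped.

```scala
object CancelPendingSketch {
  // Hypothetical stand-in for a pending YARN ContainerRequest.
  final case class PendingRequest(id: Int)

  // Cancels up to numToCancel pending requests and returns those still pending.
  def cancelExcess(
      pending: Seq[PendingRequest],
      numToCancel: Int,
      remove: PendingRequest => Unit): Seq[PendingRequest] = {
    if (pending.isEmpty) {
      println("WARN Expected to find pending requests, but found none.")
      pending
    } else {
      val (toCancel, toKeep) = pending.splitAt(numToCancel)
      toCancel.foreach(remove) // one removal call per cancelled request
      toKeep
    }
  }
}
```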

killExecutor

Caution
