Driver

A Spark driver (aka an application’s driver process) is a JVM process that hosts SparkContext for a Spark application.

It is the cockpit of job and task execution (using DAGScheduler and TaskScheduler). It hosts the web UI for the environment.

Figure 1. Driver with the services

It splits a Spark application into tasks and schedules them to run on executors.

A driver is where the task scheduler lives; it spawns tasks across workers and coordinates their overall execution.

Note
Spark shell is both a Spark application and its own driver. It creates a SparkContext that is available as sc.
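
For example, starting spark-shell gives you a driver with a SparkContext already created (a console sketch; the exact output differs per run):

    $ ./bin/spark-shell
    ...
    scala> sc
    res0: org.apache.spark.SparkContext = org.apache.spark.SparkContext@2ac0cb64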

A driver requires additional services (besides the common ones like ShuffleManager, MemoryManager, BlockTransferService, BroadcastManager, and CacheManager):

Caution
FIXME Diagram of RpcEnv for a driver (and later executors). Perhaps it should be in the notes about RpcEnv?
  • High-level control flow of work

  • Your Spark application runs as long as the Spark driver.

    • Once the driver terminates, so does your Spark application.

  • Creates SparkContext, RDDs, and executes transformations and actions (see the sketch after this list)

  • Launches tasks
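
A minimal Scala sketch of that control flow (the application name, master URL, and computation are illustrative):

    import org.apache.spark.{SparkConf, SparkContext}

    object MyDriverApp {
      def main(args: Array[String]): Unit = {
        // The driver process starts here: creating a SparkContext
        // connects to the cluster manager and registers the application.
        val conf = new SparkConf().setAppName("MyDriverApp").setMaster("local[*]")
        val sc = new SparkContext(conf)

        // Transformations only build up the RDD lineage on the driver...
        val squares = sc.parallelize(1 to 100).map(n => n * n)

        // ...while an action makes the driver split the job into tasks
        // and schedule them on executors.
        println(squares.sum())

        // Stopping SparkContext terminates the driver, and with it the application.
        sc.stop()
      }
    }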

Driver’s Memory

The driver's memory can be set using spark-submit's --driver-memory command-line option or the spark.driver.memory setting; it falls back to the SPARK_DRIVER_MEMORY environment variable if neither is set.

Note
The driver's memory setting is printed out to the standard error output in spark-submit's verbose mode.
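
For example (the class name, memory size, and jar are illustrative):

    $ ./bin/spark-submit --driver-memory 2g --verbose --class MyDriverApp myApp.jar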

Driver’s Cores

The driver's cores can be set using spark-submit's --driver-cores command-line option in cluster deploy mode.

Note
In client deploy mode the driver's cores correspond to the cores of the JVM process the Spark application runs in.
Note
The driver's cores setting is printed out to the standard error output in spark-submit's verbose mode.
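
For example, requesting two driver cores in cluster deploy mode on YARN (a sketch; master URL, class, and jar are illustrative):

    $ ./bin/spark-submit --master yarn --deploy-mode cluster --driver-cores 2 --class MyDriverApp myApp.jar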

Settings / System Properties

spark.driver.extraClassPath

spark.driver.extraClassPath system property sets the additional classpath entries (e.g. jars and directories) that should be added to the driver’s classpath in cluster deploy mode.

Note

For client deploy mode you can use a properties file or command line to set spark.driver.extraClassPath.

Do not use SparkConf since it is too late in client deploy mode: the JVM has already been set up by the time the Spark application starts.

Refer to buildSparkSubmitCommand Internal Method for the very low-level details of how it is handled internally.

spark.driver.extraClassPath uses an OS-specific path separator (: on Unix-like systems, ; on Windows).

Note
Use spark-submit's --driver-class-path command-line option to override spark.driver.extraClassPath from a Spark properties file.
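
For example, a sketch of both mechanisms (the paths, class, and jar are illustrative; note the Unix path separator):

    # In a Spark properties file, e.g. conf/spark-defaults.conf
    spark.driver.extraClassPath /opt/libs/extra.jar:/opt/libs/conf

    # Overriding it on the command line
    $ ./bin/spark-submit --driver-class-path /opt/libs/extra.jar --class MyDriverApp myApp.jar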

spark.driver.libraryPath

spark.driver.extraLibraryPath

spark.driver.extraJavaOptions

spark.driver.extraJavaOptions sets additional JVM options for the driver.
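
For example, to pass GC tuning flags to the driver JVM (the options, class, and jar are illustrative):

    $ ./bin/spark-submit --conf "spark.driver.extraJavaOptions=-XX:+UseG1GC -XX:+PrintGCDetails" --class MyDriverApp myApp.jar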

spark.driver.appUIAddress

spark.driver.appUIAddress is only used in Spark on YARN. It is set when YarnClientSchedulerBackend starts to run ExecutorLauncher (and registers an ApplicationMaster for the Spark application).

spark.driver.cores

spark.driver.cores (default: 1) sets the number of CPU cores assigned to the driver in cluster deploy mode.

Note
When Client is created (for Spark on YARN in cluster mode only), it sets the number of cores for the ApplicationMaster using spark.driver.cores.

Read Driver’s Cores for more in-depth coverage.
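
For example, the --conf equivalent of the --driver-cores command-line option (class and jar are illustrative):

    $ ./bin/spark-submit --master yarn --deploy-mode cluster --conf spark.driver.cores=2 --class MyDriverApp myApp.jar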

spark.driver.memory

spark.driver.memory (default: 1g) sets the driver's memory size. Use a JVM-style size suffix (e.g. 512m, 2g); a plain number is interpreted as MiBs.

Read Driver’s Memory for more in-depth coverage.
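
For example, in a Spark properties file (the size is illustrative). In client deploy mode, do not set it through SparkConf in your application, since the driver JVM has already started by then:

    # conf/spark-defaults.conf
    spark.driver.memory 2g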
