Broadcast Manager
Broadcast Manager is a Spark service to manage broadcast values in Spark jobs. It is created for a Spark application as part of SparkContext’s initialization and is a simple wrapper around BroadcastFactory.
Broadcast Manager tracks the number of broadcast values (using the internal field nextBroadcastId).
The idea is to transfer values used in transformations from a driver to executors in a most effective way so they are copied once and used many times by tasks (rather than being copied every time a task is launched).
When BroadcastManager is initialized an instance of BroadcastFactory is created based on spark.broadcast.factory setting.
BroadcastFactory
BroadcastFactory is a pluggable interface for broadcast implementations in Spark. It is exclusively used and instantiated inside of BroadcastManager to manage broadcast variables.
It comes with 4 methods:
-
def initialize(isDriver: Boolean, conf: SparkConf, securityMgr: SecurityManager): Unit -
def newBroadcast[T: ClassTag](value: T, isLocal: Boolean, id: Long): Broadcast[T]- called afterSparkContext.broadcast()has been called. -
def unbroadcast(id: Long, removeFromDriver: Boolean, blocking: Boolean): Unit -
def stop(): Unit
Compression
With spark.broadcast.compress enabled (which is the default), TorrentBroadcast does compression.
|
Caution
|
FIXME What’s compressed? |
| Alias | Fully-Qualified Class Name | Notes |
|---|---|---|
|
|
|
|
|
The default implementation |
|
|
The fallback when the default codec is not available. |
An implementation of CompressionCodec trait has to offer a constructor that accepts SparkConf.
Internally, TorrentBroadcast sets the internal compressionCodec value that is later used to create blocks for an object (in TorrentBroadcast.blockifyObject) and to create the object out of the blocks (in TorrentBroadcast.unBlockifyObject).
|
Caution
|
FIXME Review TorrentBroadcast.blockifyObject and TorrentBroadcast.unBlockifyObject.
|
Settings
| Name | Default value | Description |
|---|---|---|
|
The compression codec to use. See Compression. |
|
|
|
The fully-qualified class name for the implementation of |
|
|
The flag to enable compression. See Compression in this document. The setting is also used in SerializerManager. |
|
|
The size of a block |