Broadcast Manager
Broadcast Manager is a Spark service to manage broadcast values in Spark jobs. It is created for a Spark application as part of SparkContext’s initialization and is a simple wrapper around BroadcastFactory.
Broadcast Manager tracks the number of broadcast values (using the internal field nextBroadcastId
).
The idea is to transfer values used in transformations from a driver to executors in a most effective way so they are copied once and used many times by tasks (rather than being copied every time a task is launched).
When BroadcastManager
is initialized an instance of BroadcastFactory
is created based on spark.broadcast.factory setting.
BroadcastFactory
BroadcastFactory
is a pluggable interface for broadcast implementations in Spark. It is exclusively used and instantiated inside of BroadcastManager
to manage broadcast variables.
It comes with 4 methods:
-
def initialize(isDriver: Boolean, conf: SparkConf, securityMgr: SecurityManager): Unit
-
def newBroadcast[T: ClassTag](value: T, isLocal: Boolean, id: Long): Broadcast[T]
- called afterSparkContext.broadcast()
has been called. -
def unbroadcast(id: Long, removeFromDriver: Boolean, blocking: Boolean): Unit
-
def stop(): Unit
Compression
With spark.broadcast.compress enabled (which is the default), TorrentBroadcast does compression.
Caution
|
FIXME What’s compressed? |
Alias | Fully-Qualified Class Name | Notes |
---|---|---|
|
|
|
|
|
The default implementation |
|
|
The fallback when the default codec is not available. |
An implementation of CompressionCodec
trait has to offer a constructor that accepts SparkConf.
Internally, TorrentBroadcast
sets the internal compressionCodec
value that is later used to create blocks for an object (in TorrentBroadcast.blockifyObject
) and to create the object out of the blocks (in TorrentBroadcast.unBlockifyObject
).
Caution
|
FIXME Review TorrentBroadcast.blockifyObject and TorrentBroadcast.unBlockifyObject .
|
Settings
Name | Default value | Description |
---|---|---|
|
The compression codec to use. See Compression. |
|
|
|
The fully-qualified class name for the implementation of |
|
|
The flag to enable compression. See Compression in this document. The setting is also used in SerializerManager. |
|
|
The size of a block |