Broadcast Manager

Broadcast Manager (BroadcastManager) is a Spark service that manages broadcast values in Spark jobs. It is created for a Spark application as part of SparkContext’s initialization and is a thin wrapper around BroadcastFactory.

Broadcast Manager tracks the number of broadcast values (using the internal field nextBroadcastId).
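The ID bookkeeping can be pictured with a minimal sketch. It assumes an atomic counter behind nextBroadcastId; the class BroadcastIdCounter and its methods are hypothetical names used for illustration, not Spark's own code.

```scala
import java.util.concurrent.atomic.AtomicLong

// Hypothetical stand-in for BroadcastManager's ID bookkeeping (not Spark's code).
final class BroadcastIdCounter {
  // Plays the role of the internal nextBroadcastId field.
  private val nextBroadcastId = new AtomicLong(0)

  // Hands out the next unused ID; safe to call from several threads.
  def newId(): Long = nextBroadcastId.getAndIncrement()

  // Number of IDs handed out so far.
  def issued: Long = nextBroadcastId.get()
}

object BroadcastIdCounterDemo {
  def main(args: Array[String]): Unit = {
    val counter = new BroadcastIdCounter
    val ids = Seq.fill(3)(counter.newId())
    println(ids.mkString(", ")) // 0, 1, 2
    println(counter.issued)     // 3
  }
}
```

Because every new broadcast takes the next value of the counter, the counter doubles as a tally of the broadcast values created so far.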

The idea is to transfer values used in transformations from the driver to executors as efficiently as possible, so that each value is copied to an executor once and reused by many tasks (rather than being copied every time a task is launched).
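From the user's side, this looks roughly like the following sketch. It assumes spark-core is on the classpath; the local[2] master, the application name, and the lookup data are placeholders chosen for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object BroadcastLookupExample {
  def run(): Seq[String] = {
    // local[2] and the app name are placeholders for this sketch.
    val conf = new SparkConf()
      .setMaster("local[2]")
      .setAppName("broadcast-lookup-sketch")
    val sc = new SparkContext(conf)
    try {
      // A small lookup table that every task needs.
      val countryNames = Map("PL" -> "Poland", "CA" -> "Canada")
      // Register the table as a broadcast value instead of capturing it in each task closure.
      val lookup = sc.broadcast(countryNames)

      val codes = sc.parallelize(Seq("PL", "CA", "XX"), numSlices = 2)
      // Tasks read the broadcast value through .value.
      codes.map(code => lookup.value.getOrElse(code, "unknown")).collect().toSeq
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit =
    run().foreach(println)
}
```

Without the broadcast, the map would be serialized into the closure of every task; with it, tasks only carry a small handle and fetch the value when they first need it.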

When BroadcastManager is initialized, an instance of BroadcastFactory is created based on the spark.broadcast.factory setting.

BroadcastFactory

BroadcastFactory is a pluggable interface for broadcast implementations in Spark. It is exclusively used and instantiated inside of BroadcastManager to manage broadcast variables.

It comes with 4 methods:

  • def initialize(isDriver: Boolean, conf: SparkConf, securityMgr: SecurityManager): Unit

  • def newBroadcast[T: ClassTag](value: T, isLocal: Boolean, id: Long): Broadcast[T] - called after SparkContext.broadcast() has been called.

  • def unbroadcast(id: Long, removeFromDriver: Boolean, blocking: Boolean): Unit

  • def stop(): Unit
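To make the lifecycle of the four methods concrete, here is a simplified, self-contained model. It is only a sketch: SimpleBroadcast, SimpleBroadcastFactory, and InMemoryBroadcastFactory are hypothetical stand-ins that drop SparkConf, SecurityManager, and ClassTag, and nothing is sent over the network.

```scala
import scala.collection.concurrent.TrieMap

// Hypothetical, simplified stand-ins: the real trait also takes SparkConf,
// SecurityManager and a ClassTag, and returns Broadcast[T].
final case class SimpleBroadcast[T](id: Long, value: T)

trait SimpleBroadcastFactory {
  def initialize(isDriver: Boolean): Unit
  def newBroadcast[T](value: T, isLocal: Boolean, id: Long): SimpleBroadcast[T]
  def unbroadcast(id: Long, removeFromDriver: Boolean, blocking: Boolean): Unit
  def stop(): Unit
}

// Keeps values in a local map; no network transfer happens here.
final class InMemoryBroadcastFactory extends SimpleBroadcastFactory {
  private val store = TrieMap.empty[Long, Any]
  @volatile private var initialized = false

  def initialize(isDriver: Boolean): Unit = {
    initialized = true
  }

  def newBroadcast[T](value: T, isLocal: Boolean, id: Long): SimpleBroadcast[T] = {
    require(initialized, "initialize must be called first")
    store.put(id, value)
    SimpleBroadcast(id, value)
  }

  def unbroadcast(id: Long, removeFromDriver: Boolean, blocking: Boolean): Unit = {
    // The flags are ignored in this sketch; the value is simply dropped.
    store.remove(id)
    ()
  }

  def stop(): Unit = {
    store.clear()
    initialized = false
  }

  // Helper for the demo only: is a value with this ID still registered?
  def contains(id: Long): Boolean = store.contains(id)
}

object FactoryLifecycleDemo {
  def main(args: Array[String]): Unit = {
    val factory = new InMemoryBroadcastFactory
    factory.initialize(isDriver = true)
    val b = factory.newBroadcast(Seq(1, 2, 3), isLocal = true, id = 0L)
    println(b.value.sum)            // 6
    factory.unbroadcast(b.id, removeFromDriver = true, blocking = true)
    println(factory.contains(b.id)) // false
    factory.stop()
  }
}
```

The order shown in the demo (initialize, newBroadcast, unbroadcast, stop) follows the method list above.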

Compression

With spark.broadcast.compress enabled (the default), TorrentBroadcast uses compression.

Caution
FIXME What’s compressed?

Table 1. Built-in Compression Codecs

Alias   Fully-Qualified Class Name                  Notes
------  ------------------------------------------  -----------------------------------------------------
lzf     org.apache.spark.io.LZFCompressionCodec
lz4     org.apache.spark.io.LZ4CompressionCodec     The default implementation
snappy  org.apache.spark.io.SnappyCompressionCodec  The fallback when the default codec is not available.

An implementation of the CompressionCodec trait must offer a constructor that accepts a SparkConf.
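The constructor requirement can be illustrated with a small custom codec. This is a sketch that assumes spark-core is on the classpath; GzipCompressionCodec is a made-up class name, and GZIP is used only because it ships with the JDK.

```scala
import java.io.{InputStream, OutputStream}
import java.util.zip.{GZIPInputStream, GZIPOutputStream}

import org.apache.spark.SparkConf
import org.apache.spark.io.CompressionCodec

// Hypothetical codec for illustration. The single-argument SparkConf
// constructor is the part that matters here; conf is not used otherwise.
class GzipCompressionCodec(conf: SparkConf) extends CompressionCodec {
  // Wraps the target stream so that written bytes are GZIP-compressed.
  override def compressedOutputStream(s: OutputStream): OutputStream =
    new GZIPOutputStream(s)

  // Wraps the source stream so that read bytes are decompressed.
  override def compressedInputStream(s: InputStream): InputStream =
    new GZIPInputStream(s)
}
```

Such a class would be selected by setting spark.io.compression.codec to its fully-qualified class name instead of one of the aliases in Table 1.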

Internally, TorrentBroadcast sets the compressionCodec value, which is later used to create blocks for an object (in TorrentBroadcast.blockifyObject) and to re-create the object from the blocks (in TorrentBroadcast.unBlockifyObject).
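The general idea behind the two methods can be sketched as: serialize, compress, and cut into fixed-size blocks, then reverse the steps. The code below is not Spark's implementation. It assumes Java serialization and GZIP as stand-ins for Spark's serializer and codec, and BlockifySketch, blockify, and unBlockify are hypothetical names.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import java.util.zip.{GZIPInputStream, GZIPOutputStream}

object BlockifySketch {
  // Serialize the object, compress the bytes, then cut them into
  // blocks of at most blockSize bytes each.
  def blockify(obj: AnyRef, blockSize: Int): Vector[Array[Byte]] = {
    require(blockSize > 0, "blockSize must be positive")
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(new GZIPOutputStream(bytes))
    try out.writeObject(obj) finally out.close() // close() also finishes the GZIP stream
    bytes.toByteArray.grouped(blockSize).toVector
  }

  // Concatenate the blocks in order, decompress, then deserialize.
  def unBlockify[T](blocks: Seq[Array[Byte]]): T = {
    val joined = Array.concat(blocks: _*)
    val in = new ObjectInputStream(new GZIPInputStream(new ByteArrayInputStream(joined)))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val value = "spark " * 10000
    val blocks = blockify(value, blockSize = 16)
    println(blocks.forall(_.length <= 16))       // true
    println(unBlockify[String](blocks) == value) // true
  }
}
```

The block size in the demo is tiny on purpose so that even a well-compressed value spans several blocks; in Spark the size comes from spark.broadcast.blockSize.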

Caution
FIXME Review TorrentBroadcast.blockifyObject and TorrentBroadcast.unBlockifyObject.

Settings

Table 2. Settings

Name                        Default value                                       Description
--------------------------  --------------------------------------------------  -----------------------------------------------------------------------
spark.io.compression.codec  lz4                                                 The compression codec to use. See Compression.
spark.broadcast.factory     org.apache.spark.broadcast.TorrentBroadcastFactory  The fully-qualified class name of the BroadcastFactory implementation.
spark.broadcast.compress    true                                                The flag to enable compression. See Compression.
                                                                                The setting is also used in SerializerManager.
spark.broadcast.blockSize   4m                                                  The size of each block that TorrentBroadcast splits a broadcast value into.
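As a sketch of how these settings might be applied programmatically, the snippet below sets three of them on a SparkConf and reads them back. It assumes spark-core is on the classpath; the application name is a placeholder, and the values simply repeat the defaults from the table.

```scala
import org.apache.spark.SparkConf

object BroadcastSettingsExample {
  // Builds a configuration with the broadcast-related settings made explicit.
  def conf(): SparkConf =
    new SparkConf(false) // false: do not load spark.* system properties
      .setAppName("broadcast-settings-sketch")
      .set("spark.broadcast.compress", "true")
      .set("spark.io.compression.codec", "lz4")
      .set("spark.broadcast.blockSize", "4m")

  def main(args: Array[String]): Unit = {
    val c = conf()
    println(c.getBoolean("spark.broadcast.compress", defaultValue = true)) // true
    println(c.get("spark.io.compression.codec"))                           // lz4
    println(c.getSizeAsKb("spark.broadcast.blockSize", "4m"))              // 4096
  }
}
```

The same keys can also be passed on the command line with --conf key=value when submitting an application.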
