BlockDataManager - Block Storage Management API

BlockDataManager is a pluggable interface for managing the storage of blocks (the block storage management API). Blocks are identified by BlockId and stored as ManagedBuffer.

Note
BlockManager is currently the only available implementation of BlockDataManager.
Note
org.apache.spark.network.BlockDataManager is a private[spark] Scala trait in Spark.

BlockDataManager Contract

Every BlockDataManager offers the following services:

  • getBlockData fetches the data of a local block by blockId.

    getBlockData(blockId: BlockId): ManagedBuffer
  • putBlockData stores block data locally under blockId. It returns true if the operation succeeded and false if it failed.

    putBlockData(
      blockId: BlockId,
      data: ManagedBuffer,
      level: StorageLevel,
      classTag: ClassTag[_]): Boolean
  • releaseLock releases the lock acquired by a getBlockData or putBlockData operation.

    releaseLock(blockId: BlockId): Unit
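
The contract above can be illustrated with a small in-memory sketch. Because BlockDataManager is private[spark], the example does not extend the real trait; SimpleBlockId, SimpleBuffer, SimpleBlockDataManager and InMemoryBlockDataManager are hypothetical stand-ins, the StorageLevel and ClassTag parameters are dropped for brevity, and both the "return false when the block already exists" rule and the lock bookkeeping are assumptions made for illustration, not Spark's actual behavior.

```scala
import scala.collection.mutable

// Hypothetical stand-ins for Spark's BlockId and ManagedBuffer (not the real classes).
final case class SimpleBlockId(name: String)
final case class SimpleBuffer(bytes: Array[Byte])

// Simplified mirror of the three-method contract described above.
trait SimpleBlockDataManager {
  def getBlockData(blockId: SimpleBlockId): SimpleBuffer
  def putBlockData(blockId: SimpleBlockId, data: SimpleBuffer): Boolean
  def releaseLock(blockId: SimpleBlockId): Unit
}

class InMemoryBlockDataManager extends SimpleBlockDataManager {
  private val blocks = mutable.Map.empty[SimpleBlockId, SimpleBuffer]
  private val locked = mutable.Set.empty[SimpleBlockId]

  // Returns the stored buffer and marks the block as locked until releaseLock is called.
  def getBlockData(blockId: SimpleBlockId): SimpleBuffer = synchronized {
    val data = blocks.getOrElse(
      blockId,
      throw new NoSuchElementException(s"Block ${blockId.name} not found"))
    locked += blockId
    data
  }

  // Assumption: storing fails (false) when the block already exists.
  def putBlockData(blockId: SimpleBlockId, data: SimpleBuffer): Boolean = synchronized {
    if (blocks.contains(blockId)) false
    else {
      blocks(blockId) = data
      true
    }
  }

  // Clears the lock mark set by getBlockData.
  def releaseLock(blockId: SimpleBlockId): Unit = synchronized {
    locked -= blockId
  }

  // Helper for inspection only; not part of the contract.
  def isLocked(blockId: SimpleBlockId): Boolean = synchronized {
    locked.contains(blockId)
  }
}
```

A caller would typically put a block, read it back with getBlockData, and call releaseLock once it is done with the returned buffer.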

BlockId

BlockId identifies a block of data. It has a globally unique identifier (name).

There are the following types of BlockId:

  • RDDBlockId - described by rddId and splitIndex

  • ShuffleBlockId - described by shuffleId, mapId and reduceId

  • ShuffleDataBlockId - described by shuffleId, mapId and reduceId

  • ShuffleIndexBlockId - described by shuffleId, mapId and reduceId

  • BroadcastBlockId - described by broadcastId and an optional field that identifies a piece of the broadcast value

  • TaskResultBlockId - described by taskId

  • StreamBlockId - described by streamId and uniqueId
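
To show how the describing fields can yield a globally unique name, here is a minimal sketch of several of the block id types. The Sketch* classes are hypothetical stand-ins rather than Spark's own classes, and the field types and name formats (for example rdd_1_2 or shuffle_0_5_3.data) are assumptions modeled on the naming scheme in Spark's source; check them against the Spark version you use.

```scala
// Hypothetical stand-ins; names and formats are assumptions, not Spark's API.
sealed trait SketchBlockId {
  def name: String // the globally unique identifier
}

final case class SketchRDDBlockId(rddId: Int, splitIndex: Int) extends SketchBlockId {
  def name: String = s"rdd_${rddId}_${splitIndex}"
}

final case class SketchShuffleBlockId(shuffleId: Int, mapId: Long, reduceId: Int)
    extends SketchBlockId {
  def name: String = s"shuffle_${shuffleId}_${mapId}_${reduceId}"
}

// Same fields as the shuffle block id, distinguished by a ".data" suffix.
final case class SketchShuffleDataBlockId(shuffleId: Int, mapId: Long, reduceId: Int)
    extends SketchBlockId {
  def name: String = s"shuffle_${shuffleId}_${mapId}_${reduceId}.data"
}

// Same fields again, distinguished by a ".index" suffix.
final case class SketchShuffleIndexBlockId(shuffleId: Int, mapId: Long, reduceId: Int)
    extends SketchBlockId {
  def name: String = s"shuffle_${shuffleId}_${mapId}_${reduceId}.index"
}

final case class SketchTaskResultBlockId(taskId: Long) extends SketchBlockId {
  def name: String = s"taskresult_${taskId}"
}

final case class SketchStreamBlockId(streamId: Int, uniqueId: Long) extends SketchBlockId {
  def name: String = s"input-${streamId}-${uniqueId}"
}
```

Because each type uses its own prefix or suffix, two ids built from the same numbers (say, a shuffle block and its index file) still get different names.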

BroadcastBlockId

BroadcastBlockId is a BlockId with a Long identifier (broadcastId) and an optional field.
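
The effect of the optional field can be sketched as follows. SketchBroadcastBlockId is a hypothetical stand-in, and both the empty-string default and the broadcast_<id>_<field> name format are assumptions based on Spark's naming scheme rather than a copy of its code.

```scala
// Hypothetical stand-in for BroadcastBlockId; the name format is an assumption.
final case class SketchBroadcastBlockId(broadcastId: Long, field: String = "") {
  // The field is appended to the name only when it is non-empty.
  def name: String =
    if (field.isEmpty) s"broadcast_${broadcastId}"
    else s"broadcast_${broadcastId}_${field}"
}
```

With this scheme, the whole broadcast value and each of its pieces (for example a field of "piece0") get distinct names while sharing the same broadcastId.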

ManagedBuffer
