INFO SparkContext: Registered listener org.apache.spark.scheduler.StatsReportListener
Spark Listeners — Intercepting Events from Spark Scheduler
Spark Listeners intercept events from the Spark scheduler that are emitted over the course of execution of Spark applications.
A Spark listener is an implementation of the SparkListener
developer API that is an extension of SparkListenerInterface where all the callback methods are no-op/do-nothing.
Spark uses Spark listeners for web UI, event persistence (for History Server), dynamic allocation of executors and other services.
You can develop your own custom Spark listeners using the SparkListener
developer API and register them using SparkContext.addSparkListener method or spark.extraListeners setting. With SparkListener
you can focus on Spark events of your liking and process a subset of scheduling events.
Tip
|
Developing a custom SparkListener is an excellent introduction to low-level details of Spark’s Execution Model. Check out the exercise Developing Custom SparkListener to monitor DAGScheduler in Scala. |
Tip
|
Enable |
SparkListenerInterface
SparkListenerInterface
is an internal interface for listeners of events from the Spark scheduler.
Method | Description |
---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Built-In Spark Listeners
Spark Listener | Description |
---|---|
Logs JSON-encoded events to a file that can later be read by History Server |
|
|
Allows users to receive all SparkListenerEvent events by overriding the single |
Prepares information for Executors tab in web UI |
|
StorageStatusListener, RDDOperationGraphListener, EnvironmentListener, BlockStatusListener and StorageListener |
For web UI |
|
|
|
|
Support for History Server |
|
SparkListenerEvents
Caution
|
FIXME Give a less code-centric description of the times for the events. |
SparkListenerExecutorMetricsUpdate
Caution
|
FIXME |
SparkListenerApplicationStart
SparkListenerApplicationStart(
appName: String,
appId: Option[String],
time: Long,
sparkUser: String,
appAttemptId: Option[String],
driverLogs: Option[Map[String, String]] = None)
SparkListenerApplicationStart
is posted when SparkContext
does postApplicationStart
.
SparkListenerJobStart
SparkListenerJobStart(
jobId: Int,
time: Long,
stageInfos: Seq[StageInfo],
properties: Properties = null)
SparkListenerJobStart
is posted when DAGScheduler
does handleJobSubmitted
and handleMapStageSubmitted
.
SparkListenerStageSubmitted
SparkListenerStageSubmitted(stageInfo: StageInfo, properties: Properties = null)
SparkListenerStageSubmitted
is posted when DAGScheduler
does submitMissingTasks
.
SparkListenerTaskStart
SparkListenerTaskStart(stageId: Int, stageAttemptId: Int, taskInfo: TaskInfo)
SparkListenerTaskStart
is posted when DAGScheduler
is informed that a task is being started.
SparkListenerTaskGettingResult
SparkListenerTaskGettingResult(taskInfo: TaskInfo)
SparkListenerTaskGettingResult
is posted when DAGScheduler
handles GettingResultEvent
event.
SparkListenerTaskEnd
SparkListenerTaskEnd(
stageId: Int,
stageAttemptId: Int,
taskType: String,
reason: TaskEndReason,
taskInfo: TaskInfo,
// may be null if the task has failed
@Nullable taskMetrics: TaskMetrics)
SparkListenerTaskEnd
is posted when DAGScheduler
handles a task completion.
SparkListenerStageCompleted
SparkListenerStageCompleted(stageInfo: StageInfo)
SparkListenerStageCompleted
is posted when DAGScheduler
does markStageAsFinished
.
SparkListenerJobEnd
SparkListenerJobEnd(
jobId: Int,
time: Long,
jobResult: JobResult)
SparkListenerJobEnd
is posted when DAGScheduler
does cleanUpAfterSchedulerStop
, handleTaskCompletion
, failJobAndIndependentStages
, and markMapStageJobAsFinished.
SparkListenerApplicationEnd
SparkListenerApplicationEnd(time: Long)
SparkListenerApplicationEnd
is posted when SparkContext
does postApplicationEnd
.
SparkListenerEnvironmentUpdate
SparkListenerEnvironmentUpdate(environmentDetails: Map[String, Seq[(String, String)]])
SparkListenerEnvironmentUpdate
is posted when SparkContext
does postEnvironmentUpdate
.
SparkListenerBlockManagerAdded
SparkListenerBlockManagerAdded(
time: Long,
blockManagerId: BlockManagerId,
maxMem: Long)
SparkListenerBlockManagerAdded
is posted when BlockManagerMasterEndpoint
registers a BlockManager
.
SparkListenerBlockManagerRemoved
SparkListenerBlockManagerRemoved(
time: Long,
blockManagerId: BlockManagerId)
SparkListenerBlockManagerRemoved
is posted when BlockManagerMasterEndpoint
removes a BlockManager
.
SparkListenerBlockUpdated
SparkListenerBlockUpdated(blockUpdatedInfo: BlockUpdatedInfo)
SparkListenerBlockUpdated
is posted when BlockManagerMasterEndpoint
receives UpdateBlockInfo
message.
SparkListenerUnpersistRDD
SparkListenerUnpersistRDD(rddId: Int)
SparkListenerUnpersistRDD
is posted when SparkContext
does unpersistRDD
.
SparkListenerExecutorAdded
SparkListenerExecutorAdded(
time: Long,
executorId: String,
executorInfo: ExecutorInfo)
SparkListenerExecutorAdded
is posted when DriverEndpoint
RPC endpoint (of CoarseGrainedSchedulerBackend
) handles RegisterExecutor
message, MesosFineGrainedSchedulerBackend
does resourceOffers
, and LocalSchedulerBackendEndpoint
starts.
SparkListenerExecutorRemoved
SparkListenerExecutorRemoved(
time: Long,
executorId: String,
reason: String)
SparkListenerExecutorRemoved
is posted when DriverEndpoint
RPC endpoint (of CoarseGrainedSchedulerBackend
) does removeExecutor
and MesosFineGrainedSchedulerBackend
does removeExecutor
.