Persisting Events using EventLoggingListener
EventLoggingListener is a SparkListener that persists JSON-encoded events to a file.
When event logging is enabled, EventLoggingListener writes events to a log file under the spark.eventLog.dir directory. All Spark events are logged except SparkListenerBlockUpdated and SparkListenerExecutorMetricsUpdate.
Tip
Use Spark History Server to view the event logs in a browser.
Events can optionally be compressed.
In-flight log files have the .inprogress extension.
EventLoggingListener is a private[spark] class in the org.apache.spark.scheduler package.
Tip
Enable INFO logging level for the org.apache.spark.scheduler.EventLoggingListener logger to see what happens inside.
Add the following line to conf/log4j.properties:
log4j.logger.org.apache.spark.scheduler.EventLoggingListener=INFO
Refer to Logging.
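The introduction says every Spark event is persisted except SparkListenerBlockUpdated and SparkListenerExecutorMetricsUpdate. The sketch below only illustrates that rule: it identifies events by their class name, and EventFilter, isLogged and persisted are hypothetical helpers, not part of Spark.

```scala
object EventFilter {
  // Event types that, per the text, are not persisted to the event log.
  val notLogged: Set[String] = Set(
    "SparkListenerBlockUpdated",
    "SparkListenerExecutorMetricsUpdate"
  )

  // Hypothetical helper: true if an event, identified here only by its
  // class name, would be written to the event log.
  def isLogged(eventName: String): Boolean = !notLogged.contains(eventName)

  // Keeps only the events that would end up in the log, preserving their order.
  def persisted(eventNames: Seq[String]): Seq[String] = eventNames.filter(isLogged)
}
```

This models the outcome only; it does not claim to reproduce how EventLoggingListener skips those two events internally.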
Creating EventLoggingListener Instance
EventLoggingListener requires an application id (appId), the application’s optional attempt id (appAttemptId), the base directory for event logs (logBaseDir), a SparkConf (sparkConf) and a Hadoop Configuration (hadoopConf).
Note
When created without a Hadoop Configuration, EventLoggingListener creates one using SparkHadoopUtil.get.newConfiguration(sparkConf).
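Creating the listener with or without a Hadoop Configuration maps naturally onto a Scala auxiliary constructor that delegates to the primary one. The sketch below is a minimal illustration under assumptions: Conf, HadoopConf, HadoopConfs.newConfiguration and EventLogSink are hypothetical stand-ins for SparkConf, Hadoop's Configuration, SparkHadoopUtil.get.newConfiguration and EventLoggingListener, and modeling logBaseDir as a java.net.URI is also an assumption of the sketch.

```scala
import java.net.URI

// Hypothetical stand-in for SparkConf.
final case class Conf(settings: Map[String, String])

// Hypothetical stand-in for Hadoop's Configuration; `origin` records how it was made.
final class HadoopConf(val origin: String)

object HadoopConfs {
  // Hypothetical stand-in for SparkHadoopUtil.get.newConfiguration(sparkConf).
  def newConfiguration(conf: Conf): HadoopConf = new HadoopConf("derived from sparkConf")
}

// Hypothetical class mirroring the constructor parameters listed above.
class EventLogSink(
    val appId: String,
    val appAttemptId: Option[String],
    val logBaseDir: URI,
    val sparkConf: Conf,
    val hadoopConf: HadoopConf) {

  // Auxiliary constructor: no Hadoop configuration given, so derive one from sparkConf.
  def this(appId: String, appAttemptId: Option[String], logBaseDir: URI, sparkConf: Conf) =
    this(appId, appAttemptId, logBaseDir, sparkConf, HadoopConfs.newConfiguration(sparkConf))
}
```

For example, new EventLogSink("local-1461696754069", None, new URI("file:/tmp/spark-events"), Conf(Map.empty)) ends up with a HadoopConf derived from the Conf passed in.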
Starting EventLoggingListener (start method)
start(): Unit
start first checks whether logBaseDir is a directory. If it is not, start throws an IllegalArgumentException with the following message:
Log directory [logBaseDir] does not exist.
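The directory check can be sketched as follows. Spark performs it through Hadoop's FileSystem API; this sketch swaps in java.nio.file on a local path, and LogDirCheck and requireLogDir are hypothetical names.

```scala
import java.nio.file.{Files, Path}

object LogDirCheck {
  // Fails fast, as start does, when the base directory is missing or is not a directory.
  def requireLogDir(logBaseDir: Path): Unit =
    if (!Files.isDirectory(logBaseDir))
      throw new IllegalArgumentException(s"Log directory $logBaseDir does not exist.")
}
```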
The log file’s working name is built from appId, appAttemptId (if defined) and the compression codec (if event compression is enabled), e.g. local-1461696754069. The working name also carries the .inprogress extension.
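A minimal sketch of how such a name could be assembled is shown below. LogNames, logPath and workingPath are hypothetical, and the "_" separator before the attempt id and the "." before the codec's short name are assumptions of this sketch; Spark's own helper may also sanitize the id characters, which is omitted here.

```scala
object LogNames {
  val InProgressSuffix = ".inprogress"

  // Hypothetical helper: final log path from the base directory, application id,
  // optional attempt id and optional codec short name.
  // The "_" and "." separators are assumptions of this sketch.
  def logPath(baseDir: String, appId: String, attemptId: Option[String], codec: Option[String]): String = {
    val base = baseDir.stripSuffix("/") + "/" + appId
    val withAttempt = attemptId.fold(base)(a => base + "_" + a)
    withAttempt + codec.fold("")(c => "." + c)
  }

  // The working (in-flight) name is the final name plus ".inprogress".
  def workingPath(finalPath: String): String = finalPath + InProgressSuffix
}
```

With no attempt id and no compression, the name is just the application id, as in local-1461696754069.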
If spark.eventLog.overwrite is enabled and the working log file already exists, you should see the following WARN message:
WARN EventLoggingListener: Event log [path] already exists. Overwriting...
EventLoggingListener then tries to delete the existing .inprogress working log file. If the file cannot be deleted, the following WARN message is printed out to the logs:
WARN EventLoggingListener: Error deleting [path]
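The overwrite-and-delete step can be sketched on a local file system with java.nio.file instead of Hadoop's FileSystem. WorkingFile and prepare are hypothetical names; the sketch returns the WARN messages that would be logged instead of logging them.

```scala
import java.io.IOException
import java.nio.file.{Files, Path}

object WorkingFile {
  // If overwrite is enabled and a stale working file exists, warn and try to delete it.
  // Returns the warnings that would be logged, in order.
  def prepare(working: Path, overwrite: Boolean): List[String] =
    if (overwrite && Files.exists(working)) {
      val warn = s"Event log $working already exists. Overwriting..."
      val deleted =
        try Files.deleteIfExists(working)
        catch { case _: IOException => false }
      if (deleted) List(warn) else List(warn, s"Error deleting $working")
    } else Nil
}
```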
EventLoggingListener then creates a buffered output stream for the working log file and writes metadata as its first line: a SparkListenerLogStart event carrying Spark’s version, for example:
{"Event":"SparkListenerLogStart","Spark Version":"2.0.0-SNAPSHOT"}
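The metadata line can be sketched as follows. Spark serializes it with JsonProtocol; this sketch builds the same JSON shape by hand, and LogHeader, metadataLine, writeHeader and the bufferSize parameter (standing in for the configured output buffer size) are hypothetical.

```scala
import java.io.{BufferedOutputStream, OutputStream}
import java.nio.charset.StandardCharsets

object LogHeader {
  // The first line of a new event log: a SparkListenerLogStart event with the Spark version.
  def metadataLine(sparkVersion: String): String =
    s"""{"Event":"SparkListenerLogStart","Spark Version":"$sparkVersion"}"""

  // Writes the metadata line, followed by a newline, through a buffered stream.
  def writeHeader(out: OutputStream, sparkVersion: String, bufferSize: Int): Unit = {
    val buffered = new BufferedOutputStream(out, bufferSize)
    buffered.write((metadataLine(sparkVersion) + "\n").getBytes(StandardCharsets.UTF_8))
    buffered.flush()
  }
}
```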
At this point, EventLoggingListener is ready for event logging and you should see the following INFO message in the logs:
INFO EventLoggingListener: Logging events to [logPath]
Note
start is executed while SparkContext is created.
Logging Event as JSON (logEvent method)
logEvent(event: SparkListenerEvent, flushLogger: Boolean = false)
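logEvent serializes event to JSON using the org.apache.spark.util.JsonProtocol object and writes it to the log file as a single line. When flushLogger is enabled, the output is flushed right after the write.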
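The write path of logEvent can be sketched as one JSON document per line with an optional flush. This is a simplified model: the event arrives already serialized (Spark uses JsonProtocol for that step), and JsonLineLog is a hypothetical name.

```scala
import java.io.PrintWriter

// Hypothetical writer mirroring logEvent's write path.
final class JsonLineLog(writer: PrintWriter) {
  // Writes one pre-serialized JSON event per line; flushes only when asked to.
  def logEvent(eventJson: String, flushLogger: Boolean = false): Unit = {
    writer.println(eventJson)
    if (flushLogger) writer.flush()
  }

  def close(): Unit = writer.close()
}
```

One event per line keeps the log easy to replay line by line, which is how the metadata line written by start is laid out as well.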
Stopping EventLoggingListener (stop method)
stop(): Unit
stop closes the PrintWriter for the log file and renames the file to drop the .inprogress extension.
If the target log file (the one without the .inprogress extension) already exists and spark.eventLog.overwrite is enabled, stop overwrites it and you should see the following WARN message in the logs:
WARN EventLoggingListener: Event log [target] already exists. Overwriting...
If the target log file exists and spark.eventLog.overwrite is disabled, a java.io.IOException is thrown with the following message:
Target log file already exists ([logPath])
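The rename step of stop can be sketched on a local file system with java.nio.file in place of Hadoop's FileSystem. FinalizeLog and finalizeLog are hypothetical names, closing the writer is left out, and the sketch returns the WARN message instead of logging it.

```scala
import java.io.IOException
import java.nio.file.{Files, Path}

object FinalizeLog {
  // Renames the working (.inprogress) file to its final name.
  // An existing target is replaced only when overwrite is enabled.
  def finalizeLog(working: Path, target: Path, overwrite: Boolean): Option[String] = {
    val warning =
      if (Files.exists(target)) {
        if (!overwrite) throw new IOException(s"Target log file already exists ($target)")
        Files.delete(target)
        Some(s"Event log $target already exists. Overwriting...")
      } else None
    Files.move(working, target)
    warning
  }
}
```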
Note
stop is executed while SparkContext is stopped.
Compressing Logged Events
If event compression is enabled, EventLoggingListener calls CompressionCodec.createCodec(sparkConf) to create a compression codec from either a short codec name or the fully-qualified class name of a codec.
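Accepting either a short name or a fully-qualified class name can be sketched as a lookup with a fallback. The short-name table below is an assumption meant to mirror Spark 2.x's lz4, lzf and snappy codecs; the set of built-in codecs varies by Spark version, and CodecNames and resolve are hypothetical names.

```scala
import java.util.Locale

object CodecNames {
  // Assumed short-name table; the built-in codecs depend on the Spark version.
  val shortNames: Map[String, String] = Map(
    "lz4"    -> "org.apache.spark.io.LZ4CompressionCodec",
    "lzf"    -> "org.apache.spark.io.LZFCompressionCodec",
    "snappy" -> "org.apache.spark.io.SnappyCompressionCodec"
  )

  // A known short name (case-insensitive) maps to its class name;
  // anything else is treated as a fully-qualified class name as-is.
  def resolve(codecName: String): String =
    shortNames.getOrElse(codecName.toLowerCase(Locale.ROOT), codecName)
}
```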
Tip
Read Compression to learn about the built-in compression codecs.
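Settings
Spark Property | Default Value | Description |
---|---|---|
spark.eventLog.enabled | false | Enables (true) or disables (false) persisting Spark events. |
spark.eventLog.dir | /tmp/spark-events | Directory where events are logged. The directory must exist before Spark starts up. |
spark.eventLog.buffer.kb | 100k | Size of the buffer to use when writing to output streams. |
spark.eventLog.overwrite | false | Enables (true) or disables (false) deleting (or overwriting) existing event log files. |
spark.eventLog.compress | false | Enables (true) or disables (false) event compression. |
spark.eventLog.testing | false | Internal flag for testing purposes that enables adding JSON events to an internal in-memory collection of logged events. |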