Spark, Akka and Netty
Spark uses Akka Actor for RPC and messaging, which in turn uses Netty.
Also, for moving bulk data, Netty is used.
For shuffle data, Netty can be optionally used. By default, NIO is directly used to do transfer shuffle data.
For broadcast data (driver-to-all-worker data transfer), Jetty is used by default.
Tip
|
Review org.apache.spark.util.AkkaUtils to learn about the various utilities using Akka.
|
-
sparkMaster
is the name of Actor System for the master in Spark Standalone, i.e.akka://sparkMaster
is the Akka URL. -
Akka configuration is for remote actors (via
akka.actor.provider = "akka.remote.RemoteActorRefProvider"
) -
Enable logging for Akka-related functions in
org.apache.spark.util.Utils
class atINFO
level. -
Enable logging for RPC messages as
DEBUG
fororg.apache.spark.rpc.akka.AkkaRpcEnv
-
spark.akka.threads
(default:4
) -
spark.akka.batchSize
(default:15
) -
spark.akka.frameSize
(default:128
MB, maximum:2047
MB) is the max frame size for Akka messages in bytes. If a task result is bigger, executors use block manager to send results back. -
spark.akka.logLifecycleEvents
(default:false
) -
spark.akka.logAkkaConfig
(default:true
) -
spark.akka.heartbeat.pauses
(default:6000s
) -
spark.akka.heartbeat.interval
(default:1000s
) -
Configs starting with
akka.
in properties file are supported.