Spark, Akka and Netty
Spark uses Akka Actor for RPC and messaging, which in turn uses Netty.
Also, for moving bulk data, Netty is used.
For shuffle data, Netty can be optionally used. By default, NIO is directly used to do transfer shuffle data.
For broadcast data (driver-to-all-worker data transfer), Jetty is used by default.
|
Tip
|
Review org.apache.spark.util.AkkaUtils to learn about the various utilities using Akka.
|
-
sparkMasteris the name of Actor System for the master in Spark Standalone, i.e.akka://sparkMasteris the Akka URL. -
Akka configuration is for remote actors (via
akka.actor.provider = "akka.remote.RemoteActorRefProvider") -
Enable logging for Akka-related functions in
org.apache.spark.util.Utilsclass atINFOlevel. -
Enable logging for RPC messages as
DEBUGfororg.apache.spark.rpc.akka.AkkaRpcEnv -
spark.akka.threads(default:4) -
spark.akka.batchSize(default:15) -
spark.akka.frameSize(default:128MB, maximum:2047MB) is the max frame size for Akka messages in bytes. If a task result is bigger, executors use block manager to send results back. -
spark.akka.logLifecycleEvents(default:false) -
spark.akka.logAkkaConfig(default:true) -
spark.akka.heartbeat.pauses(default:6000s) -
spark.akka.heartbeat.interval(default:1000s) -
Configs starting with
akka.in properties file are supported.