Spark, Akka and Netty

Spark uses Akka Actor for RPC and messaging, which in turn uses Netty.

Also, for moving bulk data, Netty is used.

  • For shuffle data, Netty can be optionally used. By default, NIO is directly used to do transfer shuffle data.

  • For broadcast data (driver-to-all-worker data transfer), Jetty is used by default.

Tip
Review org.apache.spark.util.AkkaUtils to learn about the various utilities using Akka.
  • sparkMaster is the name of Actor System for the master in Spark Standalone, i.e. akka://sparkMaster is the Akka URL.

  • Akka configuration is for remote actors (via akka.actor.provider = "akka.remote.RemoteActorRefProvider")

  • Enable logging for Akka-related functions in org.apache.spark.util.Utils class at INFO level.

  • Enable logging for RPC messages as DEBUG for org.apache.spark.rpc.akka.AkkaRpcEnv

  • spark.akka.threads (default: 4)

  • spark.akka.batchSize (default: 15)

  • spark.akka.frameSize (default: 128 MB, maximum: 2047 MB) is the max frame size for Akka messages in bytes. If a task result is bigger, executors use block manager to send results back.

  • spark.akka.logLifecycleEvents (default: false)

  • spark.akka.logAkkaConfig (default: true)

  • spark.akka.heartbeat.pauses (default: 6000s)

  • spark.akka.heartbeat.interval (default: 1000s)

  • Configs starting with akka. in properties file are supported.

results matching ""

    No results matching ""