SparkPlan — Physical Execution Plan

SparkPlan is the base QueryPlan for physical operators in a structured query.

The SparkPlan contract requires concrete implementations to define doExecute, which is called whenever the final execute method is called.

When executed, a SparkPlan produces an RDD of InternalRows (i.e. RDD[InternalRow]).

Caution
FIXME SparkPlan is Serializable. Why?
Note
The naming convention for physical operators in Spark's source code is to have their names end with the Exec suffix, e.g. DebugExec or LocalTableScanExec.
Tip
Read up on InternalRow to learn about the internal binary row format.

SparkPlan has the following attributes:

  • metadata

  • metrics

  • outputPartitioning

  • outputOrdering

SparkPlan has the following final methods that prepare the execution environment and then delegate to the corresponding methods of the SparkPlan Contract:

  • execute calls doExecute

  • prepare calls doPrepare

  • executeBroadcast calls doExecuteBroadcast
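
The pairing of final methods with overridable do* methods is a template-method pattern. The following is a minimal, self-contained sketch of that pattern in plain Scala, not Spark's actual code: PlanSketch and LocalScanSketch are hypothetical names, rows are modelled as Seq[String] instead of RDD[InternalRow], and the default behaviour of doExecuteBroadcast (throwing an exception) is an assumption made for the sketch.

```scala
// Simplified stand-in for SparkPlan (hypothetical, NOT Spark's code).
abstract class PlanSketch {
  // Final entry points: callers use these, subclasses cannot override them.
  final def execute(): Seq[String] = doExecute()
  final def prepare(): Unit = doPrepare()
  final def executeBroadcast(): Seq[String] = doExecuteBroadcast()

  // Required by the contract: every concrete operator must define it.
  protected def doExecute(): Seq[String]

  // Optional hooks with default behaviour (the defaults are assumptions).
  protected def doPrepare(): Unit = ()
  protected def doExecuteBroadcast(): Seq[String] =
    throw new UnsupportedOperationException(
      s"${getClass.getSimpleName} does not implement doExecuteBroadcast")
}

// A concrete operator (hypothetical) that only returns the rows it was given.
class LocalScanSketch(rows: Seq[String]) extends PlanSketch {
  var prepareCalls = 0 // exposed only so the sketch can be inspected

  override protected def doPrepare(): Unit = prepareCalls += 1
  override protected def doExecute(): Seq[String] = rows
}
```

Calling execute() on a LocalScanSketch ends up in its doExecute, while executeBroadcast() fails because this operator does not override doExecuteBroadcast.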

Executing Query in Scope (after Preparations) — executeQuery Final Method

executeQuery[T](query: => T): T

executeQuery executes query in a scope (i.e. so that all RDDs created will have the same scope).

Internally, executeQuery calls prepare and waitForSubqueries before executing query.
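
The order of calls can be illustrated with a small sketch in plain Scala. It is a simplified model under stated assumptions, not Spark's implementation: QueryScopeSketch, QueryScopeDemo and the log buffer are hypothetical, and the RDD scope handling is left out entirely. The point it shows is that query is a by-name parameter (=> T), so it is evaluated only after prepare and waitForSubqueries have run.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical model of the call order inside executeQuery.
class QueryScopeSketch {
  // Records the order in which the steps run (for illustration only).
  val log: ArrayBuffer[String] = ArrayBuffer.empty[String]

  def prepare(): Unit = { log += "prepare"; () }
  def waitForSubqueries(): Unit = { log += "waitForSubqueries"; () }

  // `query` is passed by name, so its body runs only where it is referenced,
  // i.e. after both preparation steps have completed.
  def executeQuery[T](query: => T): T = {
    prepare()
    waitForSubqueries()
    query
  }
}

object QueryScopeDemo {
  // Runs a trivial "query" and returns its result with the recorded order.
  def run(): (Int, List[String]) = {
    val plan = new QueryScopeSketch
    val result = plan.executeQuery {
      plan.log += "query"
      42
    }
    (result, plan.log.toList)
  }
}
```

In this model, QueryScopeDemo.run() yields the query result together with the steps in the order prepare, waitForSubqueries, query.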

Note
executeQuery is executed as part of execute and executeBroadcast, and when CodegenSupport produces Java source code.

Computing Query Result As Broadcast Variable — executeBroadcast Final Method

executeBroadcast[T](): broadcast.Broadcast[T]

executeBroadcast returns the results of the query as a broadcast variable.

Internally, executeBroadcast executes doExecuteBroadcast inside executeQuery.

Note
executeBroadcast is executed in BroadcastHashJoinExec, BroadcastNestedLoopJoinExec and ReusedExchangeExec.

waitForSubqueries Method

prepare Method

SparkPlan Contract

The contract of SparkPlan requires that concrete implementations define the following method:

  • doExecute(): RDD[InternalRow]

They may also define their own custom overrides:

  • doPrepare

  • doExecuteBroadcast

Caution
FIXME Why are there two executes?

Specialized SparkPlans

  • UnaryExecNode

Caution
FIXME

executeCollect

Caution
FIXME

SQLMetric

SQLMetric is an accumulator that accumulates and produces long values.

There are three known kinds of SQLMetric:

  • sum

  • size

  • timing

metrics Lookup Table

metrics: Map[String, SQLMetric] = Map.empty

metrics is a private[sql] lookup table of supported SQLMetrics by their names.
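
How an operator might expose its metrics through such a lookup table can be sketched in plain Scala. This is a toy model, not Spark's SQLMetric API: MetricSketch, OperatorSketch, CountingScanSketch and the metric names used as keys are all hypothetical, and the metric kind is modelled as a plain string.

```scala
// Toy accumulator of long values (hypothetical stand-in for SQLMetric).
final class MetricSketch(val metricType: String) {
  private var total: Long = 0L
  def add(v: Long): Unit = total += v
  def value: Long = total
}

// Base operator: by default it supports no metrics (an empty lookup table).
class OperatorSketch {
  def metrics: Map[String, MetricSketch] = Map.empty
}

// An operator that registers two metrics under hypothetical names.
class CountingScanSketch extends OperatorSketch {
  // lazy val, so every lookup returns the same metric instances
  override lazy val metrics: Map[String, MetricSketch] = Map(
    "numOutputRows" -> new MetricSketch("sum"),
    "scanTime" -> new MetricSketch("timing"))

  // Passes rows through and updates the row-count metric by name.
  def scan(rows: Seq[String]): Seq[String] = {
    metrics("numOutputRows").add(rows.size.toLong)
    rows
  }
}
```

Overriding metrics with a lazy val rather than a def matters in this sketch: a def would build fresh metric objects on every call, and the accumulated values would be lost.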
