executeQuery[T](query: => T): T
SparkPlan — Physical Execution Plan
SparkPlan is the base QueryPlan for physical operators in a structured query.
SparkPlan contract assumes that concrete implementations define doExecute which is executed when the concrete execute is called.
When executed, a SparkPlan produces RDDs of InternalRow (i.e. RDD[InternalRow]s).
|
Caution
|
FIXME SparkPlan is Serializable. Why?
|
|
Note
|
The naming convention for physical operators in Spark’s source code is to have their names end with the Exec prefix, e.g. DebugExec or LocalTableScanExec.
|
|
Tip
|
Read InternalRow about the internal binary row format. |
SparkPlan has the following attributes:
-
metadata -
outputPartitioning -
outputOrdering
SparkPlan has the following final methods that prepare environment and pass calls on to corresponding methods that constitute SparkPlan Contract:
-
executecallsdoExecute -
preparecallsdoPrepare -
executeBroadcastcallsdoExecuteBroadcast
Executing Query in Scope (after Preparations) — executeQuery Final Method
executeQuery executes query in a scope (i.e. so that all RDDs created will have the same scope).
Internally, executeQuery calls prepare and waitForSubqueries before executing query.
|
Note
|
executeQuery is executed as part of execute, executeBroadcast and when CodegenSupport produces a Java source code.
|
Computing Query Result As Broadcast Variable — executeBroadcast Final Method
executeBroadcast[T](): broadcast.Broadcast[T]
executeBroadcast returns the results of the query as a broadcast variable.
Internally, executeBroadcast executes doExecuteBroadcast inside executeQuery.
|
Note
|
executeBroadcast is executed in BroadcastHashJoinExec, BroadcastNestedLoopJoinExec and ReusedExchangeExec.
|
SparkPlan Contract
The contract of SparkPlan requires that concrete implementations define the following method:
-
doExecute(): RDD[InternalRow]
They may also define their own custom overrides:
-
doPrepare -
doExecuteBroadcast
|
Caution
|
FIXME Why are there two executes? |
executeCollect
|
Caution
|
FIXME |
SQLMetric
SQLMetric is an accumulator that accumulate and produce long values.
There are three known SQLMetrics:
-
sum -
size -
timing
metrics Lookup Table
metrics: Map[String, SQLMetric] = Map.empty
metrics is a private[sql] lookup table of supported SQLMetrics by their names.