executeQuery[T](query: => T): T
SparkPlan — Physical Execution Plan
SparkPlan is the base QueryPlan for physical operators in a structured query.
The SparkPlan contract assumes that concrete implementations define doExecute, which runs when execute is called.
When executed, a SparkPlan produces an RDD of InternalRow (i.e. RDD[InternalRow]).
Caution
|
FIXME SparkPlan is Serializable. Why?
|
Note
|
The naming convention for physical operators in Spark’s source code is to have their names end with the Exec suffix, e.g. DebugExec or LocalTableScanExec.
|
Tip
|
Read InternalRow to learn about the internal binary row format. |
SparkPlan has the following attributes:
- metadata
- outputPartitioning
- outputOrdering
SparkPlan has the following final methods that prepare the environment and then delegate to the corresponding methods of the SparkPlan Contract:
- execute calls doExecute
- prepare calls doPrepare
- executeBroadcast calls doExecuteBroadcast
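The split between final entry points and overridable do-methods can be sketched in plain Scala. This is a simplified model, not Spark's code: MiniPlan and UpperCaseExec are hypothetical names, rows are plain strings instead of InternalRow, and calling prepare directly from execute is an assumption made to keep the sketch short.

```scala
// Simplified, hypothetical model of the final-method / do-method split.
// MiniPlan and UpperCaseExec are illustrative names, not Spark classes.
abstract class MiniPlan {
  // Final entry points: fixed by the base class, cannot be overridden.
  final def prepare(): Unit = doPrepare()

  final def execute(): Seq[String] = {
    prepare()   // stands in for "prepare the environment"
    doExecute() // then delegate to the operator-specific logic
  }

  // Hooks that concrete operators override (doPrepare) or must define (doExecute).
  protected def doPrepare(): Unit = ()
  protected def doExecute(): Seq[String]
}

class UpperCaseExec(input: Seq[String]) extends MiniPlan {
  var prepared = false

  override protected def doPrepare(): Unit = {
    prepared = true
  }

  override protected def doExecute(): Seq[String] = input.map(_.toUpperCase)
}

val op = new UpperCaseExec(Seq("a", "b"))
val upperRows = op.execute()
```

Callers only ever see execute and prepare. The operator author only writes the do-methods, so the base class keeps control of the call order.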
Executing Query in Scope (after Preparations) — executeQuery
Final Method
executeQuery[T](query: => T): T
executeQuery executes query in a scope (so that all RDDs created during execution share the same scope).
Internally, executeQuery calls prepare and waitForSubqueries before executing query.
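The call order can be illustrated with a small plain-Scala sketch. It is not Spark's implementation: TracingPlan and its log are hypothetical, prepare and waitForSubqueries are stubs that only record that they ran, and the RDD scope handling is left out. It mainly shows why query is a by-name parameter: its body is not evaluated until the preparation steps have finished.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical stand-in; only the call order mirrors the description above.
class TracingPlan {
  val log = ListBuffer.empty[String]

  // Stubs that merely record the order in which they run.
  private def prepare(): Unit = { log += "prepare"; () }
  private def waitForSubqueries(): Unit = { log += "waitForSubqueries"; () }

  // `query` is by-name, so its body runs only where it is referenced below.
  final def executeQuery[T](query: => T): T = {
    prepare()
    waitForSubqueries()
    query
  }
}

val plan = new TracingPlan
val result = plan.executeQuery {
  plan.log += "query"
  42
}
```

After the call, plan.log holds the three entries in the order prepare, waitForSubqueries, query, and result is the value the query block produced.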
Note
|
executeQuery is executed as part of execute and executeBroadcast, and when CodegenSupport produces Java source code.
|
Computing Query Result As Broadcast Variable — executeBroadcast
Final Method
executeBroadcast[T](): broadcast.Broadcast[T]
executeBroadcast returns the results of the query as a broadcast variable.
Internally, executeBroadcast executes doExecuteBroadcast inside executeQuery.
Note
|
executeBroadcast is executed in BroadcastHashJoinExec, BroadcastNestedLoopJoinExec and ReusedExchangeExec.
|
SparkPlan Contract
The contract of SparkPlan requires that concrete implementations define the following method:
- doExecute(): RDD[InternalRow]
They may also override the following methods:
- doPrepare
- doExecuteBroadcast
Caution
|
FIXME Why are there two executes? |
executeCollect
Caution
|
FIXME |
SQLMetric
SQLMetric is an accumulator that accumulates and produces long values.
There are three known types of SQLMetrics:
- sum
- size
- timing
metrics Lookup Table
metrics: Map[String, SQLMetric] = Map.empty
metrics is a private[sql] lookup table of supported SQLMetrics by their names.
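A rough plain-Scala sketch of the idea follows: a long-valued accumulator tagged with one of the three metric types, and a name-keyed lookup table. LongMetric is a hypothetical simplification and not Spark's SQLMetric class. The keys numOutputRows, dataSize and sortTime are chosen here only for illustration.

```scala
// Hypothetical simplification: LongMetric stands in for SQLMetric.
final class LongMetric(val metricType: String) {
  private var total = 0L

  // Accumulate a long value into the running total.
  def add(v: Long): Unit = {
    total += v
  }

  def value: Long = total
}

// A name-keyed lookup table in the spirit of `metrics` (empty by default in the text above).
val metrics: Map[String, LongMetric] = Map(
  "numOutputRows" -> new LongMetric("sum"),
  "dataSize" -> new LongMetric("size"),
  "sortTime" -> new LongMetric("timing")
)

// An operator would look a metric up by name and add to it while it runs.
metrics("numOutputRows").add(3L)
metrics("numOutputRows").add(4L)
val outputRows = metrics("numOutputRows").value
```

Looking metrics up by name keeps the operator code independent of how the values are later collected and displayed.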