LogicalPlan — Logical Query Plan

LogicalPlan is a QueryPlan that corresponds to a logical operator of a structured query in Spark SQL.
Note: LogicalPlan uses the Catalyst Framework to represent itself as a tree of children LogicalPlans.
A LogicalPlan is what makes up a Dataset, and you can ask for it through the Dataset's QueryExecution.

val plan = dataset.queryExecution.logical
A logical plan can be analyzed, which is to say that the plan (including its children) has gone through analysis and verification.
scala> plan.analyzed
res1: Boolean = true
A logical plan can also be resolved to a specific schema.
scala> plan.resolved
res2: Boolean = true
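Since a resolved plan knows its schema, you can ask for it as well; the exact StructType depends on the Dataset and is elided here.

scala> plan.schema
res3: org.apache.spark.sql.types.StructType = ...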
A logical plan knows the size of objects that are the results of query operators, like join, through a Statistics object.
scala> val stats = plan.statistics
stats: org.apache.spark.sql.catalyst.plans.logical.Statistics = Statistics(8,false)
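Statistics gives access to, among others, sizeInBytes, i.e. the estimated physical size as a BigInt; for the plan above that is the 8 shown in the output.

scala> stats.sizeInBytes
res4: BigInt = 8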
A logical plan knows the maximum number of records it can compute.
scala> val maxRows = plan.maxRows
maxRows: Option[Long] = None
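maxRows is None here, but operators that bound the result do define it, e.g. a LIMIT query (the output below assumes the behaviour of Spark 2.x's GlobalLimit operator with a literal limit).

scala> spark.range(10).limit(3).queryExecution.logical.maxRows
res5: Option[Long] = Some(3)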
A logical plan can be streaming if it contains one or more structured streaming sources (see isStreaming below).
resolveQuoted method

Caution: FIXME
Command — Logical Commands

Command is the base for leaf logical plans that represent non-query commands to be executed by the system. It defines output to return an empty collection of Attributes.
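A simplified sketch of the trait, following the description above (Spark's actual definition may carry more members):

import org.apache.spark.sql.catalyst.expressions.Attribute
import org.apache.spark.sql.catalyst.plans.logical.LeafNode

// Commands are leaf logical plans that produce no query output
trait Command extends LeafNode {
  override def output: Seq[Attribute] = Seq.empty
}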
Known commands are:
- CreateTable
- Any RunnableCommand
RunnableCommand — Logical Commands with Side Effects

RunnableCommand is the base trait for logical commands that are executed for their side effects. RunnableCommand defines a single abstract method, run, that computes a collection of Rows.
run(sparkSession: SparkSession): Seq[Row]
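As an illustration, here is a hypothetical RunnableCommand that returns the Spark version as a single row (the command and its column name are made up for this sketch and are not part of Spark):

import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeReference}
import org.apache.spark.sql.execution.command.RunnableCommand
import org.apache.spark.sql.types.StringType

// Hypothetical command: returns the Spark version as its only row
case class ShowSparkVersionCommand() extends RunnableCommand {

  // A single string column called "version" (name assumed for the sketch)
  override val output: Seq[Attribute] =
    Seq(AttributeReference("version", StringType, nullable = false)())

  // run computes the command's result (and would perform any side effects)
  override def run(sparkSession: SparkSession): Seq[Row] =
    Seq(Row(sparkSession.version))
}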
Note: RunnableCommand is translated to ExecutedCommandExec in the BasicOperators strategy.
Is Logical Plan Structured Streaming? — isStreaming method

isStreaming: Boolean

isStreaming is part of the public API of LogicalPlan and is enabled (i.e. true) when a logical plan is a streaming source. By default, it walks over subtrees and calls itself, i.e. isStreaming, on every child node to find a streaming source.
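Conceptually, that default behaviour boils down to a one-liner like this (a sketch matching the description above, not necessarily Spark's exact source):

def isStreaming: Boolean = children.exists(_.isStreaming)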
val spark: SparkSession = ...
// Regular dataset
scala> val ints = spark.createDataset(0 to 9)
ints: org.apache.spark.sql.Dataset[Int] = [value: int]
scala> ints.queryExecution.logical.isStreaming
res1: Boolean = false
// Streaming dataset
scala> val logs = spark.readStream.format("text").load("logs/*.out")
logs: org.apache.spark.sql.DataFrame = [value: string]
scala> logs.queryExecution.logical.isStreaming
res2: Boolean = true