LogicalPlan — Logical Query Plan

LogicalPlan is a QueryPlan that corresponds to a logical operator of a structured query in Spark SQL.
Note: LogicalPlan uses the Catalyst Framework to represent itself as a tree of child LogicalPlans.
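The operator tree can be inspected directly, e.g. using numberedTreeString. A minimal sketch (assuming a SparkSession available as spark):

// Print the tree of logical operators behind a simple query
val q = spark.range(3).where("id > 1")
println(q.queryExecution.logical.numberedTreeString)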
A LogicalPlan is what makes a Dataset that you can ask for through its QueryExecution:

val plan = dataset.queryExecution.logical
A logical plan can be analyzed, which is to say that the plan (including its children) has gone through analysis and verification.
scala> plan.analyzed
res1: Boolean = true
A logical plan can also be resolved to a specific schema.
scala> plan.resolved
res2: Boolean = true
A logical plan knows the size of the objects that are the results of query operators, like join, through the Statistics object.
scala> val stats = plan.statistics
stats: org.apache.spark.sql.catalyst.plans.logical.Statistics = Statistics(8,false)
A logical plan knows the maximum number of records it can compute.
scala> val maxRows = plan.maxRows
maxRows: Option[Long] = None
A logical plan can be streaming if it contains one or more structured streaming sources (see isStreaming below).
resolveQuoted method

Caution: FIXME
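Until then, a minimal sketch of using resolveQuoted to look an attribute up by its (quoted) name; the Resolver argument (here caseInsensitiveResolution) controls how names are compared. Assumes a SparkSession available as spark:

import org.apache.spark.sql.catalyst.analysis.caseInsensitiveResolution

// Resolve the attribute "id" against an analyzed plan by name
val analyzedPlan = spark.range(3).queryExecution.analyzed
val idAttr = analyzedPlan.resolveQuoted("id", caseInsensitiveResolution)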
Command — Logical Commands
Command is the base for leaf logical plans that represent non-query commands to be executed by the system. It defines its output to be an empty collection of Attributes.
Known commands are:

- CreateTable
- Any RunnableCommand
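As a quick check (a minimal sketch with a placeholder table name, assuming a SparkSession available as spark), the logical plan behind a DDL statement is a command node with no output attributes:

// DROP TABLE is planned as a command; its output is expected to be empty
val cmd = spark.sql("DROP TABLE IF EXISTS demo_table")
println(cmd.queryExecution.logical.output)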
RunnableCommand — Logical Commands with Side Effects
RunnableCommand is the base trait for logical commands that are executed for their side effects.
RunnableCommand defines a single abstract method, run, that computes a collection of Rows:
run(sparkSession: SparkSession): Seq[Row]
Note: RunnableCommand is translated to ExecutedCommandExec in the BasicOperators strategy.
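A minimal sketch of the contract, using a hypothetical HelloCommand that is executed purely for its printing side effect and returns no rows:

import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.execution.command.RunnableCommand

// Hypothetical command: the side effect is printing a message
case class HelloCommand(message: String) extends RunnableCommand {
  override def run(sparkSession: SparkSession): Seq[Row] = {
    println(s"Hello, $message")
    Seq.empty[Row]
  }
}

// Executing run directly; at planning time it would be wrapped
// in ExecutedCommandExec by the BasicOperators strategy
// HelloCommand("world").run(spark)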
Is Logical Plan Streaming? — isStreaming method
isStreaming: Boolean
isStreaming is a part of the public API of LogicalPlan and is enabled (i.e. true) when a logical plan is a streaming source.
By default, it walks over subtrees and calls itself, i.e. isStreaming, on every child node to find a streaming source.
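That default behavior boils down to checking the children, along the lines of the implementation in LogicalPlan:

// Default: a logical plan is streaming if any of its children is
def isStreaming: Boolean = children.exists(_.isStreaming)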
val spark: SparkSession = ...
import spark.implicits._  // provides the Encoder[Int] needed below

// Regular dataset
scala> val ints = spark.createDataset(0 to 9)
ints: org.apache.spark.sql.Dataset[Int] = [value: int]
scala> ints.queryExecution.logical.isStreaming
res1: Boolean = false
// Streaming dataset
scala> val logs = spark.readStream.format("text").load("logs/*.out")
logs: org.apache.spark.sql.DataFrame = [value: string]
scala> logs.queryExecution.logical.isStreaming
res2: Boolean = true