LogicalPlan — Logical Query Plan

LogicalPlan is a QueryPlan that represents a logical operator of a structured query in Spark SQL.

Note
LogicalPlan uses the Catalyst Framework to represent itself as a tree of child LogicalPlans.

A LogicalPlan is what describes a Dataset; you can access it through the Dataset's QueryExecution.

val plan = dataset.queryExecution.logical
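
Since a LogicalPlan is a Catalyst TreeNode, you can inspect the operator tree directly (the exact operators printed depend on the query):

// Print the operator tree, one numbered line per operator
println(plan.numberedTreeString)

// Or walk the immediate child operators yourself
plan.children.foreach(child => println(child.nodeName))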

A logical plan can be analyzed, which means that the plan (including its children) has gone through analysis and verification.

scala> plan.analyzed
res1: Boolean = true
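
You can also request the analyzed variant of the plan directly from QueryExecution:

// The plan after the analyzer has run
val analyzedPlan = dataset.queryExecution.analyzed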

A logical plan can also be resolved to a specific schema.

scala> plan.resolved
res2: Boolean = true
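
A resolved plan can report its output schema (inherited from QueryPlan):

// Pretty-print the schema the plan resolved to
println(plan.schema.treeString)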

A logical plan knows the size of the objects that its query operators (e.g. a join) produce, through its Statistics object.

scala> val stats = plan.statistics
stats: org.apache.spark.sql.catalyst.plans.logical.Statistics = Statistics(8,false)
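
In this Spark version the two fields of Statistics are sizeInBytes and isBroadcastable, so you can read the estimated size directly:

// Estimated size in bytes of the data the plan produces
println(stats.sizeInBytes)  // 8 for the plan above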

A logical plan may know the maximum number of records it can compute (None when unknown).

scala> val maxRows = plan.maxRows
maxRows: Option[Long] = None
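
Operators that bound their result size, such as limits, do define maxRows. A minimal illustration (the exact behavior depends on the Spark version):

// GlobalLimit (created by Dataset.limit) defines maxRows
val limited = dataset.limit(3)
println(limited.queryExecution.logical.maxRows)  // Some(3)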

A logical plan can be streaming if it contains one or more structured streaming sources.

resolveQuoted method

Caution
FIXME

Specialized LogicalPlans

  • LeafNode

  • UnaryNode

  • BinaryNode
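
These base classes fix the number of child operators at zero, one and two, respectively. As a minimal sketch, a custom leaf operator only has to declare its output attributes (MyRelation is a hypothetical name):

import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeReference}
import org.apache.spark.sql.catalyst.plans.logical.LeafNode
import org.apache.spark.sql.types.StringType

// A hypothetical leaf operator with a single string column and no children
case class MyRelation() extends LeafNode {
  override def output: Seq[Attribute] =
    Seq(AttributeReference("value", StringType)())
}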

Command — Logical Commands

Command is the base for leaf logical plans that represent non-query commands to be executed by the system. It defines output to return an empty collection of Attributes.

Known commands are:

  1. CreateTable

  2. Any RunnableCommand
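
Conceptually (paraphrasing the Spark source, not quoting it verbatim), Command is little more than a marker on top of LeafNode with no output attributes:

import org.apache.spark.sql.catalyst.expressions.Attribute
import org.apache.spark.sql.catalyst.plans.logical.LeafNode

// Roughly what Command boils down to in Catalyst
trait Command extends LeafNode {
  override def output: Seq[Attribute] = Seq.empty
}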

RunnableCommand — Logical Commands with Side Effects

RunnableCommand is the base trait for logical commands that are executed eagerly for their side effects.

RunnableCommand defines one abstract method run that computes a collection of Rows.

run(sparkSession: SparkSession): Seq[Row]

Note
RunnableCommand is translated to ExecutedCommandExec in the BasicOperators strategy.
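
As a minimal sketch, a custom RunnableCommand implements run with the side effect and returns whatever rows (often none) should be presented to the caller (HelloCommand is a hypothetical example):

import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.execution.command.RunnableCommand

// A hypothetical command executed solely for its side effect
case class HelloCommand(message: String) extends RunnableCommand {
  override def run(sparkSession: SparkSession): Seq[Row] = {
    println(s"Side effect: $message") // the side effect itself
    Seq.empty[Row]                    // nothing to show the caller
  }
}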

Is Logical Plan Structured Streaming — isStreaming method

isStreaming: Boolean

isStreaming is part of the public API of LogicalPlan and is enabled (i.e. true) when a logical plan contains a streaming source.

By default, it walks over subtrees and calls itself, i.e. isStreaming, on every child node to find a streaming source.
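
This boils down to the following (paraphrasing the implementation):

// Effectively what the default isStreaming does
def isStreaming: Boolean = children.exists(_.isStreaming)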

val spark: SparkSession = ...

// Regular dataset
scala> val ints = spark.createDataset(0 to 9)
ints: org.apache.spark.sql.Dataset[Int] = [value: int]

scala> ints.queryExecution.logical.isStreaming
res1: Boolean = false

// Streaming dataset
scala> val logs = spark.readStream.format("text").load("logs/*.out")
logs: org.apache.spark.sql.DataFrame = [value: string]

scala> logs.queryExecution.logical.isStreaming
res2: Boolean = true
