Expression TreeNode

An Expression in Spark SQL 2.0’s Catalyst represents a query expression that can be SQL or non-SQL. It is described by the Expression contract.

Table 1. Catalyst’s Top-Level Expressions
Name Scala Kind Behaviour Example

Unevaluable

trait

Cannot be evaluated, i.e. eval and doGenCode throw an UnsupportedOperationException

CurrentDatabase

Nondeterministic

trait

NonSQLExpression

trait

ExpectsInputTypes

trait

CodegenFallback

trait

NamedExpression

trait

UnaryExpression

abstract class

BinaryExpression

abstract class

TernaryExpression

abstract class

LeafExpression

abstract class

Expression Contract

The contract of an expression in Spark SQL is described by the Expression abstract class.

abstract class Expression extends TreeNode[Expression]

A Expression is a TreeNode that obeys the following contract:

  1. It may or may not be foldable.

  2. It may or may not be deterministic.

  3. It may or may not be nullable.

  4. It uses references.

  5. It can be evaluated to a JVM object given a InternalRow.

  6. It can genCode to produce a ExprCode.

  7. It can doGenCode to produce a ExprCode.

  8. It may or may not be resolved.

  9. It is of dataType data type.

  10. It may or may not have childrenResolved.

  11. It has a canonicalized representation.

  12. It may or may not be semanticEquals given another Expression.

  13. It has a semanticHash.

  14. It can be checkInputDataTypes.

  15. It has a prettyName.

  16. It has a sql representation.

Unevaluable Expression

Unevaluable expressions cannot be evaluated, i.e. eval and doGenCode methods throw an UnsupportedOperationException.

Note
Unevaluable is a Scala trait.

Nondeterministic Expression

Nondeterministic expressions are non-deterministic and non-foldable, i.e. deterministic and foldable properties are disabled (i.e. false). They require explicit initialization before evaluation.

Nondeterministic expressions have two additional methods:

  1. initInternal for internal initialization (called before eval)

  2. evalInternal to evaluate a InternalRow into a JVM object.

Note
Nondeterministic is a Scala trait.

Nondeterministic expressions have the additional initialized flag that is enabled (i.e. true) after the other additional initInternal method has been called.

Examples of Nondeterministic expressions are InputFileName, MonotonicallyIncreasingID, SparkPartitionID functions and the abstract RDG (that is the base for Rand and Randn functions).

Note
Nondeterministic expressions are the target of PullOutNondeterministic logical plan rule.

NonSQLExpression Expression

NonSQLExpression expressions are expressions that have no SQL representation.

Internally, NonSQLExpression makes for all Attributes in a tree to be PrettyAttributes.

Note
NonSQLExpression is a Scala trait with the sql method being final (i.e. non-overridable).

ExpectsInputTypes Expression

ExpectsInputTypes expressions are…​TK

Note
ExpectsInputTypes is a Scala trait.

CodegenFallback Expression

CodegenFallback expressions are…​TK

Note
CodegenFallback is a Scala trait.

UnaryExpression Expression

UnaryExpression expressions are…​TK

Note
UnaryExpression is an abstract class.

BinaryExpression Expression

BinaryExpression expressions are…​TK

Note
BinaryExpression is an abstract class.

TernaryExpression Expression

TernaryExpression expressions are…​TK

Note
TernaryExpression is an abstract class.

LeafExpression Expression

LeafExpression expressions are Catalyst expressions with no children, i.e. children method returns always an empty collection.

Note
LeafExpression is an abstract class.

NamedExpression Expression

NamedExpression expressions are Catalyst expressions that can later be referenced in the dataflow graph.

Note
NamedExpression is a Scala trait.

results matching ""

    No results matching ""