import org.apache.spark.sql.expressions.Window
val orderId = Window.orderBy('id)
val dataset = spark.range(5).withColumn("group", 'id % 3)
scala> dataset.select('*, rank over orderId as "rank").show
+---+-----+----+
| id|group|rank|
+---+-----+----+
| 0| 0| 1|
| 1| 1| 2|
| 2| 2| 3|
| 3| 0| 4|
| 4| 1| 5|
+---+-----+----+
WindowExec Physical Operator
WindowExec
is a unary physical plan with a collection of NamedExpressions
(for windows), a collection of Expressions
(for partitions), a collection of SortOrder
(for sorting) and a child
physical plan.
The output
of WindowExec
are the output
of child
physical plan and windows.
When executed (i.e. show
) with no partitions, WindowExec
prints out the following WARN message to the logs:
WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
Tip
|
Enable Add the following line to
Refer to Logging. |
Caution
|
FIXME Describe ClusteredDistribution
|
When the number of rows exceeds 4096
, WindowExec
creates UnsafeExternalSorter
.
Caution
|
FIXME What’s UnsafeExternalSorter ?
|