scala> df.rdd.getNumPartitions
res6: Int = 8
scala> df.coalesce(1).rdd.getNumPartitions
res7: Int = 1
scala> df.coalesce(1).explain(extended = true)
== Parsed Logical Plan ==
Repartition 1, false
+- LocalRelation [value#1]
== Analyzed Logical Plan ==
value: int
Repartition 1, false
+- LocalRelation [value#1]
== Optimized Logical Plan ==
Repartition 1, false
+- LocalRelation [value#1]
== Physical Plan ==
Coalesce 1
+- LocalTableScan [value#1]
CoalesceExec Physical Operator
CoalesceExec is a unary physical operator that takes the target number of partitions (numPartitions) and a child physical plan (child). It represents the Repartition logical operator at execution time when shuffle is disabled (the false in Repartition 1, false). When executed, it executes the input child and calls coalesce on the resulting RDD (with shuffle disabled).
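The behaviour described above can be checked programmatically. The following is a minimal sketch, assuming Spark SQL is on the classpath; the object name CoalesceExecSketch, the local master setting and the sample data are illustrative choices, not part of the original example. It inspects queryExecution.sparkPlan (the physical plan before preparation rules) and collects the CoalesceExec nodes.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.CoalesceExec

// Hypothetical standalone driver; in spark-shell the session already exists as `spark`.
object CoalesceExecSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[4]")               // assumption: local mode with 4 threads
      .appName("coalesce-exec-sketch")
      .getOrCreate()
    import spark.implicits._

    // Illustrative data; the original session does not show how df was built.
    val df = (1 to 100).toDF("value")
    val coalesced = df.coalesce(1)

    // Collect every CoalesceExec node from the physical plan.
    val coalesceNodes = coalesced.queryExecution.sparkPlan.collect {
      case c: CoalesceExec => c
    }

    // Dataset.coalesce(1) is planned as a single CoalesceExec with numPartitions = 1.
    assert(coalesceNodes.map(_.numPartitions) == Seq(1))

    // Executing the plan yields an RDD with exactly one partition.
    assert(coalesced.rdd.getNumPartitions == 1)

    println(s"partitions before: ${df.rdd.getNumPartitions}, after: ${coalesced.rdd.getNumPartitions}")
    spark.stop()
  }
}
```

The number of partitions before coalescing depends on the default parallelism of the session, so only the value after coalesce(1) is asserted.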
Note that physical operators are displayed without the Exec suffix, so CoalesceExec appears as Coalesce in the Physical Plan section of the explain output in the example above.
The output collection of Attribute matches the child's, since CoalesceExec is about changing the number of partitions, not the internal representation.
outputPartitioning returns SinglePartition when the input numPartitions is 1, and UnknownPartitioning in all other cases.
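The outputPartitioning rule can be sketched in plain Scala, without Spark on the classpath. The types below are simplified stand-ins that only borrow the names SinglePartition and UnknownPartitioning; they are not Spark's own classes, and the object name CoalescePartitioningSketch is hypothetical.

```scala
// Simplified model of the rule described above; NOT Spark's actual classes.
object CoalescePartitioningSketch {
  sealed trait Partitioning
  case object SinglePartition extends Partitioning
  final case class UnknownPartitioning(numPartitions: Int) extends Partitioning

  // One target partition gives SinglePartition; anything else is "unknown"
  // because coalescing without a shuffle makes no promise about data layout.
  def outputPartitioning(numPartitions: Int): Partitioning =
    if (numPartitions == 1) SinglePartition
    else UnknownPartitioning(numPartitions)

  def main(args: Array[String]): Unit = {
    println(outputPartitioning(1)) // prints SinglePartition
    println(outputPartitioning(4)) // prints UnknownPartitioning(4)
  }
}
```

Reporting SinglePartition for a single target partition matters to the planner: a downstream operator that needs all rows in one partition can then skip an extra exchange.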