Debugging Query Execution

debug package object contains tools for debugging query execution that you can use to do the full analysis of your structured queries (i.e. Datasets).

Note
Let’s make it clear — they are methods, my dear.

The methods are in org.apache.spark.sql.execution.debug package and work on your Datasets and SparkSession.

Caution
FIXME Expand on the SparkSession part.
debug()
debugCodegen()

Import the package and do the full analysis using debug method.

import org.apache.spark.sql.execution.debug._

scala> spark.range(10).where('id === 4).debug
Results returned: 1
== WholeStageCodegen ==
Tuples output: 1
 id LongType: {java.lang.Long}
== Filter (id#25L = 4) ==
Tuples output: 0
 id LongType: {}
== Range (0, 10, splits=8) ==
Tuples output: 0
 id LongType: {}

You can also perform debugCodegen.

import org.apache.spark.sql.execution.debug._

scala> spark.range(10).where('id === 4).debugCodegen
Found 1 WholeStageCodegen subtrees.
== Subtree 1 / 1 ==
*Filter (id#29L = 4)
+- *Range (0, 10, splits=8)

Generated code:
/* 001 */ public Object generate(Object[] references) {
/* 002 */   return new GeneratedIterator(references);
/* 003 */ }
/* 004 */
/* 005 */ final class GeneratedIterator extends org.apache.spark.sql.execution.BufferedRowIterator {
/* 006 */   private Object[] references;
...
scala> spark.range(1, 1000).select('id+1+2+3, 'id+4+5+6).queryExecution.debug.codegen
Found 1 WholeStageCodegen subtrees.
== Subtree 1 / 1 ==
*Project [(id#33L + 6) AS (((id + 1) + 2) + 3)#36L, (id#33L + 15) AS (((id + 4) + 5) + 6)#37L]
+- *Range (1, 1000, splits=8)

Generated code:
/* 001 */ public Object generate(Object[] references) {
/* 002 */   return new GeneratedIterator(references);
/* 003 */ }
/* 004 */
/* 005 */ final class GeneratedIterator extends org.apache.spark.sql.execution.BufferedRowIterator {
/* 006 */   private Object[] references;
...

results matching ""

    No results matching ""