Debugging Query Execution

The debug package object contains tools for debugging query execution that you can use to analyze your structured queries (i.e. Datasets) in full.

Note: To be clear, the tools are methods.

The methods are in the org.apache.spark.sql.execution.debug package and work on your Datasets and SparkSession.

Caution: FIXME Expand on the SparkSession part.

debug()
debugCodegen()

Import the package and run the full analysis using the debug method.

import org.apache.spark.sql.execution.debug._

scala> spark.range(10).where('id === 4).debug
Results returned: 1
== WholeStageCodegen ==
Tuples output: 1
 id LongType: {java.lang.Long}
== Filter (id#25L = 4) ==
Tuples output: 0
 id LongType: {}
== Range (0, 10, splits=8) ==
Tuples output: 0
 id LongType: {}
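
Outside spark-shell, neither the spark value nor the 'id symbol syntax is set up for you. The following is a minimal sketch of calling the same method from a standalone application, assuming spark-sql is on the classpath. The object name DebugDemo, the local[*] master and the use of col instead of a symbol are choices made for this illustration, not something the package requires.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
// Brings the debug methods into scope for Datasets
import org.apache.spark.sql.execution.debug._

// Hypothetical application object, named for this example only
object DebugDemo {
  def main(args: Array[String]): Unit = {
    // Assumption: a local master is good enough for a debugging session
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("DebugDemo")
      .getOrCreate()
    try {
      // Same query as in the shell session, with col("id") in place of 'id
      val query = spark.range(10).where(col("id") === 4)
      // Executes the query and prints per-operator tuple counts to standard output
      query.debug()
    } finally {
      spark.stop()
    }
  }
}
```

The exact operators and counts in the output depend on your Spark version and the number of splits, so expect them to differ from the shell session above.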

You can also use debugCodegen to display the Java code generated for the WholeStageCodegen subtrees of a query.

import org.apache.spark.sql.execution.debug._

scala> spark.range(10).where('id === 4).debugCodegen
Found 1 WholeStageCodegen subtrees.
== Subtree 1 / 1 ==
*Filter (id#29L = 4)
+- *Range (0, 10, splits=8)

Generated code:
/* 001 */ public Object generate(Object[] references) {
/* 002 */   return new GeneratedIterator(references);
/* 003 */ }
/* 004 */
/* 005 */ final class GeneratedIterator extends org.apache.spark.sql.execution.BufferedRowIterator {
/* 006 */   private Object[] references;
...

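The same output is available through queryExecution.debug.codegen. In the plan below, note that the constant additions have been folded (e.g. id + 1 + 2 + 3 becomes id + 6).

scala> spark.range(1, 1000).select('id+1+2+3, 'id+4+5+6).queryExecution.debug.codegen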
Found 1 WholeStageCodegen subtrees.
== Subtree 1 / 1 ==
*Project [(id#33L + 6) AS (((id + 1) + 2) + 3)#36L, (id#33L + 15) AS (((id + 4) + 5) + 6)#37L]
+- *Range (1, 1000, splits=8)

Generated code:
/* 001 */ public Object generate(Object[] references) {
/* 002 */   return new GeneratedIterator(references);
/* 003 */ }
/* 004 */
/* 005 */ final class GeneratedIterator extends org.apache.spark.sql.execution.BufferedRowIterator {
/* 006 */   private Object[] references;
...
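
If you want the generated code as a value rather than printed output (for example to search it or write it to a file), the debug package object also has a codegenString method that takes a physical plan. The sketch below assumes that method is available in your Spark version and that spark-sql is on the classpath. CodegenCapture and generatedCode are hypothetical names used only for this example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.execution.debug.codegenString

// Hypothetical helper object, named for this example only
object CodegenCapture {
  // Returns the codegen report for the example query as a String
  def generatedCode(spark: SparkSession): String = {
    val query = spark.range(1, 1000)
      .select(col("id") + 1 + 2 + 3, col("id") + 4 + 5 + 6)
    // codegenString works on the physical plan of the query
    codegenString(query.queryExecution.executedPlan)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master is good enough for a debugging session
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("CodegenCapture")
      .getOrCreate()
    try {
      val code = generatedCode(spark)
      // Print only the first few lines; the full report can be long
      code.split("\n").take(5).foreach(println)
    } finally {
      spark.stop()
    }
  }
}
```

Holding the report in a String makes it easy to check, say, whether a given expression made it into the generated class, without scrolling through console output.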
