Debugging Query Execution
The debug package object contains tools for debugging query execution that you can use to do the full analysis of your structured queries (i.e. Datasets).
Note: Let’s make it clear: they are methods, my dear.
The methods are in the org.apache.spark.sql.execution.debug package and work on your Datasets and SparkSession.
FIXME Expand on the SparkSession part.
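Import the package and do the full analysis using the debug method. It runs the query and, for every physical operator, reports the number of tuples output and the runtime types seen in each column: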
import org.apache.spark.sql.execution.debug._
scala> spark.range(10).where('id === 4).debug
Results returned: 1
== WholeStageCodegen ==
Tuples output: 1
id LongType: {java.lang.Long}
== Filter (id#25L = 4) ==
Tuples output: 0
id LongType: {}
== Range (0, 10, splits=8) ==
Tuples output: 0
id LongType: {}
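Outside spark-shell there is no predefined spark value, so a standalone program has to create the session itself before calling debug. The following is a minimal sketch under assumptions not in the text above: Spark SQL is on the classpath, the session runs on a local master, and the object name DebugDemo and the application name are hypothetical. It uses the $"id" column syntax from spark.implicits._ instead of 'id, because symbol literals are deprecated in newer Scala versions.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.debug._

// Hypothetical standalone entry point; the name DebugDemo is illustrative only.
object DebugDemo {
  def main(args: Array[String]): Unit = {
    // Assumption: a local master is good enough for a debugging session.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("debug-demo")
      .getOrCreate()
    import spark.implicits._

    // Same query as in the spark-shell session above.
    val q = spark.range(10).where($"id" === 4)

    // debug() comes from the imported package object; it executes the
    // query and prints per-operator statistics to standard output.
    q.debug()

    spark.stop()
  }
}
```

The exact operator names, expression IDs (such as id#25L) and number of splits in the printed report depend on the Spark version and the machine, so they will likely differ from the transcript above.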
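You can also use debugCodegen to display the Java code generated for each WholeStageCodegen subtree of the query (the generated code below is truncated):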
import org.apache.spark.sql.execution.debug._
scala> spark.range(10).where('id === 4).debugCodegen
Found 1 WholeStageCodegen subtrees.
== Subtree 1 / 1 ==
*Filter (id#29L = 4)
+- *Range (0, 10, splits=8)
Generated code:
/* 001 */ public Object generate(Object[] references) {
/* 002 */ return new GeneratedIterator(references);
/* 003 */ }
/* 004 */
/* 005 */ final class GeneratedIterator extends org.apache.spark.sql.execution.BufferedRowIterator {
/* 006 */ private Object[] references;
The generated code is also available through queryExecution.debug.codegen. Note that the optimizer has already folded the constants before code generation, so id + 1 + 2 + 3 shows up as id + 6 in the physical plan (the generated code is truncated again):
scala> spark.range(1, 1000).select('id+1+2+3, 'id+4+5+6).queryExecution.debug.codegen
Found 1 WholeStageCodegen subtrees.
== Subtree 1 / 1 ==
*Project [(id#33L + 6) AS (((id + 1) + 2) + 3)#36L, (id#33L + 15) AS (((id + 4) + 5) + 6)#37L]
+- *Range (1, 1000, splits=8)
Generated code:
/* 001 */ public Object generate(Object[] references) {
/* 002 */ return new GeneratedIterator(references);
/* 003 */ }
/* 004 */
/* 005 */ final class GeneratedIterator extends org.apache.spark.sql.execution.BufferedRowIterator {
/* 006 */ private Object[] references;
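Both debugCodegen and queryExecution.debug.codegen print to standard output. When the generated code has to be saved or searched instead, the same package object also exposes codegenString, which takes a physical plan and returns the text. The following is a minimal sketch, again assuming a local session; the object name CodegenToString and the helper generatedCode are hypothetical names introduced for illustration.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.sql.execution.debug.codegenString

// Hypothetical wrapper object; not part of Spark.
object CodegenToString {

  // Hypothetical helper: returns the codegen report for a Dataset as a
  // String instead of printing it.
  def generatedCode(ds: Dataset[_]): String =
    codegenString(ds.queryExecution.executedPlan)

  def main(args: Array[String]): Unit = {
    // Assumption: local master, illustrative application name.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("codegen-to-string")
      .getOrCreate()
    import spark.implicits._

    // Same query as in the last spark-shell example above.
    val q = spark.range(1, 1000).select($"id" + 1 + 2 + 3, $"id" + 4 + 5 + 6)

    val code = generatedCode(q)

    // Show only the first few lines (subtree header and plan);
    // the full text contains the whole generated Java class.
    code.split("\n").take(5).foreach(println)

    spark.stop()
  }
}
```

Keeping the report as a String makes it possible to write it to a file or to check it for a particular expression, which is awkward with the printing variants.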