debug()
debugCodegen()
Debugging Query Execution
debug
package object contains tools for debugging query execution that you can use to do the full analysis of your structured queries (i.e. Datasets
).
Note
|
Let’s make it clear — they are methods, my dear. |
The methods are in org.apache.spark.sql.execution.debug
package and work on your Datasets
and SparkSession.
Caution
|
FIXME Expand on the SparkSession part.
|
Import the package and do the full analysis using debug
method.
import org.apache.spark.sql.execution.debug._
scala> spark.range(10).where('id === 4).debug
Results returned: 1
== WholeStageCodegen ==
Tuples output: 1
id LongType: {java.lang.Long}
== Filter (id#25L = 4) ==
Tuples output: 0
id LongType: {}
== Range (0, 10, splits=8) ==
Tuples output: 0
id LongType: {}
You can also perform debugCodegen
.
import org.apache.spark.sql.execution.debug._
scala> spark.range(10).where('id === 4).debugCodegen
Found 1 WholeStageCodegen subtrees.
== Subtree 1 / 1 ==
*Filter (id#29L = 4)
+- *Range (0, 10, splits=8)
Generated code:
/* 001 */ public Object generate(Object[] references) {
/* 002 */ return new GeneratedIterator(references);
/* 003 */ }
/* 004 */
/* 005 */ final class GeneratedIterator extends org.apache.spark.sql.execution.BufferedRowIterator {
/* 006 */ private Object[] references;
...
scala> spark.range(1, 1000).select('id+1+2+3, 'id+4+5+6).queryExecution.debug.codegen
Found 1 WholeStageCodegen subtrees.
== Subtree 1 / 1 ==
*Project [(id#33L + 6) AS (((id + 1) + 2) + 3)#36L, (id#33L + 15) AS (((id + 4) + 5) + 6)#37L]
+- *Range (1, 1000, splits=8)
Generated code:
/* 001 */ public Object generate(Object[] references) {
/* 002 */ return new GeneratedIterator(references);
/* 003 */ }
/* 004 */
/* 005 */ final class GeneratedIterator extends org.apache.spark.sql.execution.BufferedRowIterator {
/* 006 */ private Object[] references;
...