RowEncoder — DataFrame Encoder

RowEncoder is part of the Encoder framework and acts as the encoder for DataFrames, i.e. Dataset[Row] — Datasets of Rows.

Note: DataFrame is a mere type alias for Dataset[Row] that expects an Encoder[Row] available in scope, which is exactly what RowEncoder provides.

RowEncoder is an object in Scala with apply and other factory methods.

RowEncoder can create ExpressionEncoder[Row] from a schema (using the apply method).

```scala
import org.apache.spark.sql.types._
val schema = StructType(
  StructField("id", LongType, nullable = false) ::
  StructField("name", StringType, nullable = false) :: Nil)

import org.apache.spark.sql.catalyst.encoders.RowEncoder
scala> val encoder = RowEncoder(schema)
encoder: org.apache.spark.sql.catalyst.encoders.ExpressionEncoder[org.apache.spark.sql.Row] = class[id[0]: bigint, name[0]: string]

// RowEncoder is never flat
scala> encoder.flat
res0: Boolean = false
```

The RowEncoder object belongs to the org.apache.spark.sql.catalyst.encoders package.
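Once built, the encoder can convert between external Rows and Spark's internal binary rows. The following is a minimal sketch, assuming the Spark 2.x ExpressionEncoder API (toRow, resolveAndBind and fromRow); these methods are illustrative of the Spark version this page describes and may differ in later releases.

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.catalyst.encoders.RowEncoder
import org.apache.spark.sql.types._

val schema = StructType(
  StructField("id", LongType, nullable = false) ::
  StructField("name", StringType, nullable = false) :: Nil)

val encoder = RowEncoder(schema)

// serialize an external Row into Spark's internal representation (InternalRow)
val internal = encoder.toRow(Row(1L, "hello"))

// deserialize back into an external Row (the encoder has to be resolved and bound first)
val external = encoder.resolveAndBind().fromRow(internal)
```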
Creating ExpressionEncoder of Rows — apply method

```scala
apply(schema: StructType): ExpressionEncoder[Row]
```

apply builds an ExpressionEncoder of Row, i.e. ExpressionEncoder[Row], from the input StructType (as schema).

Internally, apply creates a BoundReference for the Row type and returns an ExpressionEncoder[Row] made up of the input schema, a CreateNamedStruct serializer (built with the serializerFor internal method), a deserializer for the schema, and the Row type.
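A quick way to see what apply has assembled is to inspect the resulting encoder. This is a minimal sketch; the field names (schema, serializer, deserializer, clsTag) are assumed from the Spark 2.x ExpressionEncoder API.

```scala
import org.apache.spark.sql.catalyst.encoders.RowEncoder
import org.apache.spark.sql.types._

val schema = StructType(StructField("id", LongType, nullable = false) :: Nil)
val encoder = RowEncoder(schema)

encoder.schema        // the input StructType
encoder.serializer    // serializer expressions built with serializerFor
encoder.deserializer  // expression that recreates an external Row
encoder.clsTag        // ClassTag for org.apache.spark.sql.Row
```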
serializerFor Internal Method

```scala
serializerFor(inputObject: Expression, inputType: DataType): Expression
```

serializerFor creates an Expression that is assumed to be CreateNamedStruct.

serializerFor takes the input inputType and:
- Returns the input inputObject as is for native types, i.e. NullType, BooleanType, ByteType, ShortType, IntegerType, LongType, FloatType, DoubleType, BinaryType, CalendarIntervalType. Caution: FIXME What does being a native type mean?
- For UserDefinedTypes, takes the UDT class from the SQLUserDefinedType annotation or the UDTRegistration object and returns an expression with Invoke that calls the serialize method on a NewInstance of the UDT class.
- For TimestampType, returns an expression with a StaticInvoke that calls fromJavaTimestamp on the DateTimeUtils class (see the sketch below).
- …FIXME
Caution: FIXME Describe me.
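To make the TimestampType case above concrete, here is a minimal sketch of the expression serializerFor produces, built with Catalyst's internal expression classes (Spark 2.x packages and signatures assumed).

```scala
import org.apache.spark.sql.catalyst.expressions.{BoundReference, Expression}
import org.apache.spark.sql.catalyst.expressions.objects.StaticInvoke
import org.apache.spark.sql.catalyst.util.DateTimeUtils
import org.apache.spark.sql.types.{ObjectType, TimestampType}

// the external java.sql.Timestamp value, as serializerFor would see it
val inputObject: Expression =
  BoundReference(0, ObjectType(classOf[java.sql.Timestamp]), nullable = true)

// an expression that calls DateTimeUtils.fromJavaTimestamp(inputObject) at runtime
val serialized: Expression = StaticInvoke(
  DateTimeUtils.getClass,
  TimestampType,
  "fromJavaTimestamp",
  inputObject :: Nil)
```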
StaticInvoke NonSQLExpression

```scala
case class StaticInvoke(
  staticObject: Class[_],
  dataType: DataType,
  functionName: String,
  arguments: Seq[Expression] = Nil,
  propagateNull: Boolean = true) extends NonSQLExpression
```

StaticInvoke is an Expression with no SQL representation that represents a static method call in Scala or Java. It supports generating Java code to evaluate itself.
StaticInvoke invokes the functionName static method on staticObject with the arguments input parameters to produce a value of dataType type. If propagateNull is enabled and any of the arguments is null, the result is null (without calling the functionName function).
StaticInvoke is used in RowEncoder and Java’s encoders.
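The staticinvoke(... UTF8String, StringType, fromString, ...) entry in the serializer output below is exactly such a call. As a minimal sketch, the expression RowEncoder builds for a StringType field would look roughly like this (Spark 2.x packages and signatures assumed):

```scala
import org.apache.spark.sql.catalyst.expressions.{BoundReference, Expression}
import org.apache.spark.sql.catalyst.expressions.objects.StaticInvoke
import org.apache.spark.sql.types.{ObjectType, StringType}
import org.apache.spark.unsafe.types.UTF8String

// the external java.lang.String value
val externalString: Expression =
  BoundReference(0, ObjectType(classOf[String]), nullable = true)

// calls UTF8String.fromString(externalString); with propagateNull on (the default),
// a null input short-circuits to a null result without calling fromString
val toInternal: Expression = StaticInvoke(
  classOf[UTF8String],
  StringType,
  "fromString",
  externalString :: Nil)
```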
```scala
import org.apache.spark.sql.types._
val schema = StructType(
  StructField("id", LongType, nullable = false) ::
  StructField("name", StringType, nullable = false) :: Nil)

import org.apache.spark.sql.catalyst.encoders.RowEncoder
val encoder = RowEncoder(schema)

scala> encoder.serializer
res0: Seq[org.apache.spark.sql.catalyst.expressions.Expression] = List(validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true], top level row object), 0, id), LongType) AS id#69L, staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true], top level row object), 1, name), StringType), true) AS name#70)
```