import org.apache.spark.sql.Row
Row
Row is a data abstraction of an ordered collection of fields that can be accessed by an ordinal / an index (generic access by ordinal, or native primitive access through typed getters such as getInt), by name, or using Scala’s pattern matching. A Row instance may or may not have a schema.
The traits of Row:
- length or size: Row knows the number of elements (columns).
- schema: Row knows the schema.
Row belongs to the org.apache.spark.sql package.
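The two traits listed above can be checked directly on a Row built by hand. This is a minimal sketch that assumes only that Spark SQL is on the classpath (for example in spark-shell). The value names are made up for illustration.

```scala
import org.apache.spark.sql.Row

// A Row built by hand through the companion object
val row = Row(1, "hello")

val n = row.length        // number of fields
val sameN = row.size      // size is an alias for length
val schema = row.schema   // null here: no schema was attached

println(s"length=$n size=$sameN schema=$schema")
```

A Row created this way carries no schema, so schema returns null rather than an empty StructType.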
Field Access by Index — apply and get methods
Fields of a Row instance can be accessed by index (starting from 0) using apply or get.
scala> val row = Row(1, "hello")
row: org.apache.spark.sql.Row = [1,hello]
scala> row(1)
res0: Any = hello
scala> row.get(1)
res1: Any = hello
Note: Generic access by ordinal (using apply or get) returns a value of type Any.
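When the static type matters, Row also offers typed getters by ordinal (getInt, getString and friends) plus isNullAt for null checks. The sketch below contrasts them with narrowing an Any value by hand. The sample values are arbitrary.

```scala
import org.apache.spark.sql.Row

val row = Row(1, "hello", null)

val id: Int = row.getInt(0)            // typed getter for a primitive field
val text: String = row.getString(1)    // typed getter for a String field
val missing: Boolean = row.isNullAt(2) // true when the field holds null

// Generic access returns Any, so the caller has to narrow the type
val narrowed: String = row(1) match {
  case s: String => s
  case other     => sys.error(s"unexpected value: $other")
}
```

Checking isNullAt first is the safer order for primitive getters, since a null field cannot be returned as an Int.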
Get Field As Type — getAs method
You can query for fields with their proper types using getAs with an index.
val row = Row(1, "hello")
scala> row.getAs[Int](0)
res1: Int = 1
scala> row.getAs[String](1)
res2: String = hello
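getAs also accepts a field name, but only when the Row carries a schema. The sketch below attaches a schema by using GenericRowWithSchema, an internal Catalyst class, purely so that no SparkSession is needed. That choice and the field names "id" and "text" are assumptions made for this example, not something the text above prescribes.

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Hypothetical field names, chosen for this example only
val schema = StructType(Seq(
  StructField("id", IntegerType, nullable = false),
  StructField("text", StringType, nullable = true)))

// GenericRowWithSchema is an internal Catalyst class; it is used here only
// to get a schema-carrying Row without starting a SparkSession
val row: Row = new GenericRowWithSchema(Array[Any](1, "hello"), schema)

val id = row.getAs[Int]("id")          // lookup by field name
val text = row.getAs[String]("text")
val position = row.fieldIndex("text")  // ordinal that the name resolves to
```

On a Row without a schema, such as Row(1, "hello"), the name-based variants should fail with an UnsupportedOperationException, because there is nothing to resolve the name against.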
Schema
A Row instance can have a schema defined.
Note: Unless you are instantiating Row yourself (using Row Object), a Row always has a schema.
Note: RowEncoder takes care of assigning a schema to a Row when you call toDF on a Dataset or instantiate a DataFrame through DataFrameReader.
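The difference between the two kinds of Row can be observed by comparing one taken from a DataFrame with one built by hand. This sketch assumes a local Spark runtime is available. The session settings and the column names "id" and "text" are illustrative only.

```scala
import org.apache.spark.sql.{Row, SparkSession}

// Assumes a local Spark runtime is on the classpath
val session = SparkSession.builder()
  .master("local[1]")
  .appName("row-schema-sketch")
  .getOrCreate()
import session.implicits._

// Column names "id" and "text" are made up for this example
val fromDataFrame: Row = Seq((1, "hello")).toDF("id", "text").head()
val handMade: Row = Row(1, "hello")

val names = fromDataFrame.schema.fieldNames.toSeq // field names from the attached schema
val noSchema = handMade.schema == null            // the hand-made Row carries no schema
```

In spark-shell, getOrCreate returns the session that already exists, so the builder settings above only matter in a standalone program.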
Row Object
The Row companion object offers factory methods to create Row instances from a collection of elements (apply), a sequence of elements (fromSeq) and tuples (fromTuple).
scala> Row(1, "hello")
res0: org.apache.spark.sql.Row = [1,hello]
scala> Row.fromSeq(Seq(1, "hello"))
res1: org.apache.spark.sql.Row = [1,hello]
scala> Row.fromTuple((0, "hello"))
res2: org.apache.spark.sql.Row = [0,hello]
The Row object can also merge several Row instances into one.
scala> Row.merge(Row(1), Row("hello"))
res3: org.apache.spark.sql.Row = [1,hello]
It can also return an empty Row instance.
scala> Row.empty == Row()
res4: Boolean = true
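The factory methods pair naturally with toSeq, which exposes the fields of a Row as a plain sequence. As a sketch, rows can be combined by concatenating their values. This may be worth knowing because, to my knowledge, Row.merge is marked as deprecated in recent Spark releases. The sample values are arbitrary.

```scala
import org.apache.spark.sql.Row

val left = Row(1, "hello")
val right = Row(true)

// Same effect as merging: concatenate the field values and rebuild a Row
val combined = Row.fromSeq(left.toSeq ++ right.toSeq)

// Rows compare by value, so a round trip through toSeq yields an equal Row
val roundTrip = Row.fromSeq(left.toSeq) == left
```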
Pattern Matching on Row
Row can be used in pattern matching (since Row Object comes with unapplySeq).
scala> Row.unapplySeq(Row(1, "hello"))
res5: Some[Seq[Any]] = Some(WrappedArray(1, hello))
Row(1, "hello") match {
  case Row(key: Int, value: String) => key -> value
}
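Pattern matching is most useful over a collection of rows, where typed patterns double as a filter. The following sketch uses made-up sample rows to show two details that are easy to miss: a null field does not match a type pattern, and rows of an unexpected shape fall through to the next case.

```scala
import org.apache.spark.sql.Row

val rows = Seq(Row(1, "hello"), Row(2, null), Row("oops", "world"))

// collect keeps only the rows whose fields match the typed pattern;
// a null field does not match a type pattern such as `text: String`
val pairs = rows.collect {
  case Row(id: Int, text: String) => id -> text
}

// Handle the null case explicitly when it matters
val described = rows.map {
  case Row(id: Int, null)         => s"$id has no text"
  case Row(id: Int, text: String) => s"$id -> $text"
  case other                      => s"unexpected shape: $other"
}
```

Here pairs keeps only the first row, while described accounts for all three.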