BaseRelation

BaseRelation works in a SQLContext with data of a given schema (as StructType). BaseRelation knows its estimated size (as sizeInBytes), whether the objects in its rows need conversion to the internal representation, and which Filters the data source may not be able to handle.

Table 1. BaseRelation Methods

Name | Behaviour
sqlContext | Returns the current SQLContext.
schema | Returns the schema of the relation's data (as StructType).
sizeInBytes | Computes an estimated size of this relation in bytes.
needConversion | Indicates whether the objects in Row need to be converted to the internal representation.
unhandledFilters | Computes the list of Filters that this data source may not be able to handle.

Note
The terms "data source" and "relation" are used as synonyms.

BaseRelation is an abstract class in the org.apache.spark.sql.sources package.
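To make the contract above more concrete, here is a minimal sketch of a custom relation. NumbersRelation and its 4-bytes-per-row size estimate are hypothetical and only serve as an illustration. The sketch assumes that mixing in TableScan (also from org.apache.spark.sql.sources) is enough to make the relation readable, and it leaves needConversion and unhandledFilters at their defaults.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.sources.{BaseRelation, TableScan}
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

// Hypothetical relation (not part of Spark) that exposes the numbers
// 0 until n as a single-column table.
class NumbersRelation(override val sqlContext: SQLContext, n: Int)
    extends BaseRelation with TableScan {

  // The schema of the rows that buildScan produces.
  override def schema: StructType =
    StructType(Seq(StructField("id", IntegerType, nullable = false)))

  // A rough estimate only (assumption: 4 bytes per Int value).
  override def sizeInBytes: Long = n.toLong * 4

  // needConversion and unhandledFilters are not overridden,
  // so the defaults from BaseRelation apply.

  // Full scan: one Row per number.
  override def buildScan(): RDD[Row] =
    sqlContext.sparkContext.parallelize(0 until n).map(i => Row(i))
}
```

With a SparkSession named spark in scope, such a relation could be turned into a DataFrame using spark.baseRelationToDataFrame(new NumbersRelation(spark.sqlContext, 5)).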

HadoopFsRelation

case class HadoopFsRelation(
  location: FileIndex,
  partitionSchema: StructType,
  dataSchema: StructType,
  bucketSpec: Option[BucketSpec],
  fileFormat: FileFormat,
  options: Map[String, String])(val sparkSession: SparkSession)
extends BaseRelation with FileRelation

HadoopFsRelation is a BaseRelation that is bound to a SparkSession (through which it gets the current SQLContext).

HadoopFsRelation's schema (as StructType) is the input dataSchema extended with the fields of the input partitionSchema.
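The following sketch shows one way such a combined schema could be computed. SchemaSketch and combine are hypothetical names, and skipping partition columns whose names already appear among the data columns (compared case-insensitively) is an assumption of this sketch; the exact handling of overlapping columns may differ between Spark versions.

```scala
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Hypothetical helper, not Spark's own implementation.
object SchemaSketch {
  // Appends the partition columns to the data columns and skips any
  // partition column whose name is already among the data columns.
  def combine(dataSchema: StructType, partitionSchema: StructType): StructType = {
    val dataNames = dataSchema.fieldNames.map(_.toLowerCase).toSet
    val extraPartitionFields =
      partitionSchema.fields.filterNot(f => dataNames.contains(f.name.toLowerCase))
    StructType(dataSchema.fields ++ extraPartitionFields)
  }
}
```

For a dataSchema with the columns name and year and a partitionSchema with the columns year and country, combine keeps the data columns in order and appends only country.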

sizeInBytes (from BaseRelation) and inputFiles (from FileRelation) use the input FileIndex to compute the size and the input files, respectively.
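To look at these values for a file-based DataFrame, one option is to pull the HadoopFsRelation out of the analyzed logical plan. This is a sketch only: RelationInspector is a hypothetical helper, it relies on internal classes in org.apache.spark.sql.execution.datasources that may change between Spark versions, and it assumes the data source is planned as a LogicalRelation.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.execution.datasources.{HadoopFsRelation, LogicalRelation}

// Hypothetical helper that relies on Spark internals.
object RelationInspector {
  // Collects every HadoopFsRelation referenced by the DataFrame's analyzed plan.
  def fileRelations(df: DataFrame): Seq[HadoopFsRelation] =
    df.queryExecution.analyzed
      .collect { case l: LogicalRelation => l.relation }
      .collect { case h: HadoopFsRelation => h }
}
```

For a DataFrame df read from files, RelationInspector.fileRelations(df).headOption gives access to the relation's sizeInBytes and inputFiles.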
