DataStreamReader

DataStreamReader is an interface for reading streaming data in DataFrame from data sources with specified format, schema and options.

DataStreamReader offers support for the built-in formats: json, csv, parquet, text. parquet format is the default data source as configured using spark.sql.sources.default setting.

DataStreamReader is available using SparkSession.readStream method.

val spark: SparkSession = ...

val schema = spark.read
  .format("csv")
  .option("header", true)
  .option("inferSchema", true)
  .load("csv-logs/*.csv")
  .schema

val df = spark.readStream
  .format("csv")
  .schema(schema)
  .load("csv-logs/*.csv")

format

format(source: String): DataStreamReader

format specifies the source format of the streaming data source.

schema

schema(schema: StructType): DataStreamReader

schema specifies the schema of the streaming data source.

option Methods

option(key: String, value: String): DataStreamReader
option(key: String, value: Boolean): DataStreamReader
option(key: String, value: Long): DataStreamReader
option(key: String, value: Double): DataStreamReader

option family of methods specifies additional options to a streaming data source.

There is support for values of String, Boolean, Long, and Double types for user convenience, and internally are converted to String type.

Note
You can also set options in bulk using options method. You have to do the type conversion yourself, though.

options

options(options: scala.collection.Map[String, String]): DataStreamReader

options method allows specifying one or many options of the streaming input data source.

Note
You can also set options one by one using option method.

load Methods

load(): DataFrame
load(path: String): DataFrame (1)
  1. Specifies path option before passing calls to load()

load loads streaming input data as DataFrame.

Internally, load creates a DataFrame from the current SparkSession and a StreamingRelation (of a DataSource based on schema, format, and options).

Built-in Formats

json(path: String): DataFrame
csv(path: String): DataFrame
parquet(path: String): DataFrame
text(path: String): DataFrame

DataStreamReader can load streaming data from data sources of the following formats:

  • json

  • csv

  • parquet

  • text

The methods simply pass calls to format followed by load(path).

results matching ""

    No results matching ""