val spark: SparkSession = ...
val schema = spark.read
.format("csv")
.option("header", true)
.option("inferSchema", true)
.load("csv-logs/*.csv")
.schema
val df = spark.readStream
.format("csv")
.schema(schema)
.load("csv-logs/*.csv")
DataStreamReader
DataStreamReader
is an interface for reading streaming data in DataFrame from data sources with specified format, schema and options.
DataStreamReader
offers support for the built-in formats: json, csv, parquet, text. parquet
format is the default data source as configured using spark.sql.sources.default setting.
DataStreamReader
is available using SparkSession.readStream method.
format
format(source: String): DataStreamReader
format
specifies the source
format of the streaming data source.
schema
schema(schema: StructType): DataStreamReader
schema
specifies the schema
of the streaming data source.
option Methods
option(key: String, value: String): DataStreamReader
option(key: String, value: Boolean): DataStreamReader
option(key: String, value: Long): DataStreamReader
option(key: String, value: Double): DataStreamReader
option
family of methods specifies additional options to a streaming data source.
There is support for values of String
, Boolean
, Long
, and Double
types for user convenience, and internally are converted to String
type.
Note
|
You can also set options in bulk using options method. You have to do the type conversion yourself, though. |
options
options(options: scala.collection.Map[String, String]): DataStreamReader
options
method allows specifying one or many options of the streaming input data source.
Note
|
You can also set options one by one using option method. |
load Methods
load(): DataFrame
load(path: String): DataFrame (1)
-
Specifies
path
option before passing calls toload()
load
loads streaming input data as DataFrame.
Internally, load
creates a DataFrame
from the current SparkSession and a StreamingRelation (of a DataSource based on schema, format, and options).
Built-in Formats
json(path: String): DataFrame
csv(path: String): DataFrame
parquet(path: String): DataFrame
text(path: String): DataFrame
DataStreamReader
can load streaming data from data sources of the following formats:
-
json
-
csv
-
parquet
-
text
The methods simply pass calls to format
followed by load(path)
.