DataStreamWriter

DataStreamWriter is part of the Structured Streaming API (as of Spark 2.0). It writes the output of a streaming query to a sink and, in doing so, starts the query's execution.

val people: Dataset[Person] = ...

import org.apache.spark.sql.streaming.ProcessingTime
import scala.concurrent.duration._
import org.apache.spark.sql.streaming.OutputMode.Complete
people.writeStream
  .queryName("textStream")
  .outputMode(Complete)
  .trigger(ProcessingTime(10.seconds))
  .format("console")
  .start

queryName sets the name of a query.
outputMode specifies the output mode.
trigger sets the Trigger for a streaming query.
start starts continuous writing to a sink.

Specifying Output Mode — `outputMode` method

outputMode(outputMode: OutputMode): DataStreamWriter[T]

outputMode specifies the output mode of a streaming Dataset, i.e. what gets written to a streaming sink when new data is available.

Currently, the following output modes are supported:

OutputMode.Append — only the new rows in the streaming dataset will be written to a sink.
OutputMode.Complete — entire streaming dataset (with all the rows) will be written to a sink every time there are updates. It is supported only for streaming queries with aggregations.
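The following is a minimal sketch of Complete mode, assuming Spark 2.x on the classpath and a local master. `OutputModeSketch` is a hypothetical helper name, and `MemoryStream` is an internal, test-oriented source that is used here only to feed a few rows into the query. The result is written to the `memory` sink so that it can be read back as a table.

```scala
import org.apache.spark.sql.{SQLContext, SparkSession}
import org.apache.spark.sql.execution.streaming.MemoryStream
import org.apache.spark.sql.streaming.OutputMode

// Hypothetical helper object, not part of Spark.
object OutputModeSketch {
  def run(): Map[String, Long] = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("output-mode-sketch")
      .getOrCreate()
    import spark.implicits._
    // MemoryStream expects an implicit SQLContext in scope.
    implicit val sqlContext: SQLContext = spark.sqlContext

    // In-memory streaming source (internal API, an assumption of this sketch).
    val input = MemoryStream[String]

    // A streaming aggregation, since Complete mode is only supported with aggregations.
    val counts = input.toDS().groupBy("value").count()

    val query = counts.writeStream
      .outputMode(OutputMode.Complete())
      .format("memory")      // in-memory sink, readable as a table
      .queryName("counts")   // the table is named after the query
      .start()

    try {
      input.addData("a", "b", "a")
      query.processAllAvailable() // block until the added rows are processed
      spark.table("counts")
        .collect()
        .map(row => row.getString(0) -> row.getLong(1))
        .toMap
    } finally {
      query.stop()
      spark.stop()
    }
  }
}
```

In Complete mode the table holds the whole aggregated result after every batch, so a later `addData` call would rewrite all counts rather than add only the changed rows.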

Setting Query Name — `queryName` method

queryName(queryName: String): DataStreamWriter[T]

queryName sets the name of a streaming query.

Internally, it is just an additional option with the key queryName.

Setting How Often to Execute Streaming Query — `trigger` method

trigger(trigger: Trigger): DataStreamWriter[T]

trigger sets the Trigger of a streaming query, i.e. how often a batch is executed.

Note	`Trigger` specifies how often results should be produced by a StreamingQuery. See Trigger.

The default trigger is ProcessingTime(0L), which runs a streaming query as often as possible.

Tip	Consult Trigger to learn about `Trigger` and `ProcessingTime` types.
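As a small sketch of how a trigger value can be built, assuming Spark 2.x, where `ProcessingTime` lives in `org.apache.spark.sql.streaming` (later versions expose it through `Trigger.ProcessingTime` instead). `TriggerSketch` is a hypothetical name used only for illustration.

```scala
import scala.concurrent.duration._
import org.apache.spark.sql.streaming.ProcessingTime

// Hypothetical holder object, not part of Spark.
object TriggerSketch {
  // Interval given as a Scala duration.
  val fromDuration: ProcessingTime = ProcessingTime(10.seconds)

  // The same interval given as an interval string.
  val fromString: ProcessingTime = ProcessingTime("10 seconds")

  // A zero interval: run the next batch as soon as the previous one finishes.
  val asOftenAsPossible: ProcessingTime = ProcessingTime(0L)
}
```

Any of these values can be passed to `trigger`, e.g. `.trigger(TriggerSketch.fromDuration)`.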

Starting Continuous Writing to Sink — `start` methods

start(): StreamingQuery
start(path: String): StreamingQuery  (1)

(1) Sets the path option to path and calls start()

The start methods start a streaming query and return a StreamingQuery object that continually writes data to the sink.

Note	Whether you have to specify the `path` option depends on the DataSource in use.

Recognized options:

queryName is the name of the active streaming query.
checkpointLocation is the directory for checkpointing.

Note	Define options using the option or options methods.
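The options and `start(path)` can be combined as in the sketch below. It assumes Spark 2.x on the classpath, a local master, and the parquet file sink, which needs both an output path and a checkpoint directory. `StartWithPathSketch`, the temporary directory prefixes, and the query name are hypothetical, and `MemoryStream` is an internal source used only to supply a few rows.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{SQLContext, SparkSession}
import org.apache.spark.sql.execution.streaming.MemoryStream

// Hypothetical helper object, not part of Spark.
object StartWithPathSketch {
  def run(): (String, Long) = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("start-with-path-sketch")
      .getOrCreate()
    import spark.implicits._
    implicit val sqlContext: SQLContext = spark.sqlContext

    // Temporary directories for the sink output and the checkpoint data.
    val outputDir = Files.createTempDirectory("sketch-out").toString
    val checkpointDir = Files.createTempDirectory("sketch-checkpoint").toString

    val input = MemoryStream[String]

    val query = input.toDS().writeStream
      .format("parquet")
      .option("checkpointLocation", checkpointDir) // directory for checkpointing
      .queryName("toParquet")                      // name of the active query
      .start(outputDir)                            // sets the path option, then starts

    try {
      input.addData("a", "b", "c")
      query.processAllAvailable() // block until the added rows are written
      (query.name, spark.read.parquet(outputDir).count())
    } finally {
      query.stop()
      spark.stop()
    }
  }
}
```

The checkpoint directory is what lets a restarted query with the same sink continue where it stopped, which is why file-based sinks ask for it explicitly.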
