KafkaSourceProvider

KafkaSourceProvider is a StreamSourceProvider for KafkaSource.

KafkaSourceProvider is a DataSourceRegister, too.

The short name of the data source is kafka.

KafkaSourceProvider requires the following options (that you can set using option method):

  1. Exactly one option for subscribe, subscribepattern or assign

  2. kafka.bootstrap.servers (corresponds to bootstrap.servers)

Note
KafkaSourceProvider is part of spark-sql-kafka-0-10 Library Dependency.
// Run spark-shell with spark-sql-kafka-0-10_2.11 module

spark.readStream
  .format("kafka")
  .option("subscribe", "topic")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .load
  .writeStream
  .format("console")
  .start

spark-sql-kafka-0-10 Library Dependency

The new structured streaming API for Kafka is part of the spark-sql-kafka-0-10 artifact. Add the following dependency to sbt project to use the streaming integration:

libraryDependencies += "org.apache.spark" %% "spark-sql-kafka-0-10" % "2.0.1"
Tip

spark-sql-kafka-0-10 module is not included in the CLASSPATH of spark-shell so you have to start it with --packages command-line option.

./bin/spark-shell --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.1.0-SNAPSHOT
Note
Replace 2.0.1 or 2.1.0-SNAPSHOT with available version as found at The Central Repository’s search.

results matching ""

    No results matching ""