import org.apache.spark.sql.{DataFrameWriter, Dataset}
val nums: Dataset[Long] = ...
val writer: DataFrameWriter[Long] = nums.write
DataFrameWriter is an interface to persist a Dataset to an external storage system in a batch fashion.
| Tip | Read DataStreamWriter for streamed writing. | 
You use the write method on a Dataset to access a DataFrameWriter.
DataFrameWriter has direct support for many file formats and JDBC databases, and an interface to plug in new formats. It assumes parquet as the default data source, which you can change using the spark.sql.sources.default setting or the format method.
// see above for writer definition
// Save dataset in Parquet format
writer.save(path = "nums")
// Save dataset in JSON format
writer.format("json").save(path = "nums-json")In the end, you trigger the actual saving of the content of a Dataset using save method.
writer.save

| Note | Interestingly, a DataFrameWriter is really a type constructor in Scala. It keeps a reference to a source DataFrame during its lifecycle (starting right from the moment it was created). | 
Internal State
DataFrameWriter uses the following mutable attributes to build a properly-defined write specification for insertInto, saveAsTable, and save:
| Attribute | Setters |
|---|---|
| source | format |
| mode | mode |
| extraOptions | option, options, save |
| partitioningColumns | partitionBy |
| bucketColumnNames | bucketBy |
| numBuckets | bucketBy |
| sortColumnNames | sortBy |
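As a minimal sketch (the dataset, path, and column name are made up for illustration), here is how the setter methods populate these attributes before a write is triggered:

// Hypothetical data; the "city" column is made up for illustration
val people = Seq(("Alice", "Warsaw"), ("Bob", "Krakow")).toDF("name", "city")
people.write
  .format("parquet")               // sets source
  .mode("overwrite")               // sets mode
  .option("compression", "snappy") // adds to extraOptions
  .partitionBy("city")             // sets partitioningColumns
  .save("/tmp/people")             // adds "path" to extraOptions and triggers the write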
 saveAsTable Method
saveAsTable(tableName: String): Unit

saveAsTable saves the content of a DataFrame as the tableName table.
First, tableName is parsed to an internal table identifier. saveAsTable then checks whether the table exists or not and uses save mode to decide what to do.
saveAsTable uses the SessionCatalog for the current session.
| Does table exist? | Save Mode | Behaviour | 
|---|---|---|
| yes | Ignore | Do nothing | 
| yes | ErrorIfExists | Throws an AnalysisException | 
| anything | anything | It creates a CatalogTable and saves the content of the DataFrame as the table | 
// Create a one-column DataFrame of integers
val ints = (0 to 9).toDF
// Save it as the table `ints`, storing the data under /tmp/ints
val options = Map("path" -> "/tmp/ints")
ints.write.options(options).saveAsTable("ints")
sql("show tables").show

Persisting DataFrame — save Method
save(): Unit

save saves the content of a Dataset to…FIXME
Internally, save first checks that the DataFrame is not bucketed.
| Caution | FIXME What does bucketing mean? | 
save then creates a DataSource (for the source) and calls write on it.
| Note | save uses the source, partitioningColumns, extraOptions, and mode internal attributes directly. They are specified through the API. | 
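As a quick illustration of the bucketing check above (the path is made up, and the exact error message may vary between Spark versions), a bucketed write specification makes save fail:

val nums = spark.range(10)
// bucketBy is only supported by saveAsTable, so save is rejected
nums.write.bucketBy(4, "id").save("/tmp/nums-bucketed")
// org.apache.spark.sql.AnalysisException: 'save' does not support bucketing right now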
 jdbc Method
jdbc(url: String, table: String, connectionProperties: Properties): Unit

The jdbc method saves the content of the DataFrame to an external database table via JDBC.
You can use mode to control the save mode, i.e. what happens when the external table already exists when save is executed.
It is assumed that the jdbc save pipeline is neither partitioned nor bucketed.
All options are overridden by the input connectionProperties.
The required options are:

- driver, which is the class name of the JDBC driver (that is passed to Spark's own DriverRegistry.register and later used to connect(url, properties)).
When the table exists and the Overwrite save mode is in use, DROP TABLE table is executed.
It creates the input table (using CREATE TABLE table (schema) where schema is the schema of the DataFrame).
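A minimal sketch (the URL, table name, and credentials are made up) of writing a DataFrame over JDBC with the required driver option:

import java.util.Properties

val connectionProperties = new Properties()
connectionProperties.put("user", "spark")                   // made-up credentials
connectionProperties.put("password", "secret")
connectionProperties.put("driver", "org.postgresql.Driver") // JDBC driver class name

val nums = spark.range(10).toDF("n")
nums.write
  .mode("overwrite")  // drop and recreate the table if it already exists
  .jdbc("jdbc:postgresql://localhost:5432/demo", "nums", connectionProperties)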
 bucketBy Method
| Caution | FIXME | 
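Until this section is written, a minimal sketch (the table name is made up) of how bucketBy is typically combined with sortBy and saveAsTable:

val nums = spark.range(100)
nums.write
  .bucketBy(4, "id")            // hash records into 4 buckets by the id column
  .sortBy("id")                 // sort records within each bucket
  .saveAsTable("bucketed_nums") // bucketing works with saveAsTable, not save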
 Specifying Save Mode — mode Method
mode(saveMode: String): DataFrameWriter[T]
mode(saveMode: SaveMode): DataFrameWriter[T]

You can control the behaviour of write using the mode method, i.e. what happens when an external file or table already exists when save is executed.
- SaveMode.Ignore,
- SaveMode.ErrorIfExists,
- SaveMode.Overwrite, or
- SaveMode.Append
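A short sketch (the path is made up) of both mode variants:

import org.apache.spark.sql.SaveMode

val nums = spark.range(5)
// String variant
nums.write.mode("overwrite").save("/tmp/nums")
// SaveMode variant; the write is silently skipped because the path already exists
nums.write.mode(SaveMode.Ignore).save("/tmp/nums")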
 Writer Configuration — option and options Methods
| Caution | FIXME | 
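Until this section is written, a sketch (the paths and option values are made up; the options shown apply to the csv format) of both methods:

val nums = spark.range(5).toDF("n")

// Set options one at a time...
nums.write
  .format("csv")
  .option("header", "true") // write a header line
  .option("sep", ";")       // use ; as the field separator
  .save("/tmp/nums-csv")

// ...or all at once from a Map
val opts = Map("header" -> "true", "sep" -> ";")
nums.write.format("csv").options(opts).save("/tmp/nums-csv-2")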
Writing DataFrames to Files
| Caution | FIXME | 
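Until this section is written, a sketch (the paths are made up) of the format-specific shortcut methods for writing out files:

val nums = spark.range(5).toDF("n")
nums.write.parquet("/tmp/nums-parquet") // same as format("parquet").save(...)
nums.write.json("/tmp/nums-json")       // same as format("json").save(...)
nums.write.csv("/tmp/nums-csv")         // same as format("csv").save(...)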
 Specifying Alias or Fully-Qualified Class Name of DataSource — format Method
| Caution | FIXME Compare to DataFrameReader. | 
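Until the comparison with DataFrameReader is written, a sketch showing that format accepts a short alias as well as a fully-qualified data source class name (the class name below matches Spark 2.x and the paths are made up):

val nums = spark.range(5).toDF("n")
// Short alias
nums.write.format("json").save("/tmp/nums-json")
// Fully-qualified class name of the data source
nums.write
  .format("org.apache.spark.sql.execution.datasources.json.JsonFileFormat")
  .save("/tmp/nums-json-fqcn")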
 insertInto Method
| Caution | FIXME |
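Until this section is written, a minimal sketch (the table name is made up); note that insertInto requires the target table to exist and resolves columns by position:

// Create the table first...
spark.range(5).toDF("n").write.saveAsTable("nums")
// ...then append more rows into it
spark.range(5, 10).toDF("n").write.insertInto("nums")
sql("select * from nums").show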