public final class DataFrameWriter
extends java.lang.Object
Interface used to write a DataFrame to external storage systems (e.g. file systems,
key-value stores, etc.). Use DataFrame.write to access this.
| Modifier and Type | Method and Description |
|---|---|
| DataFrameWriter | `format(java.lang.String source)` Specifies the underlying output data source. |
| void | `insertInto(java.lang.String tableName)` Inserts the content of the DataFrame to the specified table. |
| void | `jdbc(java.lang.String url, java.lang.String table, java.util.Properties connectionProperties)` Saves the content of the DataFrame to an external database table via JDBC. |
| void | `json(java.lang.String path)` Saves the content of the DataFrame in JSON format at the specified path. |
| DataFrameWriter | `mode(SaveMode saveMode)` Specifies the behavior when data or table already exists. |
| DataFrameWriter | `mode(java.lang.String saveMode)` Specifies the behavior when data or table already exists. |
| DataFrameWriter | `option(java.lang.String key, java.lang.String value)` Adds an output option for the underlying data source. |
| DataFrameWriter | `options(scala.collection.Map<java.lang.String,java.lang.String> options)` (Scala-specific) Adds output options for the underlying data source. |
| DataFrameWriter | `options(java.util.Map<java.lang.String,java.lang.String> options)` Adds output options for the underlying data source. |
| void | `orc(java.lang.String path)` Saves the content of the DataFrame in ORC format at the specified path. |
| void | `parquet(java.lang.String path)` Saves the content of the DataFrame in Parquet format at the specified path. |
| DataFrameWriter | `partitionBy(scala.collection.Seq<java.lang.String> colNames)` Partitions the output by the given columns on the file system. |
| DataFrameWriter | `partitionBy(java.lang.String... colNames)` Partitions the output by the given columns on the file system. |
| void | `save()` Saves the content of the DataFrame to the data source configured on this writer. |
| void | `save(java.lang.String path)` Saves the content of the DataFrame at the specified path. |
| void | `saveAsTable(java.lang.String tableName)` Saves the content of the DataFrame as the specified table. |
| void | `text(java.lang.String path)` Saves the content of the DataFrame in a text file at the specified path. |

public DataFrameWriter partitionBy(java.lang.String... colNames)
Partitions the output by the given columns on the file system.
This initially applied only to Parquet, but in 1.5+ it also covers JSON, text, ORC and Avro.
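To make "partitions the output on the file system" more concrete, here is a minimal sketch of the relative directory path one row could end up under, assuming the Hive-style `column=value` directory layout. The class `PartitionPathSketch`, its method, and the column names `year` and `month` are hypothetical and not part of the Spark API. Escaping of special characters in values is deliberately left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of Spark: models a Hive-style partition path.
public class PartitionPathSketch {

    // Joins "column=value" segments in partition-column order.
    public static String partitionPath(Map<String, String> partitionValues) {
        StringJoiner path = new StringJoiner("/");
        for (Map.Entry<String, String> entry : partitionValues.entrySet()) {
            path.add(entry.getKey() + "=" + entry.getValue());
        }
        return path.toString();
    }

    public static void main(String[] args) {
        // Assumed partition columns, as if partitionBy("year", "month") had been called.
        Map<String, String> row = new LinkedHashMap<>();
        row.put("year", "2016");
        row.put("month", "1");
        System.out.println(partitionPath(row)); // prints year=2016/month=1
    }
}
```

The order of the path segments follows the order of the column names passed to partitionBy, which is why the sketch uses an insertion-ordered map.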
colNames - names of the columns to partition the output by

public DataFrameWriter mode(SaveMode saveMode)
Specifies the behavior when data or table already exists. Options include:
- SaveMode.Overwrite: overwrite the existing data.
- SaveMode.Append: append the data.
- SaveMode.Ignore: ignore the operation (i.e. no-op).
- SaveMode.ErrorIfExists: default option, throw an exception at runtime.
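The four save modes can be read as a small decision table keyed on whether the target already exists. The sketch below models only that decision in plain Java. The class `SaveModeSketch` and its `Mode` and `Action` enums are hypothetical stand-ins, not Spark types, and the code is not how Spark implements the modes.

```java
// Hypothetical model of the save-mode decision; not Spark's implementation.
public class SaveModeSketch {

    public enum Mode { OVERWRITE, APPEND, IGNORE, ERROR_IF_EXISTS }

    public enum Action { WRITE_NEW, REPLACE_EXISTING, ADD_TO_EXISTING, SKIP, FAIL }

    // The mode only matters when the target data or table already exists.
    public static Action decide(Mode mode, boolean targetExists) {
        if (!targetExists) {
            return Action.WRITE_NEW;
        }
        switch (mode) {
            case OVERWRITE:
                return Action.REPLACE_EXISTING;
            case APPEND:
                return Action.ADD_TO_EXISTING;
            case IGNORE:
                return Action.SKIP; // no-op, existing data is left untouched
            default:
                return Action.FAIL; // ERROR_IF_EXISTS: an exception at runtime
        }
    }

    public static void main(String[] args) {
        for (Mode mode : Mode.values()) {
            System.out.println(mode + " on existing target: " + decide(mode, true));
        }
    }
}
```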
saveMode - the save mode to apply

public DataFrameWriter mode(java.lang.String saveMode)
Specifies the behavior when data or table already exists. Options include:
- overwrite: overwrite the existing data.
- append: append the data.
- ignore: ignore the operation (i.e. no-op).
- error: default option, throw an exception at runtime.
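saveMode - name of the save mode to apply

public DataFrameWriter format(java.lang.String source)
Specifies the underlying output data source.
source - name of the output data source, for example "json", "parquet" or "orc"

public DataFrameWriter option(java.lang.String key, java.lang.String value)
Adds an output option for the underlying data source.
key - name of the option
value - value of the option

public DataFrameWriter options(scala.collection.Map<java.lang.String,java.lang.String> options)
(Scala-specific) Adds output options for the underlying data source.
options - map of option names to option values

public DataFrameWriter options(java.util.Map<java.lang.String,java.lang.String> options)
Adds output options for the underlying data source.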
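As a rough illustration of the java.util.Map overload, the sketch below collects several options before handing them to the writer. The class `WriterOptionsSketch` and the option names `someOption` and `anotherOption` are placeholders; which keys are honored depends on the data source in use. The writer call appears only as a comment, because it needs a DataFrame `df` that is assumed to exist elsewhere.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Spark: prepares a map for options(java.util.Map).
public class WriterOptionsSketch {

    // Both keys and values are strings, so numeric settings are passed as text.
    public static Map<String, String> buildOptions() {
        Map<String, String> opts = new HashMap<>();
        opts.put("someOption", "someValue"); // placeholder key and value
        opts.put("anotherOption", "42");     // placeholder numeric setting, as a string
        return opts;
    }

    public static void main(String[] args) {
        Map<String, String> opts = buildOptions();
        // With a DataFrame `df` in scope, the map could be applied like this:
        // df.write().format("json").options(opts).save("/path/to/output");
        System.out.println(opts.size() + " options prepared");
    }
}
```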
options - map of option names to option values

public DataFrameWriter partitionBy(scala.collection.Seq<java.lang.String> colNames)
Partitions the output by the given columns on the file system.
This initially applied only to Parquet, but in 1.5+ it also covers JSON, text, ORC and Avro.
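colNames - names of the columns to partition the output by

public void save(java.lang.String path)
Saves the content of the DataFrame at the specified path.
path - the output path

public void save()
Saves the content of the DataFrame to the data source configured on this writer.
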
public void insertInto(java.lang.String tableName)
Inserts the content of the DataFrame to the specified table. It requires that
the schema of the DataFrame is the same as the schema of the table.
Because it inserts data into an existing table, format and options are ignored.
tableName - name of the existing table to insert into

public void saveAsTable(java.lang.String tableName)
Saves the content of the DataFrame as the specified table.
If the table already exists, the behavior of this function depends on the
save mode, specified by the mode function (defaults to throwing an exception).
When mode is Overwrite, the schema of the DataFrame does not need to be
the same as that of the existing table.
When mode is Append, the schema of the DataFrame needs to be
the same as that of the existing table, and format and options are ignored.
When the DataFrame is created from a non-partitioned HadoopFsRelation with a single input
path, and the data source provider can be mapped to an existing Hive builtin SerDe (i.e. ORC
and Parquet), the table is persisted in a Hive-compatible format, which means other systems
like Hive will be able to read this table. Otherwise, the table is persisted in a Spark SQL
specific format.
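The Hive-compatibility condition above combines several requirements. The predicate below restates it in plain Java so the individual conditions are easier to see. The class `HiveCompatSketch`, its method and its parameters are hypothetical; Spark does not expose such a check, and the sketch only mirrors the rule as described here.

```java
// Hypothetical restatement of the documented rule; not a Spark API.
public class HiveCompatSketch {

    // True only when every condition described for saveAsTable holds.
    public static boolean persistedHiveCompatible(boolean fromHadoopFsRelation,
                                                  boolean partitioned,
                                                  int inputPathCount,
                                                  String provider) {
        // Providers the text names as having a Hive builtin SerDe.
        boolean hasHiveSerDe = "orc".equalsIgnoreCase(provider)
                || "parquet".equalsIgnoreCase(provider);
        return fromHadoopFsRelation && !partitioned && inputPathCount == 1 && hasHiveSerDe;
    }

    public static void main(String[] args) {
        System.out.println(persistedHiveCompatible(true, false, 1, "parquet")); // prints true
        System.out.println(persistedHiveCompatible(true, true, 1, "parquet"));  // prints false
        System.out.println(persistedHiveCompatible(true, false, 1, "json"));    // prints false
    }
}
```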
tableName - name of the table to save to

public void jdbc(java.lang.String url, java.lang.String table, java.util.Properties connectionProperties)
Saves the content of the DataFrame to an external database table via JDBC. If the
table already exists in the external database, the behavior of this function depends on the
save mode, specified by the mode function (defaults to throwing an exception).
Do not create too many partitions in parallel on a large cluster; otherwise Spark might crash your external database systems.
url - JDBC database URL of the form jdbc:subprotocol:subname
table - Name of the table in the external database.
connectionProperties - JDBC database connection arguments, a list of arbitrary string
tag/value pairs. Normally at least a "user" and "password" property
should be included.

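The connectionProperties argument is a plain java.util.Properties object. A minimal sketch of preparing one with the "user" and "password" entries mentioned above follows. The class `JdbcPropertiesSketch`, the URL, the table name and the credentials are made-up placeholders, and the write itself is shown only as a comment because it needs a DataFrame `df` that is assumed to exist elsewhere.

```java
import java.util.Properties;

// Hypothetical helper, not part of Spark: builds the Properties passed to jdbc(...).
public class JdbcPropertiesSketch {

    public static Properties connectionProperties(String user, String password) {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        return props;
    }

    public static void main(String[] args) {
        // Placeholder values following the jdbc:subprotocol:subname form.
        String url = "jdbc:postgresql://localhost:5432/exampledb";
        String table = "example_table";
        Properties props = connectionProperties("example_user", "example_password");

        // With a DataFrame `df` in scope, the write would look like:
        // df.write().mode("append").jdbc(url, table, props);
        System.out.println("Would write to " + table + " at " + url
                + " as " + props.getProperty("user"));
    }
}
```

In real code the credentials would normally come from configuration or a secrets store rather than string literals.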
public void json(java.lang.String path)
Saves the content of the DataFrame in JSON format at the specified path.
This is equivalent to:
    format("json").save(path)
path - the output path

public void parquet(java.lang.String path)
Saves the content of the DataFrame in Parquet format at the specified path.
This is equivalent to:
    format("parquet").save(path)
path - the output path

public void orc(java.lang.String path)
Saves the content of the DataFrame in ORC format at the specified path.
This is equivalent to:
    format("orc").save(path)
path - the output path

public void text(java.lang.String path)
Saves the content of the DataFrame in a text file at the specified path.
The DataFrame must have only one column that is of string type.
Each row becomes a new line in the output file. For example:
// Scala:
df.write.text("/path/to/output")
// Java:
df.write().text("/path/to/output")
path - the output path