pyspark.sql.DataFrameReader

class pyspark.sql.DataFrameReader(spark)

Interface used to load a DataFrame from external storage systems (e.g. file systems, key-value stores, etc.). Use SparkSession.read to access this.

New in version 1.4.0.

Changed in version 3.4.0: Supports Spark Connect.
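For orientation, a minimal sketch of obtaining a DataFrameReader; the application name is illustrative:

from pyspark.sql import SparkSession

# Build (or reuse) a SparkSession; the app name is a placeholder.
spark = SparkSession.builder.appName("reader-example").getOrCreate()

# SparkSession.read returns a DataFrameReader instance.
reader = spark.read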
Methods
csv(path[, schema, sep, encoding, quote, ...])
    Loads a CSV file and returns the result as a DataFrame.

format(source)
    Specifies the input data source format.
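The csv() shorthand and the generic format()/load() chain are interchangeable for built-in sources. A sketch, reusing spark from above with an illustrative path:

# Shorthand: csv() accepts reader options as keyword arguments.
df1 = spark.read.csv("/tmp/people.csv", header=True, inferSchema=True)

# Equivalent long form via format()/option()/load().
df2 = (spark.read.format("csv")
       .option("header", "true")
       .option("inferSchema", "true")
       .load("/tmp/people.csv"))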
jdbc(url, table[, column, lowerBound, ...])
    Construct a DataFrame representing the database table named table, accessible via JDBC URL url and connection properties.
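A sketch of a basic JDBC read; the URL, table name, and credentials are placeholders, and the matching JDBC driver JAR must be available to Spark:

# column, lowerBound, upperBound, and numPartitions (from the signature
# above) can additionally be passed to parallelize the read.
df = spark.read.jdbc(
    url="jdbc:postgresql://localhost:5432/mydb",  # placeholder URL
    table="public.orders",                        # placeholder table
    properties={"user": "user", "password": "secret",
                "driver": "org.postgresql.Driver"},
)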
json(path[, schema, primitivesAsString, ...])
    Loads JSON files and returns the results as a DataFrame.

load([path, format, schema])
    Loads data from a data source and returns it as a DataFrame.
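A sketch of the JSON shorthand next to the generic load() entry point; the path is illustrative:

# Line-delimited JSON (one object per line) is the default.
df = spark.read.json("/tmp/events.json")

# load() is the generic entry point; format and schema may be given inline.
df2 = spark.read.load("/tmp/events.json", format="json")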
option(key, value)
    Adds an input option for the underlying data source.

options(**options)
    Adds input options for the underlying data source.
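option() and options() differ only in arity; a sketch of the two equivalent styles with an illustrative path:

# One option at a time ...
df = (spark.read.format("csv")
      .option("header", "true")
      .option("sep", ";")
      .load("/tmp/data.csv"))

# ... or several at once.
df2 = (spark.read.format("csv")
       .options(header="true", sep=";")
       .load("/tmp/data.csv"))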
orc(path[, mergeSchema, pathGlobFilter, ...])
    Loads ORC files, returning the result as a DataFrame.

parquet(*paths, **options)
    Loads Parquet files, returning the result as a DataFrame.
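A sketch of the columnar readers with illustrative paths; mergeSchema is one of the options the listed signatures accept:

# parquet() accepts multiple paths.
df = spark.read.parquet("/tmp/events/2024/", "/tmp/events/2025/")

# mergeSchema reconciles differing Parquet schemas across files.
df2 = spark.read.option("mergeSchema", "true").parquet("/tmp/events/")

orc_df = spark.read.orc("/tmp/logs.orc")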
schema(schema)
    Specifies the input schema.
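Supplying a schema up front avoids inference over the data. A sketch: schema() accepts either a StructType or a DDL-formatted string, and the path here is illustrative:

from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Explicit schema as a StructType ...
st = StructType([
    StructField("name", StringType(), True),
    StructField("age", IntegerType(), True),
])
df = spark.read.schema(st).csv("/tmp/people.csv")

# ... or as a DDL-formatted string.
df2 = spark.read.schema("name STRING, age INT").csv("/tmp/people.csv")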
table(tableName)
    Returns the specified table as a DataFrame.

text(paths[, wholetext, lineSep, ...])
    Loads text files and returns a DataFrame whose schema starts with a string column named "value", followed by partitioned columns if there are any.
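A sketch of catalog and plain-text reads; the table name and path are illustrative:

# Read a table registered in the catalog.
df = spark.read.table("sales")

# text() yields one row per line in a single "value" column by default;
# wholetext=True reads each file as a single row instead.
lines = spark.read.text("/tmp/notes.txt")
whole = spark.read.text("/tmp/notes.txt", wholetext=True)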
xml(path[, rowTag, schema, ...])
    Loads an XML file and returns the result as a DataFrame.
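A sketch of the XML reader; the path and rowTag value are placeholders:

# rowTag names the XML element to treat as one row, so each <book>
# element becomes a row here (path and tag are illustrative).
df = spark.read.xml("/tmp/books.xml", rowTag="book")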