pyspark.sql.functions.kll_sketch_agg_double

pyspark.sql.functions.kll_sketch_agg_double(col, k=None)

Aggregate function: returns the compact binary representation of the DataSketches KllDoublesSketch built from the values in the input column. The optional k parameter controls the size and accuracy of the sketch (default 200, valid range 8 to 65535).

New in version 4.1.0.

Parameters
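col : Column or column name

The column containing the double values to aggregate.

k : Column or int, optional

The sketch parameter that controls size and accuracy (default 200, valid range 8 to 65535). Larger values give more accurate quantile estimates at the cost of a larger sketch.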

Returns
Column

The binary representation of the KllDoublesSketch.
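
To give a rough feel for how k trades size for accuracy, the sketch below estimates the normalized rank error for a given k. The function name `approx_rank_error` is hypothetical, and the constants are empirical fits recalled from the Apache DataSketches Java library, not something this page documents, so treat the numbers as an assumption rather than a guarantee made by Spark.

```python
def approx_rank_error(k: int, pmf: bool = False) -> float:
    """Rough estimate of the normalized rank error of a KLL sketch.

    ASSUMPTION: the constants below are empirical fits recalled from the
    DataSketches library; they are not part of the PySpark API.
    """
    # The documented valid range for k is 8 to 65535.
    if not 8 <= k <= 65535:
        raise ValueError("k must be in the range 8 to 65535")
    if pmf:
        # Double-sided error, relevant for PMF-style queries.
        return 2.446 / k ** 0.9433
    # Single-sided error, relevant for rank and quantile queries.
    return 2.296 / k ** 0.9723


for k in (8, 200, 400, 65535):
    print(k, round(approx_rank_error(k), 5))
```

Under these assumed constants, the default k of 200 corresponds to a rank error of a little over 1%, and doubling k roughly halves it.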

Examples

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([1.0, 2.0, 3.0, 4.0, 5.0], "DOUBLE")
>>> result = df.agg(sf.kll_sketch_agg_double("value")).first()[0]
>>> result is not None and len(result) > 0
True
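
The example above relies on the default k. A minimal sketch of passing k explicitly might look like the following; the helper name `build_sketch` and the value 400 are illustrative assumptions, and the code expects an active SparkSession on a Spark version that provides this function (4.1.0 or later).

```python
def build_sketch(spark, values, k=400):
    """Aggregate `values` into a KLL doubles sketch with an explicit k.

    `build_sketch` is a hypothetical helper, not part of PySpark.
    `spark` is assumed to be an active SparkSession.
    """
    # Imported lazily so that defining the helper does not require PySpark.
    from pyspark.sql import functions as sf

    # A plain list with an atomic type schema yields a single column "value".
    df = spark.createDataFrame(values, "DOUBLE")
    # k is passed as the second argument, per the documented signature.
    return df.agg(sf.kll_sketch_agg_double("value", k)).first()[0]
```

With a session available, `build_sketch(spark, [1.0, 2.0, 3.0, 4.0, 5.0])` should return the sketch in its binary form, just as the default-k example does.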