pyspark.pandas.window.Expanding.sum

Expanding.sum()

Calculate the expanding summation of the given DataFrame or Series.

Note

The current implementation of this API uses Spark’s Window without a partition specification. This moves all data into a single partition on a single machine and can cause serious performance degradation. Avoid this method on very large datasets.

Returns
Series or DataFrame

Same type as the input, with the same index, containing the expanding summation.

See also

pyspark.pandas.Series.expanding

Calling object with Series data.

pyspark.pandas.DataFrame.expanding

Calling object with DataFrames.

pyspark.pandas.Series.sum

Reducing sum for Series.

pyspark.pandas.DataFrame.sum

Reducing sum for DataFrame.

Examples

>>> s = ps.Series([1, 2, 3, 4, 5])
>>> s
0    1
1    2
2    3
3    4
4    5
dtype: int64
>>> s.expanding(3).sum()
0     NaN
1     NaN
2     6.0
3    10.0
4    15.0
dtype: float64
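
The Series output above can be reproduced with a running total that is masked until enough observations have been seen. The following is a minimal pure-Python sketch of that semantics, not the pyspark implementation; `expanding_sum` is a hypothetical helper, and it assumes the input contains no missing values.

```python
import math
from itertools import accumulate


def expanding_sum(values, min_periods=1):
    """Illustrative helper (not part of pyspark): expanding sum over a list.

    Assumes `values` holds no missing entries, so the number of
    observations at position i is simply i + 1.
    """
    running = accumulate(values)  # cumulative totals: 1, 3, 6, 10, 15
    return [
        # Emit the total once at least `min_periods` values are seen,
        # otherwise NaN, mirroring the float64 output above.
        float(total) if i + 1 >= min_periods else math.nan
        for i, total in enumerate(running)
    ]


result = expanding_sum([1, 2, 3, 4, 5], min_periods=3)
print(result)  # [nan, nan, 6.0, 10.0, 15.0]
```

The first two positions are NaN because fewer than three observations are available there, which matches `s.expanding(3).sum()`.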

For DataFrame, each expanding summation is computed column-wise.

>>> df = ps.DataFrame({"A": s.to_numpy(), "B": s.to_numpy() ** 2})
>>> df
   A   B
0  1   1
1  2   4
2  3   9
3  4  16
4  5  25
>>> df.expanding(3).sum()
      A     B
0   NaN   NaN
1   NaN   NaN
2   6.0  14.0
3  10.0  30.0
4  15.0  55.0
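
To make "column-wise" concrete, the sketch below applies the same running total to each column independently. It is pure Python under stated assumptions: the frame is modelled as a plain dict of lists, `expanding_sum_columns` is a hypothetical helper rather than a pyspark function, and no values are missing.

```python
import math
from itertools import accumulate


def expanding_sum_columns(columns, min_periods=1):
    """Illustrative helper (not part of pyspark): expanding sum per column."""
    out = {}
    for name, values in columns.items():
        # Each column gets its own running total; columns never mix.
        out[name] = [
            float(total) if i + 1 >= min_periods else math.nan
            for i, total in enumerate(accumulate(values))
        ]
    return out


a = [1, 2, 3, 4, 5]
frame = {"A": a, "B": [x ** 2 for x in a]}
summed = expanding_sum_columns(frame, min_periods=3)
print(summed["A"])  # [nan, nan, 6.0, 10.0, 15.0]
print(summed["B"])  # [nan, nan, 14.0, 30.0, 55.0]
```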