rolling_mean returns the wrong value when there are too many rows #21358

Closed
2 tasks done
Jack0Chan opened this issue Feb 20, 2025 · 7 comments · Fixed by #21413
Labels
A-rolling, accepted (Ready for implementation), bug (Something isn't working), python (Related to Python Polars)

Comments

@Jack0Chan

Jack0Chan commented Feb 20, 2025

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

debug_data.csv

import polars as pl

# 10_000 rows
df = pl.read_csv('/home/guangyao/work/bug_of_polars_rolling/debug_data.csv')

# Exp1: pow3 and rolling mean
rolling_version_wrong = df.select(pl.all().pow(3).rolling_mean(window_size=100, min_samples=20))
print(rolling_version_wrong[-1].item()) # get 0.806879

# Exp2: slice [-1000:], pow3 and rolling mean
rolling_version_correct = df[-1000:].select(pl.all().pow(3).rolling_mean(window_size=100, min_samples=20))
print(rolling_version_correct[-1].item()) # get 0.598021

# Exp3: slice the last 100 rows, pow3 and mean
direct_version = df[-100:].select(pl.all().pow(3).mean())
print(direct_version.item()) # get 0.598021

# Exp4: remove pow3, all get the same value 0.840279 at the last row
rolling_version1 = df.select(pl.all().rolling_mean(window_size=100, min_samples=20, center=False))
rolling_version2 = df[-1000:].select(pl.all().rolling_mean(window_size=100, min_samples=20))
direct_version = df[-100:].select(pl.all().mean())

Log output

0.8068792806886891
0.5980209629812755
0.598020962981293

Issue description

  1. When calculating rolling_mean after pow(3) (Exp1), I get a value significantly different from the direct calculation (Exp3) if the number of rows is too large.
  2. Originally there were 5_299_200 rows, and I got -4122.788527 instead of the correct value 0.598021. In fact, I got different wrong values for different numbers of rows.
  3. If I change pow(3) to pow(2), Polars performs as expected.

Expected behavior

Exp1 should have the same result as Exp3.

Installed versions

--------Version info---------
Polars:              1.19.0
Index type:          UInt32
Platform:            Linux-6.2.0-26-generic-x86_64-with-glibc2.35
Python:              3.11.9 (main, Apr 19 2024, 16:48:06) [GCC 11.2.0]
LTS CPU:             False

----Optional dependencies----
adbc_driver_manager  <not installed>
altair               <not installed>
azure.identity       <not installed>
boto3                1.35.53
cloudpickle          <not installed>
connectorx           <not installed>
deltalake            <not installed>
fastexcel            <not installed>
fsspec               2024.9.0
gevent               <not installed>
google.auth          2.38.0
great_tables         <not installed>
matplotlib           3.9.2
nest_asyncio         1.6.0
numpy                1.26.4
openpyxl             3.1.5
pandas               2.2.3
pyarrow              17.0.0
pydantic             2.9.2
pyiceberg            <not installed>
sqlalchemy           2.0.32
torch                2.6.0+cu124
xlsx2csv             <not installed>
xlsxwriter           3.2.2
@Jack0Chan added the bug, needs triage, python labels on Feb 20, 2025
@MarcoGorelli
Collaborator

thanks @Jack0Chan - to expedite resolution, could you make a minimal reproducible example please? https://matthewrocklin.com/minimal-bug-reports.html

@MarcoGorelli
Collaborator

MarcoGorelli commented Feb 20, 2025

Having downloaded your data, it seems that the change happens between:

  • df[-3618:].select(pl.all().pow(3).rolling_mean(window_size=100, min_samples=20)).tail(1) # incorrect
  • df[-3617:].select(pl.all().pow(3).rolling_mean(window_size=100, min_samples=20)).tail(1) # correct

And that's where there's a large negative number:

In [107]: df[-3618:]
Out[107]:
shape: (3_618, 1)
┌───────────────┐
│ debug_data    │
│ ---           │
│ f64           │
╞═══════════════╡
│ -33802.276428 │
│ 0.704235      │
│ 0.701813      │
│ 0.701813      │
│ 0.701813      │
│ …             │
│ 0.817196      │
│ 0.817196      │
│ 0.817196      │
│ 0.811929      │
│ 0.815339      │
└───────────────┘

You're correct that this shouldn't influence the results - the rolling mean for the last row should only depend on the last 100 elements.

Also, this reproduces even without min_samples=20.

@MarcoGorelli added the P-high, A-rolling labels and removed the needs triage label on Feb 20, 2025
@mcrumiller
Contributor

mcrumiller commented Feb 20, 2025

@MarcoGorelli I recognize this error as being the same one from #21099 (I think). The issue is that the rolling sum/mean kernels operate in O(n) time by maintaining the sum of the current window, incrementing it by values entering the window and decrementing it by values leaving the window on each iteration. The result is that floating point errors accumulate over the entire process. This is usually fine, except when we see huge changes in magnitude, as is the case here.

unsafe fn update(&mut self, start: usize, end: usize) -> Option<T> {
    // if we exceed the end, we have a completely new window
    // so we recompute
    let recompute_sum = if start >= self.last_end {
        true
    } else {
        // remove elements that should leave the window
        let mut recompute_sum = false;
        for idx in self.last_start..start {
            // SAFETY:
            // we are in bounds
            let valid = self.validity.get_bit_unchecked(idx);
            if valid {
                let leaving_value = self.slice.get_unchecked(idx);
                // if the leaving value is nan we need to recompute the window
                if T::is_float() && !leaving_value.is_finite() {
                    recompute_sum = true;
                    break;
                }
                self.sum = self.sum.map(|v| v - *leaving_value)
            } else {
                // null value leaving the window
                self.null_count -= 1;
                // self.sum is None and the leaving value is None
                // if the entering value is valid, we might get a new sum.
                if self.sum.is_none() {
                    recompute_sum = true;
                    break;
                }
            }
        }
        recompute_sum
    };

The subtraction of the leaving value from self.sum (line 77 in the source) is where the issue is; the same thing happens in the no-nulls kernel.
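
To see why this matters, here is a minimal Python sketch of the effect (not the actual kernel; the magnitude is exaggerated so a short series makes it obvious): a running window sum that adds entering values and subtracts leaving ones keeps the damage done by a huge value long after it has left the window, while recomputing the window directly does not.

import random

random.seed(0)
# one huge value followed by ordinary values, similar in spirit to the cubed data above
values = [1e16] + [random.uniform(0.5, 0.9) for _ in range(199)]
window = 100

# running-sum approach, as in the kernel: add entering values, subtract leaving values
running = sum(values[:window])
for i in range(window, len(values)):
    running += values[i]            # value entering the window
    running -= values[i - window]   # value leaving the window

# direct recomputation of the last window
direct = sum(values[-window:])

print(running / window)  # nowhere near 0.7: the small values were absorbed while 1e16 sat in the sum
print(direct / window)   # ~0.7, the expected mean of the last 100 values

Only non-finite leaving values trigger a recompute in the kernel (the is_finite check above), so a large but finite value like the ~-3.9e13 cubed values in this dataset silently poisons the running sum.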

Also--in this particular case, where you noticed the big change in magnitude from index -3618 to -3619, after the pow(3) that change is orders of magnitude larger:

>>> s = df["debug_data"]  # (definition assumed; the debug_data column as a Series)
>>> idx = 6381
>>> s[idx:idx+3]
shape: (3,)
Series: 'debug_data' [f64]
[
        -34321.028632
        -33802.276428
        0.704235
]
>>> s.pow(3)[idx:idx+3]
Series: 'debug_data' [f64]
[
        -4.0428e13
        -3.8622e13
        0.349262
]

I'm not sure what a good fix is here.

@MarcoGorelli removed the P-high label on Feb 20, 2025
@MarcoGorelli
Collaborator

thanks @mcrumiller !

Removing "high prio" then; given the size of the values, floating point errors seem expected?

@mcrumiller
Contributor

mcrumiller commented Feb 20, 2025

Someone with more experience like @orlp or @ritchie46 should verify my diagnosis, and whether it's reasonable/acceptable for the rolling implementation to differ somewhat substantially from the direct calculation in the face of floating point issues.

This is the second time in a short while that people have posted issues concerning precision in the rolling calculations. If it continues to be a point of contention for users, it might be worth addressing. One option may be to include a heuristic in the update function that triggers recompute_sum = true when the leaving_value and the current sum differ enough in magnitude that these issues may arise.
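
A rough sketch of such a heuristic, written here in Python for illustration (the Rust kernel would do the equivalent check before subtracting the leaving value; the threshold is an arbitrary assumption, not anything Polars currently exposes):

import math

# Hypothetical helper, not part of Polars: decide whether to rebuild the window
# sum from scratch instead of subtracting the leaving value from the running sum.
def should_recompute(leaving_value: float, current_sum: float, rel_threshold: float = 1e7) -> bool:
    if not math.isfinite(leaving_value):
        return True  # the kernel already recomputes for nan/inf
    remaining = current_sum - leaving_value
    # if the leaving value dwarfs what remains in the window, most digits of
    # `remaining` are accumulated rounding error, so a recompute is warranted
    return abs(leaving_value) > rel_threshold * abs(remaining)

For the data here (a leaving value around -3.9e13 against a remaining window sum of roughly 60-80) this would trigger a recompute, while for well-scaled data it would essentially never fire, so the kernel stays O(n) in practice at a cost of O(window_size) per recompute.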

@p12r34c56

I have run into a similar problem where the error in the last rows of a rolling mean/sum result becomes far too large.
For reference, pandas uses Kahan summation for its rolling.sum/mean aggregations, which looks to me like an easy fix with minimal performance impact, though I am not sure exactly how rolling mean is implemented in Polars.
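
For context, Kahan (compensated) summation carries a separate correction term so that low-order bits lost when adding a small value to a large accumulator are recovered on later updates. A minimal Python sketch of a rolling sum built on it (illustrative only; this is neither pandas' nor Polars' actual implementation):

def rolling_sum_kahan(values, window):
    total, comp = 0.0, 0.0  # running sum and its compensation term

    def add(x):
        nonlocal total, comp
        y = x - comp            # apply the correction carried over from previous steps
        t = total + y           # big + small: low-order bits of y may be lost here
        comp = (t - total) - y  # capture what was lost, to apply on the next update
        total = t

    out = []
    for i, v in enumerate(values):
        add(v)                        # value entering the window
        if i >= window:
            add(-values[i - window])  # value leaving the window
        out.append(total if i >= window - 1 else None)
    return out

Dividing each entry by the window size gives the rolling mean; the extra cost is a few additions per element, which is presumably the "minimal performance impact" mentioned above.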

@MarcoGorelli
Collaborator

thanks @p12r34c56 - indeed, pandas does return 0.598021:

In [18]: df = pd.read_csv('debug_data.csv')

In [19]: df['debug_data'].pow(3).rolling(100, min_periods=20).mean().tail()
Out[19]:
9567    0.602615
9568    0.601556
9569    0.600425
9570    0.599189
9571    0.598021
Name: debug_data, dtype: float64

duckdb:

In [25]: duckdb.sql("""
    ...: with cte as (from df select *, row_number() over () -1 as index)
    ...: from cte
    ...: select
    ...:     case when (count(debug_data) over rolling_window)>=20
    ...:          then mean(pow(debug_data, 3)) over rolling_window
    ...:          else null
    ...:          end as debug_data
    ...: window rolling_window as (order by index rows between 99 preceding and current row)
    ...: order by index
    ...: """).pl()
Out[25]:
shape: (9_572, 1)
┌────────────┐
│ debug_data │
│ ---        │
│ f64        │
╞════════════╡
│ null       │
│ null       │
│ null       │
│ null       │
│ null       │
│ …          │
│ 0.602615   │
│ 0.601556   │
│ 0.600425   │
│ 0.599189   │
│ 0.598021   │
└────────────┘
