rolling_mean gives negative values in Polars for some reason #11146
Comments
A more minimal reproduction:
pl.Series([1.51, 1, 0, 0]).rolling_sum(window_size=2, min_periods=1)
Series: '' [f64]
[
1.51
2.51
1.0
-2.2204e-16
]
This may be related to the accuracy of floating-point calculations.
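One plausible mechanism, assumed here rather than taken from the Polars source, is a sliding-window sum that keeps a single running total: it adds each incoming value and subtracts the outgoing one, so rounding error from an earlier addition is never cancelled out. `naive_rolling_sum` is a hypothetical helper that sketches this scheme (emitting partial first windows, like `min_periods=1`); on the series above it produces the same tiny negative value. Polars presumably prints the third value as 1.0 only because its display rounds.

```python
def naive_rolling_sum(values, window):
    # Hypothetical O(n) sliding-window sum: one running total that adds the
    # incoming value and subtracts the outgoing one. Partial windows at the
    # start are emitted too, like min_periods=1.
    out = []
    running = 0.0
    for i, x in enumerate(values):
        running += x
        if i >= window:
            running -= values[i - window]
        out.append(running)
    return out


print(naive_rolling_sum([1.51, 1.0, 0.0, 0.0], window=2))
# [1.51, 2.51, 0.9999999999999998, -2.220446049250313e-16]
```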
Interesting "bug", good catch! My first thought is, of course, the classic floating-point problem. It is probably some performance optimization that reuses the previous window's result when computing the next one.
ser = pl.Series(
[
0,
1,
1.51, # no precise float representation! (float32 value: 1.5099999904632568359375)
1,
0,
0,
]
).rolling_mean(window_size=2, min_periods=1)
[
0.0
0.5
1.255
1.255
0.5
-1.1102e-16
]
for val in ser:
    print(f'{val:.20f}')
0.00000000000000000000 # correct
0.50000000000000000000 # correct
1.25499999999999989342 # floating point error
1.25499999999999989342 # floating point error
0.49999999999999988898 # NOT correct (1 + 0) / 2 (should be precise)
-0.00000000000000011102 # NOT correct (0 + 0) / 2 (should be exact)
Is this a case where the performance gain justifies the small loss in accuracy?
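The last two printed values can be traced by hand. Assuming the mean is computed from a running sum that adds the incoming value and subtracts the outgoing one, the running total for the window [1.51, 1] is the float `1.0 + 1.51`, and that addition rounds. Subtracting 1.51 afterwards therefore does not give back exactly 1.0, and the error survives into the all-zero window. `decimal.Decimal` is used only to show the exact stored value of 1.51:

```python
from decimal import Decimal

# Exact value of the float64 nearest to 1.51 (slightly above 1.51).
exact_151 = Decimal(1.51)
# -> 1.5100000000000000088817841970012523233890533447265625

# Running total once the window holds 1 and 1.51; this addition rounds.
window_sum = 1.0 + 1.51
# Slide to [1, 0]: add the incoming 0, subtract the outgoing 1.51.
after_slide = (window_sum + 0.0) - 1.51
mean_1_0 = after_slide / 2                  # 0.4999999999999999, not 0.5
# Slide to [0, 0]: add the incoming 0, subtract the outgoing 1.
mean_0_0 = ((after_slide + 0.0) - 1.0) / 2  # -1.1102230246251565e-16, not 0.0

print(exact_151, mean_1_0, mean_0_0)
```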
I'm inclined to say this is expected from floating-point arithmetic. We could, however, look into whether numerical stability can be improved without paying (too much) in performance.
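One well-known way to improve stability while staying O(n) is Kahan (compensated) summation: a second variable records the error of each addition and feeds it back into the next one. The sketch below applies it to the same add-incoming/subtract-outgoing scheme; `kahan_add` and `kahan_rolling_sum` are hypothetical helpers, and this is not a description of what Polars does. On the series from the minimal reproduction it returns exact values for the [1, 0] and [0, 0] windows:

```python
def kahan_add(total, comp, x):
    # One Kahan step. `comp` holds the error of the previous addition
    # (actual minus intended); subtracting it from the next input feeds
    # the lost low-order bits back in.
    y = x - comp
    t = total + y
    comp = (t - total) - y  # error of this addition
    return t, comp


def kahan_rolling_sum(values, window):
    # Hypothetical O(n) sliding-window sum with Kahan compensation.
    # Adds the incoming value, then subtracts the outgoing one as a negative add.
    # Partial windows at the start are emitted too, like min_periods=1.
    out = []
    total, comp = 0.0, 0.0
    for i, x in enumerate(values):
        total, comp = kahan_add(total, comp, x)
        if i >= window:
            total, comp = kahan_add(total, comp, -values[i - window])
        out.append(total)
    return out


print(kahan_rolling_sum([1.51, 1.0, 0.0, 0.0], window=2))
# [1.51, 2.51, 1.0, 0.0]
```

Compensation costs a few extra floating-point operations per element and reduces drift, but it is not a guarantee of exact results for every input.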
This is how floating-point arithmetic works. :) This is normal.
This is not entirely true! ;)
I assume the inaccuracy comes from some kind of optimization. I am asking because pandas does not have this "problem"!
import pandas as pd
ser = pd.Series(
data=[
0,
1,
1.51, # no precise float representation! (float32 value: 1.5099999904632568359375)
1,
0,
0,
]
)
ser.rolling(window=2).mean().apply(lambda x: f'{x:.20f}')
0 nan
1 0.50000000000000000000
2 1.25499999999999989342
3 1.25499999999999989342
4 0.50000000000000000000
5 0.00000000000000000000
dtype: object
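For comparison, a rolling mean that recomputes every window from scratch carries no error from one window to the next, so windows made only of exactly representable values, such as [1, 0] and [0, 0], come out exact. This is an O(n · window) sketch; `recomputed_rolling_mean` is a hypothetical helper, not a claim about how pandas is implemented. With `min_periods` defaulting to the window size, it gives the same values as the pandas output above:

```python
import math


def recomputed_rolling_mean(values, window, min_periods=None):
    # Hypothetical O(n * window) rolling mean: every window is summed from
    # scratch, so no rounding error is carried over from earlier windows.
    if min_periods is None:
        min_periods = window  # same default as pandas' rolling(window=...)
    out = []
    for i in range(len(values)):
        win = values[max(0, i - window + 1): i + 1]
        if len(win) >= min_periods:
            out.append(sum(win) / len(win))
        else:
            out.append(math.nan)
    return out


data = [0.0, 1.0, 1.51, 1.0, 0.0, 0.0]
print(recomputed_rolling_mean(data, window=2))
# [nan, 0.5, 1.255, 1.255, 0.5, 0.0]
print(recomputed_rolling_mean(data, window=2, min_periods=1))
# [0.0, 0.5, 1.255, 1.255, 0.5, 0.0]
```

The trade-off is the one raised earlier in the thread: exactness for these windows in exchange for O(window) work per output instead of O(1).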
It's probably related to float32 vs. float64: when I define the series with pl.Float32, I get results similar to pandas.
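The value 1.5099999904632568359375 quoted in the code comments is the nearest float32 to 1.51; a Python float (and, by default, a Polars Float64 column) stores the nearest float64 instead, which is slightly above 1.51. Round-tripping through the standard-library `struct` module is one way to see both exact values; `to_float32` is a hypothetical helper name:

```python
import struct
from decimal import Decimal


def to_float32(x):
    # Pack into a 4-byte IEEE 754 single and back: gives the nearest float32.
    return struct.unpack("f", struct.pack("f", x))[0]


print(Decimal(to_float32(1.51)))  # 1.5099999904632568359375 (float32)
print(Decimal(1.51))              # 1.5100000000000000088817841970012523233890533447265625 (float64)
```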
I don't think float32 matters. Here is the test:
This code works fine for me using
I get:
The same thing happens when I use rolling_sum.
Checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of Polars.
Reproducible example
I have other examples with a large data set, but this is the result after stripping off most of the code and reducing it to a very small data set. I can't tell why rolling_mean gives me negative values when we use it to compute means for our training data, especially when there are leading and trailing 0s.
Log output
Issue description
The rolling_mean method gives negative values for some inputs, especially when there are leading and trailing 0s.
Expected behavior
All values should be 0 or positive.
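If downstream code needs non-negative means before a fix is available, one workaround (an assumption here, not a Polars feature) is to map tiny negative results to 0.0 when the input is known to be non-negative. The helper name `clamp_rounding_noise` and the tolerance `1e-12` are illustrative; the tolerance should be chosen relative to the scale of the data:

```python
def clamp_rounding_noise(values, tol=1e-12):
    # Hypothetical post-processing for rolling means of non-negative data:
    # tiny negatives are rounding residue and become 0.0; anything more
    # negative than -tol is left alone so a real problem stays visible.
    return [0.0 if -tol < v < 0.0 else v for v in values]


print(clamp_rounding_noise([0.5, 0.4999999999999999, -1.1102230246251565e-16]))
# [0.5, 0.4999999999999999, 0.0]
```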
Installed versions