-
-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat!: Change default offset
in group_by_dynamic
from 'negative every
' to 'zero'
#16658
Conversation
6d7f7c3
to
21ad10d
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #16658 +/- ##
==========================================
- Coverage 81.52% 81.52% -0.01%
==========================================
Files 1414 1414
Lines 186398 186398
Branches 3014 3014
==========================================
- Hits 151957 151953 -4
- Misses 33911 33914 +3
- Partials 530 531 +1 ☔ View full report in Codecov by Sentry. |
I just talked this over with an ml engineer, and it should be possible to deprecate this in such as way that it's not a sudden breaking change, and won't be too annoying to most users:
|
yes, please! 😎
|
Sounds good! 👌 |
Thanks both for your feedback! I was slightly torn on how to go about this one:
Still though:
so...will make a separate PR to start the deprecation cycle |
I think that if we want to do this, we just should now. Rather sooner, than later. |
offset
in group_by_dynamic from "negative every
" to "zero"offset
in group_by_dynamic from "negative every
" to "zero"
offset
in group_by_dynamic from "negative every
" to "zero"offset
in group_by_dynamic
from "negative every
" to "zero"
offset
in group_by_dynamic
from "negative every
" to "zero"offset
in group_by_dynamic
from 'negative every
' to 'zero'
Background
Currently, the start of the first window in
group_by_dynamic
is determined as follows:every
offset
offset
until the earliest datapoint is in or in front of itThe for typical case (where
every == period
), steps 5 and 3 cancel each other out, and the "offset defaults to negative every" just makes the logic harder to teach and understand. I think that the "negative every" default was originally introduced because at first there was no step 4). Now that step 4) exists, then defaulting to-every
is no longer necessary to ensure that the earliest datapoint is included in at least one window.Impact
For the "base case" (where period == every), this makes no difference
The only case when changing the default of
offset
makes a difference is ifperiod
is greater thanevery
, or ifclosed='both'
. And in that case, I'd say that the difference is for the better. I'll show an example - suppose you start withYou might expect that the windows will be:
But actually, because
offset
defaults to "negative every", the result includes an extraat the front, and it doesn't get skipped because
period
>every
and so it contains the first datapoint.Furthermore, if we look at the failing tests, then the new output matches the originally-reported expected output (when provided):
groupby_dynamic
min
aggregation behaves unexpectedly at DataFrame tail #7548: same, the new output matches the OP's expected outputSummary
Example
Before:
After: