-
-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add Expr.interpolate_by #16313
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #16313 +/- ##
==========================================
+ Coverage 81.37% 81.39% +0.02%
==========================================
Files 1403 1405 +2
Lines 183729 183996 +267
Branches 2954 2954
==========================================
+ Hits 149501 149763 +262
- Misses 33717 33722 +5
Partials 511 511 ☔ View full report in Codecov by Sentry. |
0a85fdf
to
c47b24b
Compare
c47b24b
to
aa45ec5
Compare
#[cfg(feature = "interpolate")] | ||
mod interpolate; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm moving this out of chunked_array
and into series
, as the public function operates on (and returns) a Series
Does data need to be sorted? We could interpolate any arbitrary function right? 🤔 |
For linear interpolation based on another column, I'd say it needs to be sorted - if you consider the image I posted here, then you'd expect to get those particular interpolated points, regardless of the order you presented the data in All I meant was that, as with |
Oh, yes. Make sense. The |
} | ||
} | ||
|
||
pub fn interpolate_by(s: &Series, by: &Series, assume_sorted: bool) -> PolarsResult<Series> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This parameter can be removed and we can use the is_sorted
check.
|
||
// Fill out with first. | ||
let mut out = Vec::with_capacity(chunked_arr.len()); | ||
let mut iter = chunked_arr.into_iter().enumerate().skip(first); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
chunked_arr.iter
is faster as it doesn't box the allocator.
|
||
let mut low_val = None; | ||
let mut low_idx = None; | ||
loop { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
loop
match next() -> None -> break
can be written as:
while let Some(val) = iter.next() {
}
low_idx = Some(idx); | ||
}, | ||
Some((_idx, None)) => { | ||
loop { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
while let ...
let mut validity = MutableBitmap::with_capacity(chunked_arr.len()); | ||
validity.extend_constant(chunked_arr.len(), true); | ||
|
||
for i in 0..first { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe do these without bound checks if we can have cases where first
is large.
interpolation_branch( | ||
low_val.expect("We started iterating at `first`"), | ||
high, | ||
&by_values[low_idx.unwrap()..=high_idx], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be unchecked
, it is in the hot loop.
let last = ca_sorted.last_non_null().unwrap() + 1; | ||
|
||
let mut out = zeroed_vec(ca_sorted.len()); | ||
let mut iter = ca_sorted.into_iter().enumerate().skip(first); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ca.iter()
is the faster variant.
|
||
let mut low_val = None; | ||
let mut low_idx = None; | ||
loop { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
while let..
let mut out = zeroed_vec(ca_sorted.len()); | ||
let mut iter = ca_sorted.into_iter().enumerate().skip(first); | ||
|
||
let mut low_val = None; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above.
validity.extend_constant(ca_sorted.len(), true); | ||
|
||
for i in 0..first { | ||
let out_idx = unsafe { sorting_indices.get_unchecked(i) }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same..
b3456f8
to
f4d07a9
Compare
closes #9616
This implements linear interpolation based on another column
It doesn't not assume the data to be sorted by
by
(i.e. it already sorts on the user's behalf)Follow-ups:
by
column (this needs doing forrolling_*_by
too)