-
Notifications
You must be signed in to change notification settings - Fork 762
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
functions: Add lower support #3521
Conversation
Signed-off-by: Xuanwo <github@xuanwo.io>
Thanks for the contribution! Please review the labels and make any necessary changes. |
Signed-off-by: Xuanwo <github@xuanwo.io>
Signed-off-by: Xuanwo <github@xuanwo.io>
Take the reviewer to sundy-li @PsiACE |
BTW Upper can be addressed too. |
Yep, I will add in the next PR 😄 |
@@ -38,6 +38,7 @@ hex = "0.4.3" | |||
base64 = "0.13.0" | |||
itertools = "0.10.3" | |||
num-format = "0.4" | |||
bstr = "0.2.17" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's this crate for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bstr
is a byte string library. I use the ByteSlice
trait so that we don't need to implement s.char_indices()
on [u8]
again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
impl StringOperator for Lower { | ||
#[inline] | ||
fn apply_with_no_null<'a>(&'a mut self, s: &'a [u8], buffer: &mut [u8]) -> usize { | ||
buffer.copy_from_slice(&s.to_lowercase()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, there is an extra copy.
Maybe we can use https://docs.rs/bstr/latest/bstr/trait.ByteSlice.html#method.to_lowercase_into
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After some research, I found:
to_lowercase_into
requires a&mut Vec<u8>
and we can't fit the requirement.
One possible solution is to implement the to_lowercase_into
by hand. What's your idea?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just use the codes inside to_lowercase_into
:
#[inline]
fn to_lowercase_into(&self, buf: &mut Vec<u8>) {
// TODO: This is the best we can do given what std exposes I think.
// If we roll our own case handling, then we might be able to do this
// a bit faster. We shouldn't roll our own case handling unless we
// need to, e.g., for doing caseless matching or case folding.
// TODO(BUG): This doesn't handle any special casing rules.
buf.reserve(self.as_bytes().len());
for (s, e, ch) in self.char_indices() {
if ch == '\u{FFFD}' {
buf.push_str(&self.as_bytes()[s..e]);
} else if ch.is_ascii() {
buf.push_char(ch.to_ascii_lowercase());
} else {
for upper in ch.to_lowercase() {
buf.push_char(upper);
}
}
}
}
Codecov Report
@@ Coverage Diff @@
## main #3521 +/- ##
=====================================
- Coverage 60% 60% -1%
=====================================
Files 622 623 +1
Lines 34840 34846 +6
=====================================
- Hits 21244 21240 -4
- Misses 13596 13606 +10
Continue to review full report at Codecov.
|
CI Passed |
Oh, the pr already merged... |
Signed-off-by: Xuanwo github@xuanwo.io
I hereby agree to the terms of the CLA available at: https://databend.rs/policies/cla/
Summary
Add lower support
Checklist
Changelog
Related Issues
Test Plan
Unit Tests
Stateless Tests