-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Non memcmp slice comparison optimization #113654
Conversation
The previous implementation was not optimized properly by the compiler, which didn't leverage the fact that both length were equal.
(rustbot has picked a reviewer for you, use r? to override) |
This is loosely related to #113576 so I'm cross linking these two issues for the record. |
0659f11
to
f566552
Compare
Musl implementation is equivalent to our general case implementation. Not using it: - Allows for better optimization leveraging alignment information - Prevents a `call` - Likely favors inlining and further optimization
return false; | ||
} | ||
// and `memcmp` is not equivalent to our generic case | ||
#[cfg(not(target_env = "musl"))] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that cfg needs a clear comment (saying that musl's memcmp is not more optimized)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thx for reviewing.
I'm planning on commenting more here and in pal.rs
but I'm waiting to see how the debate on cfg in core
goes.
Let's run compiler benchmarks here too @bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
⌛ Trying commit f566552 with merge 982256677362bea322fd9fda00d76e80e2d72534... |
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (982256677362bea322fd9fda00d76e80e2d72534): comparison URL. Overall result: ❌✅ regressions and improvements - ACTION NEEDEDBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 658.224s -> 655.457s (-0.42%) |
the bitmaps benchmark has been really noisy recently so at least half these improvements may not actually be improvements |
I am willing to provide more benchmarks, especially on musl if needed, but I'd like to be sure that this won't be turned down on the ground of "no arch specific code in core" and before any performance consideration before doing so. Let me know what you feel is lacking for this to move forward. |
Delete architecture-specific memchr code in std::sys Currently all architecture-specific memchr code is only used in `std::io`. Most of the actual `memchr` capacity exposed to the user through the slice API is instead implemented in `core::slice::memchr`. Hence this commit deletes `memchr` from `std::sys[_common]` and replace calls to it by calls to `core::slice::memchr` functions. This deletes `(r)memchr` from the list of symbols linked to libc. The interest of putting architecture specific code back in core is linked to the discussion to be had in rust-lang#113654
…bilee Delete architecture-specific memchr code in std::sys Currently all architecture-specific memchr code is only used in `std::io`. Most of the actual `memchr` capacity exposed to the user through the slice API is instead implemented in `core::slice::memchr`. Hence this commit deletes `memchr` from `std::sys[_common]` and replace calls to it by calls to `core::slice::memchr` functions. This deletes `(r)memchr` from the list of symbols linked to libc. The interest of putting architecture specific code back in core is linked to the discussion to be had in rust-lang#113654
…bilee Delete architecture-specific memchr code in std::sys Currently all architecture-specific memchr code is only used in `std::io`. Most of the actual `memchr` capacity exposed to the user through the slice API is instead implemented in `core::slice::memchr`. Hence this commit deletes `memchr` from `std::sys[_common]` and replace calls to it by calls to `core::slice::memchr` functions. This deletes `(r)memchr` from the list of symbols linked to libc. The interest of putting architecture specific code back in core is linked to the discussion to be had in rust-lang#113654
…bilee Delete architecture-specific memchr code in std::sys Currently all architecture-specific memchr code is only used in `std::io`. Most of the actual `memchr` capacity exposed to the user through the slice API is instead implemented in `core::slice::memchr`. Hence this commit deletes `memchr` from `std::sys[_common]` and replace calls to it by calls to `core::slice::memchr` functions. This deletes `(r)memchr` from the list of symbols linked to libc. The interest of putting architecture specific code back in core is linked to the discussion to be had in rust-lang#113654
…bilee Delete architecture-specific memchr code in std::sys Currently all architecture-specific memchr code is only used in `std::io`. Most of the actual `memchr` capacity exposed to the user through the slice API is instead implemented in `core::slice::memchr`. Hence this commit deletes `memchr` from `std::sys[_common]` and replace calls to it by calls to `core::slice::memchr` functions. This deletes `(r)memchr` from the list of symbols linked to libc. The interest of putting architecture specific code back in core is linked to the discussion to be had in rust-lang#113654
…bilee Delete architecture-specific memchr code in std::sys Currently all architecture-specific memchr code is only used in `std::io`. Most of the actual `memchr` capacity exposed to the user through the slice API is instead implemented in `core::slice::memchr`. Hence this commit deletes `memchr` from `std::sys[_common]` and replace calls to it by calls to `core::slice::memchr` functions. This deletes `(r)memchr` from the list of symbols linked to libc. The interest of putting architecture specific code back in core is linked to the discussion to be had in rust-lang#113654
…=<try> A more efficient slice comparison implementation for T: !BytewiseEq (This is a follow up PR on rust-lang#113654) This PR changes the implementation for `[T]` slice comparison when `T: !BytewiseEq`. The previous implementation using zip was not optimized properly by the compiler, which didn't leverage the fact that both length were equal. Performance improvements are for example 20% when testing that `[Some(0_u64); 4096].as_slice() == [Some(0_u64); 4096].as_slice()`.
…=the8472 A more efficient slice comparison implementation for T: !BytewiseEq (This is a follow up PR on rust-lang#113654) This PR changes the implementation for `[T]` slice comparison when `T: !BytewiseEq`. The previous implementation using zip was not optimized properly by the compiler, which didn't leverage the fact that both length were equal. Performance improvements are for example 20% when testing that `[Some(0_u64); 4096].as_slice() == [Some(0_u64); 4096].as_slice()`.
…bilee Delete architecture-specific memchr code in std::sys Currently all architecture-specific memchr code is only used in `std::io`. Most of the actual `memchr` capacity exposed to the user through the slice API is instead implemented in `core::slice::memchr`. Hence this commit deletes `memchr` from `std::sys[_common]` and replace calls to it by calls to `core::slice::memchr` functions. This deletes `(r)memchr` from the list of symbols linked to libc. The interest of putting architecture specific code back in core is linked to the discussion to be had in rust-lang#113654
This PR:
[T]
slice comparison whenT: !BytewiseEq
. The previous implementation usedzip
which is not as optimized as it could be. Performance improvements are for example 20% when testing that[Some(0_u64); 4096].as_slice() == [Some(0_u64); 4096].as_slice()
.memcmp
is equivalent to the new code added in the previous point. On such platforms it is better to never callmemcmp
because the compiler can leverageT
's alignment to generate better code than formemcmp
's*const u8
.I only realized when running
./x.py test
that platform specific code is not allowed incore
. However, I would argue that this may be acceptable because:memcmp
on the ground that it is more optimized than our generic one is already a kind of platform specific assumption. Arguably, rather than acfg(not()
making it opt-out, usingmemcmp
should be opt-in.This is likely more a draft than something ready to merge given the discussions that it must go through, but here is to starting these discussions.