Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non memcmp slice comparison optimization #113654

Closed
wants to merge 2 commits into from

Conversation

krtab
Copy link
Contributor

@krtab krtab commented Jul 13, 2023

This PR:

  1. Changes the implementation for [T] slice comparison when T: !BytewiseEq. The previous implementation used zip which is not as optimized as it could be. Performance improvements are for example 20% when testing that [Some(0_u64); 4096].as_slice() == [Some(0_u64); 4096].as_slice().
  2. On musl (and maybe others), memcmp is equivalent to the new code added in the previous point. On such platforms it is better to never call memcmp because the compiler can leverage T's alignment to generate better code than for memcmp's *const u8.

I only realized when running ./x.py test that platform specific code is not allowed in core. However, I would argue that this may be acceptable because:

  • It does not affect portability so much, both implementations work fine, and this is only a matter of performance
  • Deferring to libc's memcmp on the ground that it is more optimized than our generic one is already a kind of platform specific assumption. Arguably, rather than a cfg(not() making it opt-out, using memcmp should be opt-in.

This is likely more a draft than something ready to merge given the discussions that it must go through, but here is to starting these discussions.

The previous implementation was not optimized properly by the compiler,
which didn't leverage the fact that both length were equal.
@rustbot
Copy link
Collaborator

rustbot commented Jul 13, 2023

r? @joshtriplett

(rustbot has picked a reviewer for you, use r? to override)

@rustbot rustbot added A-testsuite Area: The testsuite used to check the correctness of rustc S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Jul 13, 2023
@krtab
Copy link
Contributor Author

krtab commented Jul 13, 2023

This is loosely related to #113576 so I'm cross linking these two issues for the record.

@krtab krtab changed the title Non memcmp slice comparison otpimization Non memcmp slice comparison optimization Jul 13, 2023
@krtab krtab force-pushed the slice_compare_no_memcmp_musl branch from 0659f11 to f566552 Compare July 13, 2023 12:26
Musl implementation is equivalent to our general case implementation.
Not using it:
 - Allows for better optimization leveraging alignment information
 - Prevents a `call`
 - Likely favors inlining and further optimization
return false;
}
// and `memcmp` is not equivalent to our generic case
#[cfg(not(target_env = "musl"))]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that cfg needs a clear comment (saying that musl's memcmp is not more optimized)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thx for reviewing.
I'm planning on commenting more here and in pal.rs but I'm waiting to see how the debate on cfg in core goes.

@Noratrieb
Copy link
Member

Let's run compiler benchmarks here too @bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 13, 2023
@bors
Copy link
Contributor

bors commented Jul 13, 2023

⌛ Trying commit f566552 with merge 982256677362bea322fd9fda00d76e80e2d72534...

@bors
Copy link
Contributor

bors commented Jul 13, 2023

☀️ Try build successful - checks-actions
Build commit: 982256677362bea322fd9fda00d76e80e2d72534 (982256677362bea322fd9fda00d76e80e2d72534)

@rust-timer

This comment has been minimized.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (982256677362bea322fd9fda00d76e80e2d72534): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.5% [0.5%, 0.6%] 3
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-0.9% [-1.5%, -0.5%] 17
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) -0.7% [-1.5%, 0.6%] 20

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
1.4% [1.1%, 1.8%] 2
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-2.2% [-2.4%, -2.1%] 2
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) -0.4% [-2.4%, 1.8%] 4

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-2.9% [-5.0%, -1.2%] 10
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) -2.9% [-5.0%, -1.2%] 10

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.2% [0.1%, 0.3%] 3
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-0.5% [-2.2%, -0.0%] 28
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) -0.4% [-2.2%, 0.3%] 31

Bootstrap: 658.224s -> 655.457s (-0.42%)

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Jul 13, 2023
@Noratrieb
Copy link
Member

the bitmaps benchmark has been really noisy recently so at least half these improvements may not actually be improvements

@krtab
Copy link
Contributor Author

krtab commented Jul 20, 2023

I am willing to provide more benchmarks, especially on musl if needed, but I'd like to be sure that this won't be turned down on the ground of "no arch specific code in core" and before any performance consideration before doing so. Let me know what you feel is lacking for this to move forward.

@dtolnay dtolnay removed the T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) label Sep 27, 2023
@krtab
Copy link
Contributor Author

krtab commented Oct 17, 2023

#114382 makes the musl part of this PR irrelevant. To clarify the process, the T: !BytewiseEq is moved to its own PR: #116846.

@krtab krtab closed this Oct 17, 2023
bors added a commit to rust-lang-ci/rust that referenced this pull request Dec 5, 2023
Delete architecture-specific memchr code in std::sys

Currently all architecture-specific memchr code is only used in `std::io`. Most of the actual `memchr` capacity exposed to the user through the slice API is instead implemented in `core::slice::memchr`.

Hence this commit deletes `memchr` from `std::sys[_common]` and replace calls to it by calls to `core::slice::memchr` functions. This deletes `(r)memchr` from the list of symbols linked to libc.

The interest of putting architecture specific code back in core is linked to the discussion to be had in rust-lang#113654
bors added a commit to rust-lang-ci/rust that referenced this pull request Dec 7, 2023
…bilee

Delete architecture-specific memchr code in std::sys

Currently all architecture-specific memchr code is only used in `std::io`. Most of the actual `memchr` capacity exposed to the user through the slice API is instead implemented in `core::slice::memchr`.

Hence this commit deletes `memchr` from `std::sys[_common]` and replace calls to it by calls to `core::slice::memchr` functions. This deletes `(r)memchr` from the list of symbols linked to libc.

The interest of putting architecture specific code back in core is linked to the discussion to be had in rust-lang#113654
bors added a commit to rust-lang-ci/rust that referenced this pull request Dec 7, 2023
…bilee

Delete architecture-specific memchr code in std::sys

Currently all architecture-specific memchr code is only used in `std::io`. Most of the actual `memchr` capacity exposed to the user through the slice API is instead implemented in `core::slice::memchr`.

Hence this commit deletes `memchr` from `std::sys[_common]` and replace calls to it by calls to `core::slice::memchr` functions. This deletes `(r)memchr` from the list of symbols linked to libc.

The interest of putting architecture specific code back in core is linked to the discussion to be had in rust-lang#113654
bors added a commit to rust-lang-ci/rust that referenced this pull request Dec 8, 2023
…bilee

Delete architecture-specific memchr code in std::sys

Currently all architecture-specific memchr code is only used in `std::io`. Most of the actual `memchr` capacity exposed to the user through the slice API is instead implemented in `core::slice::memchr`.

Hence this commit deletes `memchr` from `std::sys[_common]` and replace calls to it by calls to `core::slice::memchr` functions. This deletes `(r)memchr` from the list of symbols linked to libc.

The interest of putting architecture specific code back in core is linked to the discussion to be had in rust-lang#113654
bors added a commit to rust-lang-ci/rust that referenced this pull request Dec 15, 2023
…bilee

Delete architecture-specific memchr code in std::sys

Currently all architecture-specific memchr code is only used in `std::io`. Most of the actual `memchr` capacity exposed to the user through the slice API is instead implemented in `core::slice::memchr`.

Hence this commit deletes `memchr` from `std::sys[_common]` and replace calls to it by calls to `core::slice::memchr` functions. This deletes `(r)memchr` from the list of symbols linked to libc.

The interest of putting architecture specific code back in core is linked to the discussion to be had in rust-lang#113654
bors added a commit to rust-lang-ci/rust that referenced this pull request Dec 16, 2023
…bilee

Delete architecture-specific memchr code in std::sys

Currently all architecture-specific memchr code is only used in `std::io`. Most of the actual `memchr` capacity exposed to the user through the slice API is instead implemented in `core::slice::memchr`.

Hence this commit deletes `memchr` from `std::sys[_common]` and replace calls to it by calls to `core::slice::memchr` functions. This deletes `(r)memchr` from the list of symbols linked to libc.

The interest of putting architecture specific code back in core is linked to the discussion to be had in rust-lang#113654
bors added a commit to rust-lang-ci/rust that referenced this pull request Jan 5, 2024
…=<try>

A more efficient slice comparison implementation for T: !BytewiseEq

(This is a follow up PR on rust-lang#113654)

This PR changes the implementation for `[T]` slice comparison when `T: !BytewiseEq`. The previous implementation using zip was not optimized properly by the compiler, which didn't leverage the fact that both length were equal. Performance improvements are for example 20% when testing that `[Some(0_u64); 4096].as_slice() == [Some(0_u64); 4096].as_slice()`.
bors added a commit to rust-lang-ci/rust that referenced this pull request Jan 9, 2024
…=the8472

A more efficient slice comparison implementation for T: !BytewiseEq

(This is a follow up PR on rust-lang#113654)

This PR changes the implementation for `[T]` slice comparison when `T: !BytewiseEq`. The previous implementation using zip was not optimized properly by the compiler, which didn't leverage the fact that both length were equal. Performance improvements are for example 20% when testing that `[Some(0_u64); 4096].as_slice() == [Some(0_u64); 4096].as_slice()`.
bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 1, 2024
…bilee

Delete architecture-specific memchr code in std::sys

Currently all architecture-specific memchr code is only used in `std::io`. Most of the actual `memchr` capacity exposed to the user through the slice API is instead implemented in `core::slice::memchr`.

Hence this commit deletes `memchr` from `std::sys[_common]` and replace calls to it by calls to `core::slice::memchr` functions. This deletes `(r)memchr` from the list of symbols linked to libc.

The interest of putting architecture specific code back in core is linked to the discussion to be had in rust-lang#113654
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-testsuite Area: The testsuite used to check the correctness of rustc perf-regression Performance regression. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants