Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add #[rustc_no_mir_inline] for standard library UB checks #121114

Merged
merged 1 commit into from
Feb 25, 2024

Conversation

Noratrieb
Copy link
Member

@Noratrieb Noratrieb commented Feb 14, 2024

should help with #121110 and also with #120848

Because the MIR inliner cannot know whether the checks are enabled or not, so inlining is an unnecessary compile time pessimization when debug assertions are disabled. LLVM knows whether they are enabled or not, so it can optimize accordingly without wasting time.

r? @saethlin

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Feb 14, 2024
@rustbot
Copy link
Collaborator

rustbot commented Feb 14, 2024

Some changes occurred to MIR optimizations

cc @rust-lang/wg-mir-opt

@saethlin
Copy link
Member

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Feb 14, 2024
@bors
Copy link
Contributor

bors commented Feb 14, 2024

⌛ Trying commit 749d7e2 with merge 90741d3...

bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 14, 2024
Add `#[rustc_no_mir_inline]` for standard library UB checks

should help with rust-lang#121110 and also with rust-lang#120848

I am not entirely sure whether this is the correct solution and I haven't validated it, I just quickly threw it together before going to sleep.

r? `@saethlin`
@bors
Copy link
Contributor

bors commented Feb 14, 2024

☀️ Try build successful - checks-actions
Build commit: 90741d3 (90741d31a5aa2901d7f92a68a87c1ff5a32b3afa)

@rust-timer

This comment has been minimized.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (90741d3): comparison URL.

Overall result: ❌ regressions - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.6% [0.2%, 1.3%] 16
Regressions ❌
(secondary)
2.6% [0.5%, 5.3%] 8
Improvements ✅
(primary)
-0.3% [-0.3%, -0.3%] 1
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 0.5% [-0.3%, 1.3%] 17

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
1.8% [0.7%, 2.6%] 3
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-6.0% [-18.2%, -1.1%] 4
Improvements ✅
(secondary)
-5.4% [-5.4%, -5.4%] 1
All ❌✅ (primary) -2.7% [-18.2%, 2.6%] 7

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
1.4% [1.3%, 1.6%] 2
Regressions ❌
(secondary)
3.7% [2.4%, 6.0%] 6
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 1.4% [1.3%, 1.6%] 2

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.6% [0.0%, 2.2%] 71
Regressions ❌
(secondary)
2.2% [0.0%, 4.3%] 18
Improvements ✅
(primary)
-0.2% [-0.3%, -0.0%] 6
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 0.5% [-0.3%, 2.2%] 77

Bootstrap: 636.575s -> 636.125s (-0.07%)
Artifact size: 306.16 MiB -> 306.10 MiB (-0.02%)

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Feb 15, 2024
@saethlin
Copy link
Member

saethlin commented Feb 15, 2024

Usually what I do at this point is look at the LLVM IR that's generated for the most-regressed case and see which precondition check dominates.

Generally I suspect slice::from_raw_parts. That function and its _mut version are used all over and the actual check implementation is about twice as big as it needs to be I think because the current implementation handles the case of invalid alignment. So assuming that it's slice::from_raw_parts I have two ideas:

  • Write the slice::from_raw_parts precondition in a way that dodges the extra code; ptr::is_aligned_to incurs checking from Alignment::new, and ptr.addr() % alignment incurs a check for alignment == 0. Just doing & (align - 1) == 0 is probably appropriate here?
  • Replace the use of that function in Vec::deref with &* ptr::slice_from_raw_parts. The precondition check in there should be redundant anyway, unless someone manages to smuggle an invalid Vec via transmute.

We should probably do both eventually on their own merits, but it might be useful to perf them separately.

@saethlin saethlin added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 15, 2024
Noratrieb added a commit to Noratrieb/rust that referenced this pull request Feb 20, 2024
…aethlin

Always inline check in `assert_unsafe_precondition` with cfg(debug_assertions)

The current complexities in `assert_unsafe_precondition` are delicately balancing several concerns, among them compile times for the cases where there are no debug assertions. This comes at a large runtime cost when the assertions are enabled, making the debug assertion compiler a lot slower, which is very annoying.

To avoid this, we always inline the check when building with debug assertions.

Numbers (compiling stage1 library after touching core):
- master: 80s
- just adding `#[inline(always)]` to the `cfg(bootstrap)` `debug_assertions` (equivalent to a bootstrap bump (uhh, i just realized that i was on a slightly outdated master so this bump might have happened already), (rust-lang#121112)): 67s
- this: 54s

So this seems like a good solution. I think we can still get the same run-time perf improvements for other users too by massaging this code further (see my other PR about adding `#[rustc_no_mir_inline]` rust-lang#121114) but this is a simpler step that solves the imminent problem of "holy shit my rustc is sooo slow".

Funny consequence: This now means compiling the standard library with dbeug assertions makes it faster (than without, when using debug assertions downstream)!

r? `@saethlin` (or anyone else if someone wants to review this)

fixes rust-lang#121110, supposedly
Noratrieb added a commit to Noratrieb/rust that referenced this pull request Feb 20, 2024
…aethlin

Always inline check in `assert_unsafe_precondition` with cfg(debug_assertions)

The current complexities in `assert_unsafe_precondition` are delicately balancing several concerns, among them compile times for the cases where there are no debug assertions. This comes at a large runtime cost when the assertions are enabled, making the debug assertion compiler a lot slower, which is very annoying.

To avoid this, we always inline the check when building with debug assertions.

Numbers (compiling stage1 library after touching core):
- master: 80s
- just adding `#[inline(always)]` to the `cfg(bootstrap)` `debug_assertions` (equivalent to a bootstrap bump (uhh, i just realized that i was on a slightly outdated master so this bump might have happened already), (rust-lang#121112)): 67s
- this: 54s

So this seems like a good solution. I think we can still get the same run-time perf improvements for other users too by massaging this code further (see my other PR about adding `#[rustc_no_mir_inline]` rust-lang#121114) but this is a simpler step that solves the imminent problem of "holy shit my rustc is sooo slow".

Funny consequence: This now means compiling the standard library with dbeug assertions makes it faster (than without, when using debug assertions downstream)!

r? ``@saethlin`` (or anyone else if someone wants to review this)

fixes rust-lang#121110, supposedly
Noratrieb added a commit to Noratrieb/rust that referenced this pull request Feb 20, 2024
…aethlin

Always inline check in `assert_unsafe_precondition` with cfg(debug_assertions)

The current complexities in `assert_unsafe_precondition` are delicately balancing several concerns, among them compile times for the cases where there are no debug assertions. This comes at a large runtime cost when the assertions are enabled, making the debug assertion compiler a lot slower, which is very annoying.

To avoid this, we always inline the check when building with debug assertions.

Numbers (compiling stage1 library after touching core):
- master: 80s
- just adding `#[inline(always)]` to the `cfg(bootstrap)` `debug_assertions` (equivalent to a bootstrap bump (uhh, i just realized that i was on a slightly outdated master so this bump might have happened already), (rust-lang#121112)): 67s
- this: 54s

So this seems like a good solution. I think we can still get the same run-time perf improvements for other users too by massaging this code further (see my other PR about adding `#[rustc_no_mir_inline]` rust-lang#121114) but this is a simpler step that solves the imminent problem of "holy shit my rustc is sooo slow".

Funny consequence: This now means compiling the standard library with dbeug assertions makes it faster (than without, when using debug assertions downstream)!

r? ```@saethlin``` (or anyone else if someone wants to review this)

fixes rust-lang#121110, supposedly
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request Feb 20, 2024
Rollup merge of rust-lang#121196 - Nilstrieb:the-clever-solution, r=saethlin

Always inline check in `assert_unsafe_precondition` with cfg(debug_assertions)

The current complexities in `assert_unsafe_precondition` are delicately balancing several concerns, among them compile times for the cases where there are no debug assertions. This comes at a large runtime cost when the assertions are enabled, making the debug assertion compiler a lot slower, which is very annoying.

To avoid this, we always inline the check when building with debug assertions.

Numbers (compiling stage1 library after touching core):
- master: 80s
- just adding `#[inline(always)]` to the `cfg(bootstrap)` `debug_assertions` (equivalent to a bootstrap bump (uhh, i just realized that i was on a slightly outdated master so this bump might have happened already), (rust-lang#121112)): 67s
- this: 54s

So this seems like a good solution. I think we can still get the same run-time perf improvements for other users too by massaging this code further (see my other PR about adding `#[rustc_no_mir_inline]` rust-lang#121114) but this is a simpler step that solves the imminent problem of "holy shit my rustc is sooo slow".

Funny consequence: This now means compiling the standard library with dbeug assertions makes it faster (than without, when using debug assertions downstream)!

r? ```@saethlin``` (or anyone else if someone wants to review this)

fixes rust-lang#121110, supposedly
@scottmcm
Copy link
Member

I'd love to have #[rustc_no_mir_inline], regardless of whether using it in these specific cases here ends up working.

For example, in

// Introducing a function boundary here means that the two halves
// get `noalias` markers, allowing better optimization as LLVM
// knows that they're disjoint, unlike in the original slice.
revswap(front_half, back_half, half_len);
#[inline]
fn revswap<T>(a: &mut [T], b: &mut [T], n: usize) {
it's very useful to preserve the function boundary for LLVM to see noalias, thus mir-inlining it is counter-productive, but it's also important that it be #[inline] especially when it's called on arrays.

@Noratrieb
Copy link
Member Author

I need to write down why this is a good idea and necessary but to remind myself: the reason is that mir inlining doesn't see through debug_assertions.

@Noratrieb
Copy link
Member Author

I think we should merge this despite the regressions and count the regressions as part of this unsafe precondition checking and work on that instead of blocking this PR which is a significant improvement for users.
trying out removing the check from vec deref anyways (though thats probably more significant for runtime than compile time)
@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@saethlin
Copy link
Member

In addition to that, we should recoup some of these regressions by improving our lowering of if false and if true: #120650 #121421

Can you add a mir-opt test that demonstrates that this prevents MIR inlining?

@Noratrieb
Copy link
Member Author

I have added a test, and confirmed that the test fails properly when removing the attribute from the callee (never trust a test you didn't see fail etc etc).

@saethlin
Copy link
Member

@bors r+

@bors
Copy link
Contributor

bors commented Feb 24, 2024

📌 Commit 7e10cc5 has been approved by saethlin

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 24, 2024
@rust-log-analyzer

This comment has been minimized.

@Noratrieb
Copy link
Member Author

fuck
@bors r-

@bors bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Feb 24, 2024
Co-authored-by: Ben Kimock <kimockb@gmail.com>
@Noratrieb
Copy link
Member Author

@bors r=saethlin

@bors
Copy link
Contributor

bors commented Feb 24, 2024

📌 Commit 81d7069 has been approved by saethlin

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Feb 24, 2024
@bors
Copy link
Contributor

bors commented Feb 25, 2024

⌛ Testing commit 81d7069 with merge e9f9594...

@bors
Copy link
Contributor

bors commented Feb 25, 2024

☀️ Test successful - checks-actions
Approved by: saethlin
Pushing e9f9594 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Feb 25, 2024
@bors bors merged commit e9f9594 into rust-lang:master Feb 25, 2024
12 checks passed
@rustbot rustbot added this to the 1.78.0 milestone Feb 25, 2024
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (e9f9594): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged label to this PR.

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.5% [0.2%, 1.2%] 33
Regressions ❌
(secondary)
1.5% [0.3%, 5.5%] 12
Improvements ✅
(primary)
-0.3% [-0.3%, -0.3%] 3
Improvements ✅
(secondary)
-0.3% [-0.3%, -0.2%] 8
All ❌✅ (primary) 0.4% [-0.3%, 1.2%] 36

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
2.7% [0.9%, 4.5%] 5
Regressions ❌
(secondary)
1.5% [1.5%, 1.5%] 1
Improvements ✅
(primary)
-9.5% [-13.2%, -5.8%] 2
Improvements ✅
(secondary)
-4.5% [-6.3%, -1.1%] 4
All ❌✅ (primary) -0.8% [-13.2%, 4.5%] 7

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
1.0% [1.0%, 1.0%] 1
Regressions ❌
(secondary)
3.6% [2.5%, 6.2%] 4
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-1.1% [-1.1%, -1.1%] 1
All ❌✅ (primary) 1.0% [1.0%, 1.0%] 1

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.7% [0.1%, 2.3%] 61
Regressions ❌
(secondary)
2.1% [0.1%, 4.4%] 18
Improvements ✅
(primary)
-0.1% [-0.2%, -0.0%] 15
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 0.6% [-0.2%, 2.3%] 76

Bootstrap: 651.95s -> 651.212s (-0.11%)
Artifact size: 311.05 MiB -> 311.01 MiB (-0.01%)

@Noratrieb
Copy link
Member Author

That's more regressions than the previous check showed. Still, my previous analysis should apply. #121421 should get some of it back as well

@RalfJung RalfJung mentioned this pull request Feb 25, 2024
bors added a commit to rust-lang/miri that referenced this pull request Feb 25, 2024
Rustup

Let's see if rust-lang/rust#121114 gets perf back to the old level.
@apiraino
Copy link
Contributor

apiraino commented Feb 29, 2024

fixes #121245

@Noratrieb Noratrieb deleted the no-inline! branch February 29, 2024 16:18
RalfJung pushed a commit to RalfJung/rust that referenced this pull request Mar 3, 2024
Rustup

Let's see if rust-lang#121114 gets perf back to the old level.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. perf-regression Performance regression. perf-regression-triaged The performance regression has been triaged. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants