-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace the default branch with an unreachable branch If it is the last variant #120268
Replace the default branch with an unreachable branch If it is the last variant #120268
Conversation
Some changes occurred to MIR optimizations cc @rust-lang/wg-mir-opt |
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
…tchs, r=<try> Replace the default branch with an unreachable branch If it is the last variant Fixes rust-lang#119520. LLVM currently has limited ability to eliminate dead branches in switches, even with the patch of llvm/llvm-project#73446. The main reasons are as follows: - Additional costs are required to calculate the range of values, and there exist many scenarios that cannot be analyzed accurately. - Matching values by bitwise calculation cannot handle odd branches, nor can it handle values like `-1, 0, 1`. See [SimplifyCFG.cpp#L5424](https://github.com/llvm/llvm-project/blob/llvmorg-17.0.6/llvm/lib/Transforms/Utils/SimplifyCFG.cpp#L5424) and https://llvm.godbolt.org/z/qYMqhvMa8 - The current range information is continuous, even if the metadata for the range is submitted. See [ConstantRange.cpp#L1869-L1870](https://github.com/llvm/llvm-project/blob/llvmorg-17.0.6/llvm/lib/IR/ConstantRange.cpp#L1869-L1870). - The metadata of the range may be lost in passes such as SROA. See https://rust.godbolt.org/z/e7f87vKMK. Although we can make improvements, I think it would be more appropriate to put this issue to rustc first. After all, we can easily know the possible values. Note that we've currently found a slow compilation problem in the presence of unreachable branches. See llvm/llvm-project#78578. r? compiler
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (f893d88): comparison URL. Overall result: ✅ improvements - no action neededBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. @bors rollup=never Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 661.891s -> 662.949s (0.16%) |
Test4::C => "C", | ||
_ => "D", | ||
}; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this still work if Test4
holds a generic type instead of an i32
? Should it be made so?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand correctly, this will be split into multiple switchInt
. So I expect generic type to get the same result. But the test case is not happening, so I should be missing something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added support for generic type.
5617d16
to
5a398d3
Compare
This comment has been minimized.
This comment has been minimized.
Based on the discussion in zulipchat, I have changed |
This comment has been minimized.
This comment has been minimized.
af2420c
to
d02299c
Compare
@bors r+ rollup=never |
Hmm, I just remembered a possible regression issue, but I don't think that should affect merging this PR. Because we're on the road to a better outcome. #![crate_type = "lib"]
pub enum Bar {
Foo = 1,
Bar = 2,
Baz = 3
}
#[no_mangle]
pub fn lookup(v: Bar) -> i32 {
match v {
Bar::Foo => 8,
Bar::Bar => 9,
Bar::Baz => 3,
}
}
#[no_mangle]
pub fn compare(v: Bar) -> i32 {
match v {
Bar::Foo => 8,
Bar::Bar => 9,
_ => 3,
}
}
#[no_mangle]
pub fn lookup2(v: Bar) -> i32 {
match v {
Bar::Foo => 1,
Bar::Bar => 2,
Bar::Baz => 3,
}
}
#[no_mangle]
pub fn compare2(v: Bar) -> i32 {
match v {
Bar::Foo => 1,
Bar::Bar => 2,
_ => 3,
}
} GodBolt: https://rust.godbolt.org/z/55dq6zbjd The |
…tchs, r=oli-obk Replace the default branch with an unreachable branch If it is the last variant Fixes rust-lang#119520. LLVM currently has limited ability to eliminate dead branches in switches, even with the patch of llvm/llvm-project#73446. The main reasons are as follows: - Additional costs are required to calculate the range of values, and there exist many scenarios that cannot be analyzed accurately. - Matching values by bitwise calculation cannot handle odd branches, nor can it handle values like `-1, 0, 1`. See [SimplifyCFG.cpp#L5424](https://github.com/llvm/llvm-project/blob/llvmorg-17.0.6/llvm/lib/Transforms/Utils/SimplifyCFG.cpp#L5424) and https://llvm.godbolt.org/z/qYMqhvMa8 - The current range information is continuous, even if the metadata for the range is submitted. See [ConstantRange.cpp#L1869-L1870](https://github.com/llvm/llvm-project/blob/llvmorg-17.0.6/llvm/lib/IR/ConstantRange.cpp#L1869-L1870). - The metadata of the range may be lost in passes such as SROA. See https://rust.godbolt.org/z/e7f87vKMK. Although we can make improvements, I think it would be more appropriate to put this issue to rustc first. After all, we can easily know the possible values. Note that we've currently found a slow compilation problem in the presence of unreachable branches. See llvm/llvm-project#78578. r? compiler
This comment has been minimized.
This comment has been minimized.
💔 Test failed - checks-actions |
@bors r+ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no uninhabited enum anywhere in this test... how does the test filename make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test case came directly from the issue it fixed. It will call partial_cmp
, so it's essentially the same as #119520 (comment) . I think it makes sense to add the test code in the issue, maybe I should create two test cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But again there's no uninhabited enums anywhere that I can see, so (a) what does the test content have to do with the filename, and (b) what does it have to do with this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading the code a bit more, I think the MIR pass (and associated test) are misnamed. This is no longer just about uninhabited variants, it is now also about exploiting that Discriminant
will never return something that isn't a variant index. I am a bit surprised that this is done as a MIR transform rather than during MIR building but the MIR transform is correct according to our current understanding of MIR semantics. Just the name is misleading after this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(a) I can change the file name to issue-119520.rs
.
(b) https://rust.godbolt.org/z/za5c5hzoY When I reduce the issue's case, I found out that it uses Ordering
after inlining. It implies the enum. I also hope that this test case will not lose optimization due to other changes in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I'm considering updating the name. (It’s just that I didn’t think of a suitable name.)
Maybe I can change it to UnreachableEnumBranching
.
I am a bit surprised that this is done as a MIR transform rather than during MIR building but the MIR transform is correct according to our current understanding of MIR semantics. Just the name is misleading after this PR.
It better for me to have MIR building match the structure of the code itself where possible. (This purpose may not matter either?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It better for me to have MIR building match the structure of the code itself where possible. (This purpose may not matter either?)
Ah, I may have misunderstood where this optimization kicks in. I thought even this would just use fallback for the last variant:
match c {
Less => -5,
Equal => 0,
Greater => 42,
}
But already on stable that becomes switchInt(move _2) -> [255: bb3, 0: bb4, 1: bb1, otherwise: bb2]
.
I can change the file name to issue-119520.rs.
Once we have a better name for the pass, it can use that name. (Though it would also be good to mention the issue either in the file name or file contents. It's always good to add more cross-references and those are otherwise much harder to reconstruct in the future.)
Maybe I can change it to UnreachableEnumBranching.
I like it. :) The module-level comment in that file should then explain the two ways that "unreachable" is determined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I may have misunderstood where this optimization kicks in. I thought even this would just use fallback for the last variant:
match c { Less => -5, Equal => 0, Greater => 42, }But already on stable that becomes
switchInt(move _2) -> [255: bb3, 0: bb4, 1: bb1, otherwise: bb2]
.
This is something UninhabitedEnumBranching
has already done.
This PR transforms following codes
match c {
Less => -5,
Equal => 0,
_ => 42,
}
to
match c {
Less => -5,
Equal => 0,
Greater => 42,
}
.
&& allowed_variants.len() == 1 | ||
&& check_successors(&body.basic_blocks, targets.otherwise()); | ||
let replace_otherwise_to_unreachable = otherwise_is_last_variant | ||
|| !otherwise_is_empty_unreachable && allowed_variants.is_empty(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could use a few more comments explaining what happens here -- why all these checks are needed and why they are combined in exactly the way they are. Imagine someone reading this code in a year without knowing about this PR -- what would they have to know to make sense of all this? For instance, what is check_successors
even checking?
Also, a || b && c
could use parentheses, the precedence is currently unclear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, testing has started, r-
or I'll add a PR subsequently?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Subsequent PR is fine.
☀️ Test successful - checks-actions |
Finished benchmarking commit (14fbc3c): comparison URL. Overall result: ❌✅ regressions and improvements - ACTION NEEDEDNext Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 646.506s -> 648.483s (0.31%) |
Rename `UninhabitedEnumBranching` to `UnreachableEnumBranching` Per [rust-lang#120268](rust-lang#120268 (comment)), I rename `UninhabitedEnumBranching` to `UnreachableEnumBranching` . I solved some nits to add some comments. I adjusted the workaround restrictions. This should be useful for `a <= b` and `if let Some/Ok(v)`. For enum with few variants, `early-tailduplication` should not cause compile time overhead. r? RalfJung
Visiting for weekly performance triage. @DianQK the 1.8% regression to cargo opt-full is concerning to me. But from looking at the early rust-timer invocations, I saw it come up only once, not every time. So its not clear to me how much it was anticipated in your developments here. Do you have any idea where the cargo opt-full regression is arising? Is it somehow connected to llvm/llvm-project#78578 ? (not marking as triaged, not yet.) |
[perf test] rust-lang#120268 r? ghost
I don't think this is a regression. This is the result of previous try: Or maybe I'm missing something. |
I don't think it's relevant. I should see the output size increase because |
Rename `UninhabitedEnumBranching` to `UnreachableEnumBranching` Per [rust-lang#120268](rust-lang#120268 (comment)), I rename `UninhabitedEnumBranching` to `UnreachableEnumBranching` . I solved some nits to add some comments. I adjusted the workaround restrictions. This should be useful for `a <= b` and `if let Some/Ok(v)`. For enum with few variants, `early-tailduplication` should not cause compile time overhead. r? RalfJung
Rename `UninhabitedEnumBranching` to `UnreachableEnumBranching` Per [#120268](rust-lang/rust#120268 (comment)), I rename `UninhabitedEnumBranching` to `UnreachableEnumBranching` . I solved some nits to add some comments. I adjusted the workaround restrictions. This should be useful for `a <= b` and `if let Some/Ok(v)`. For enum with few variants, `early-tailduplication` should not cause compile time overhead. r? RalfJung
Rename `UninhabitedEnumBranching` to `UnreachableEnumBranching` Per [#120268](rust-lang/rust#120268 (comment)), I rename `UninhabitedEnumBranching` to `UnreachableEnumBranching` . I solved some nits to add some comments. I adjusted the workaround restrictions. This should be useful for `a <= b` and `if let Some/Ok(v)`. For enum with few variants, `early-tailduplication` should not cause compile time overhead. r? RalfJung
Rename `UninhabitedEnumBranching` to `UnreachableEnumBranching` Per [#120268](rust-lang/rust#120268 (comment)), I rename `UninhabitedEnumBranching` to `UnreachableEnumBranching` . I solved some nits to add some comments. I adjusted the workaround restrictions. This should be useful for `a <= b` and `if let Some/Ok(v)`. For enum with few variants, `early-tailduplication` should not cause compile time overhead. r? RalfJung
Fixes #119520. Fixes #110097.
LLVM currently has limited ability to eliminate dead branches in switches, even with the patch of llvm/llvm-project#73446.
The main reasons are as follows:
-1, 0, 1
. See SimplifyCFG.cpp#L5424 and https://llvm.godbolt.org/z/qYMqhvMa8Although we can make improvements, I think it would be more appropriate to put this issue to rustc first. After all, we can easily know the possible values.
Note that we've currently found a slow compilation problem in the presence of unreachable branches. See
llvm/llvm-project#78578.
r? compiler