-
Notifications
You must be signed in to change notification settings - Fork 12.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rust insufficiently optimizes loop { match { } } state machines #80630
Comments
Atleast that part happens once you add more states. |
Didn't look too deeply into it but the IR fed to LLVM around the loop-match construct is not that different between rustc and clang. However with clang it becomes a switchtable at llvm-ir level, and we rely on the codegen backend to do so with rust https://godbolt.org/z/9Erv94 FWIW from what I can tell the additional comparisons can be blamed on |
I thought we already had an optimization for this
just looking locally at this, the |
AFAICT this is optimized with #80475 |
LLVM can optimize this with the |
That PR is merged now. Did that fix this issue? |
Testing it with (2021-03-20 61edfd5) on the playground shows no improvement. |
The playground does not actually run mir optimizations of level 3 or above, so you would need to run this locally with nightly and use -Zmir-opt-level=3
This is worrysome. Our optimization as written won't do anything fancy, so it will make loops more complex. The perf results on that optimization are mostly in the noise (ignoring the CTFE stress tests), though without a dedicated compile-time and runtime test, it's hard to tell. Considering that the compiler bootstrap did not regress, I don't think the opt messes with anything too badly. |
Or you can use Godbolt: https://rust.godbolt.org/z/doK6W9jPT (I don't what the Assembly is supposed to look like so I can't tell if this is fixed or not.) |
Unfortunately it still doesn't seem fixed. Assembly from godbolt link:
I've annotated it a bit with |
Maybe LLVM "de-optimizes" our MIR again. Not sure, we'll have to look at the MIR before LLVM, at LLVM-IR without opts and at LLVM-IR with opts to be sure. I'm not sure how to coax godbolt into doing MIR dumps though. |
Add |
The same can be seen in MIR too:
After seeing the MIR I thought maybe the issue was the extra "suffix" block after each print operation, so i switcher around assignment and print (instead of doing
|
I just looked at the optimization itself. I do not think the issue is related to the |
A new optimization pass is currently in review upstream: https://reviews.llvm.org/D99205 |
With new DFA jump threading pass: https://rust.godbolt.org/z/v1a4c689E |
Optimization breaks if you mark EDIT: It only seems to work, when LLVM can infer the initial value/state of the state machine, which doesn't happen often in practice when dealing with state machines. Marking |
I have not looked into this test case (just saw this issue today), but the limitation that you have mentioned here is not hard to fix. We will look into it and get back to you |
Responding to a feature request from the Rust community: rust-lang/rust#80630 void foo(X) { for (...) switch (X) case A X = B case B X = C } Even though the initial switch value is non-constant, the switch statement can still be threaded: the initial value will hit the switch statement but the rest of the state changes will proceed by jumping unconditionally. The early predictability check is relaxed to allow unpredictable values anywhere, but later, after the paths through the switch statement have been enumerated, no non-constant state values are allowed along the paths. Any state value not along a path will be an initial switch value, which can be safely ignored. Differential Revision: https://reviews.llvm.org/D124394
Responding to a feature request from the Rust community: rust-lang/rust#80630 void foo(X) { for (...) switch (X) case A X = B case B X = C } Even though the initial switch value is non-constant, the switch statement can still be threaded: the initial value will hit the switch statement but the rest of the state changes will proceed by jumping unconditionally. The early predictability check is relaxed to allow unpredictable values anywhere, but later, after the paths through the switch statement have been enumerated, no non-constant state values are allowed along the paths. Any state value not along a path will be an initial switch value, which can be safely ignored. Differential Revision: https://reviews.llvm.org/D124394
@bryanpkc This looks like exactly what I need. |
This is how a typical state machine might look in Rust:
One would expect that the first jump from function start into one of the states (0, 1, 2 or _) would be compiled as a jumptable or a series of cmp/conditional jump instructions and once the execution is in one of the states the remaining jumps would be deterministic (non-conditional) jumps.
And this is how equivalent code in gcc is optimized.
However Rust/LLVM at the end of each state performs needless comparisons to determine to which label to jump to next, even though the jump target can be computed statically. You can inspect the generated assembly in Rust Playground.
This is probably more of a LLVM issue rather than Rust directly, but it's important to get this fixed/optimized, because now it's impossible to efficiently write such state machines in Rust (because there is no
goto
)Meta
Tested with all versions available in Rust playground:
The text was updated successfully, but these errors were encountered: