-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Return values larger than 2 registers using a return area pointer #131211
Conversation
This comment was marked as outdated.
This comment was marked as outdated.
I have no clue about this... r? compiler |
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the logic seems sound, though I will leave it open for a bit in case someone else has input because I don't feel 100% confident on this (and you still have to do some FileChecking, good luck on that)
This shouldn't cause any problems as it checks the size, and if two types have different sizes they're obviously not ABI-compatible anyways.
This seems like it could be a performance pessimization in some cases, but it seems unlikely to actually affect anything. I think rustc uses u128 a bunch, and probably sometimes in Option<T>
, so if you think that it could have effects you could do a perf run I guess.
A perf run can't hurt. @bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
Return values larger than 2 registers using a return area pointer LLVM and Cranelift disagree about how to return values that don't fit in the registers designated for return values. LLVM will force the entire return value to be passed by return area pointer, while Cranelift will look at each IR level return value independently and decide to pass it in a register or not, which would result in the return value being passed partially in registers and partially through a return area pointer. While Cranelift may need to be fixed as the LLVM behavior is generally more correct with respect to the surface language, forcing this behavior in rustc itself makes it easier for other backends to conform to the Rust ABI and for the C ABI rustc already handles this behavior anyway. In addition LLVM's decision to pass the return value in registers or using a return area pointer depends on how exactly the return type is lowered to an LLVM IR type. For example `Option<u128>` can be lowered as `{ i128, i128 }` in which case the x86_64 backend would use a return area pointer, or it could be passed as `{ i32, i128 }` in which case the x86_64 backend would pass it in registers by taking advantage of an LLVM ABI extension that allows using 3 registers for the x86_64 sysv call conv rather than the officially specified 2 registers. This adjustment is only necessary for the Rust ABI as for other ABI's the calling convention implementations in rustc_target already ensure any return value which doesn't fit in the available amount of return registers is passed in the right way for the current target. Helps with rust-lang/rustc_codegen_cranelift#1525 cc bytecodealliance/wasmtime#9250
fwiw LLVM often looks at an i128, sees that the intermediate computation will not exceed 64 bits for |
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (3fb9bf9): comparison URL. Overall result: no relevant changes - no action neededBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. @bors rollup=never Instruction countThis benchmark run did not return any relevant results for this metric. Max RSS (memory usage)Results (primary 3.3%, secondary 1.9%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesThis benchmark run did not return any relevant results for this metric. Binary sizeThis benchmark run did not return any relevant results for this metric. Bootstrap: 773.547s -> 773.604s (0.01%) |
r? compiler |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks generally reasonable to me. It's always better to specify the correct ABI on the Rust side rather than make LLVM lower it. Apart from the benefit to Cranelift, this may also enable more optimization in LLVM, e.g. by allowing copies to be optimized away (because they are now explicit, not implicit).
My only question here is whether it would make sense to try and preserve the behavior you mention where we pass the result in up to three registers rather than two? Or does Cranelift not support that variant?
LLVM only supports this on some architectures afaik and conditionally doing this depending on the architecture is harder than never doing it at all. In any case Cranelift only supports it on x86_64. Also due to the interaction of PassMode::Pair with padding, it depends on the exact way rust types are lowered to llvm types if some types would be considered 3 or 4 registers big. The LLVM and Cranelift backends currently disagree about this for some types including the aformentioned |
34210ca
to
b36fbc7
Compare
This comment has been minimized.
This comment has been minimized.
LLVM and Cranelift disagree about how to return values that don't fit in the registers designated for return values. LLVM will force the entire return value to be passed by return area pointer, while Cranelift will look at each IR level return value independently and decide to pass it in a register or not, which would result in the return value being passed partially in registers and partially through a return area pointer. While Cranelift may need to be fixed as the LLVM behavior is generally more correct with respect to the surface language, forcing this behavior in rustc itself makes it easier for other backends to conform to the Rust ABI and for the C ABI rustc already handles this behavior anyway. In addition LLVM's decision to pass the return value in registers or using a return area pointer depends on how exactly the return type is lowered to an LLVM IR type. For example `Option<u128>` can be lowered as `{ i128, i128 }` in which case the x86_64 backend would use a return area pointer, or it could be passed as `{ i32, i128 }` in which case the x86_64 backend would pass it in registers by taking advantage of an LLVM ABI extension that allows using 3 registers for the x86_64 sysv call conv rather than the officially specified 2 registers. This adjustment is only necessary for the Rust ABI as for other ABI's the calling convention implementations in rustc_target already ensure any return value which doesn't fit in the available amount of return registers is passed in the right way for the current target.
b36fbc7
to
ccd1bc2
Compare
I've now tested this on i686-unknown-linux-gnu, x86_64-unknown-linux-gnu and wasm32-wasip1. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAICT the test changes look reasonable. You can r= nikic and me after PR CI is green.
@bors r=nikic,jieyouxu |
…kic,jieyouxu Return values larger than 2 registers using a return area pointer LLVM and Cranelift disagree about how to return values that don't fit in the registers designated for return values. LLVM will force the entire return value to be passed by return area pointer, while Cranelift will look at each IR level return value independently and decide to pass it in a register or not, which would result in the return value being passed partially in registers and partially through a return area pointer. While Cranelift may need to be fixed as the LLVM behavior is generally more correct with respect to the surface language, forcing this behavior in rustc itself makes it easier for other backends to conform to the Rust ABI and for the C ABI rustc already handles this behavior anyway. In addition LLVM's decision to pass the return value in registers or using a return area pointer depends on how exactly the return type is lowered to an LLVM IR type. For example `Option<u128>` can be lowered as `{ i128, i128 }` in which case the x86_64 backend would use a return area pointer, or it could be passed as `{ i32, i128 }` in which case the x86_64 backend would pass it in registers by taking advantage of an LLVM ABI extension that allows using 3 registers for the x86_64 sysv call conv rather than the officially specified 2 registers. This adjustment is only necessary for the Rust ABI as for other ABI's the calling convention implementations in rustc_target already ensure any return value which doesn't fit in the available amount of return registers is passed in the right way for the current target. Helps with rust-lang/rustc_codegen_cranelift#1525 cc bytecodealliance/wasmtime#9250
This comment has been minimized.
This comment has been minimized.
💔 Test failed - checks-actions |
cab206c
to
cc7044b
Compare
An assembly test needed to be updated as LLVM decided to reorder an argument load instruction. @bors r=nikic,jieyouxu |
☀️ Test successful - checks-actions |
Finished benchmarking commit (a2a1206): comparison URL. Overall result: no relevant changes - no action needed@rustbot label: -perf-regression Instruction countThis benchmark run did not return any relevant results for this metric. Max RSS (memory usage)Results (secondary 3.4%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesThis benchmark run did not return any relevant results for this metric. Binary sizeThis benchmark run did not return any relevant results for this metric. Bootstrap: 780.796s -> 782.005s (0.15%) |
LLVM and Cranelift disagree about how to return values that don't fit in the registers designated for return values. LLVM will force the entire return value to be passed by return area pointer, while Cranelift will look at each IR level return value independently and decide to pass it in a register or not, which would result in the return value being passed partially in registers and partially through a return area pointer.
While Cranelift may need to be fixed as the LLVM behavior is generally more correct with respect to the surface language, forcing this behavior in rustc itself makes it easier for other backends to conform to the Rust ABI and for the C ABI rustc already handles this behavior anyway.
In addition LLVM's decision to pass the return value in registers or using a return area pointer depends on how exactly the return type is lowered to an LLVM IR type. For example
Option<u128>
can be lowered as{ i128, i128 }
in which case the x86_64 backend would use a return area pointer, or it could be passed as{ i32, i128 }
in which case the x86_64 backend would pass it in registers by taking advantage of an LLVM ABI extension that allows using 3 registers for the x86_64 sysv call conv rather than the officially specified 2 registers.This adjustment is only necessary for the Rust ABI as for other ABI's the calling convention implementations in rustc_target already ensure any return value which doesn't fit in the available amount of return registers is passed in the right way for the current target.
Helps with rust-lang/rustc_codegen_cranelift#1525
cc bytecodealliance/wasmtime#9250