Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return values larger than 2 registers using a return area pointer #131211

Merged
merged 3 commits into from
Oct 19, 2024

Conversation

bjorn3
Copy link
Member

@bjorn3 bjorn3 commented Oct 3, 2024

LLVM and Cranelift disagree about how to return values that don't fit in the registers designated for return values. LLVM will force the entire return value to be passed by return area pointer, while Cranelift will look at each IR level return value independently and decide to pass it in a register or not, which would result in the return value being passed partially in registers and partially through a return area pointer.

While Cranelift may need to be fixed as the LLVM behavior is generally more correct with respect to the surface language, forcing this behavior in rustc itself makes it easier for other backends to conform to the Rust ABI and for the C ABI rustc already handles this behavior anyway.

In addition LLVM's decision to pass the return value in registers or using a return area pointer depends on how exactly the return type is lowered to an LLVM IR type. For example Option<u128> can be lowered as { i128, i128 } in which case the x86_64 backend would use a return area pointer, or it could be passed as { i32, i128 } in which case the x86_64 backend would pass it in registers by taking advantage of an LLVM ABI extension that allows using 3 registers for the x86_64 sysv call conv rather than the officially specified 2 registers.

This adjustment is only necessary for the Rust ABI as for other ABI's the calling convention implementations in rustc_target already ensure any return value which doesn't fit in the available amount of return registers is passed in the right way for the current target.

Helps with rust-lang/rustc_codegen_cranelift#1525
cc bytecodealliance/wasmtime#9250

@bjorn3 bjorn3 added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. A-cranelift Things relevant to the [future] cranelift backend A-ABI Area: Concerning the application binary interface (ABI) labels Oct 3, 2024
@rustbot

This comment was marked as outdated.

@rustbot rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Oct 3, 2024
@jieyouxu
Copy link
Member

jieyouxu commented Oct 3, 2024

I have no clue about this... r? compiler

@rustbot rustbot assigned petrochenkov and unassigned jieyouxu Oct 3, 2024
@rust-log-analyzer

This comment has been minimized.

Copy link
Member

@Noratrieb Noratrieb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the logic seems sound, though I will leave it open for a bit in case someone else has input because I don't feel 100% confident on this (and you still have to do some FileChecking, good luck on that)

This shouldn't cause any problems as it checks the size, and if two types have different sizes they're obviously not ABI-compatible anyways.
This seems like it could be a performance pessimization in some cases, but it seems unlikely to actually affect anything. I think rustc uses u128 a bunch, and probably sometimes in Option<T>, so if you think that it could have effects you could do a perf run I guess.

@bjorn3
Copy link
Member Author

bjorn3 commented Oct 3, 2024

A perf run can't hurt.

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Oct 3, 2024
bors added a commit to rust-lang-ci/rust that referenced this pull request Oct 3, 2024
Return values larger than 2 registers using a return area pointer

LLVM and Cranelift disagree about how to return values that don't fit in the registers designated for return values. LLVM will force the entire return value to be passed by return area pointer, while Cranelift will look at each IR level return value independently and decide to pass it in a register or not, which would result in the return value being passed partially in registers and partially through a return area pointer.

While Cranelift may need to be fixed as the LLVM behavior is generally more correct with respect to the surface language, forcing this behavior in rustc itself makes it easier for other backends to conform to the Rust ABI and for the C ABI rustc already handles this behavior anyway.

In addition LLVM's decision to pass the return value in registers or using a return area pointer depends on how exactly the return type is lowered to an LLVM IR type. For example `Option<u128>` can be lowered as `{ i128, i128 }` in which case the x86_64 backend would use a return area pointer, or it could be passed as `{ i32, i128 }` in which case the x86_64 backend would pass it in registers by taking advantage of an LLVM ABI extension that allows using 3 registers for the x86_64 sysv call conv rather than the officially specified 2 registers.

This adjustment is only necessary for the Rust ABI as for other ABI's the calling convention implementations in rustc_target already ensure any return value which doesn't fit in the available amount of return registers is passed in the right way for the current target.

Helps with rust-lang/rustc_codegen_cranelift#1525
cc bytecodealliance/wasmtime#9250
@bors
Copy link
Contributor

bors commented Oct 3, 2024

⌛ Trying commit 34210ca with merge 3fb9bf9...

@workingjubilee
Copy link
Member

fwiw LLVM often looks at an i128, sees that the intermediate computation will not exceed 64 bits for $REASONS, and then folds it into a 64-bit computation, in an actual optimized code section. this has been true since a long time ago. not that we shouldn't check the perf, obviously.

@bors
Copy link
Contributor

bors commented Oct 3, 2024

☀️ Try build successful - checks-actions
Build commit: 3fb9bf9 (3fb9bf9f57bf3271f7203a8baf759edd81c1a8c8)

@rust-timer

This comment has been minimized.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (3fb9bf9): comparison URL.

Overall result: no relevant changes - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results (primary 3.3%, secondary 1.9%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
3.3% [3.3%, 3.3%] 1
Regressions ❌
(secondary)
2.9% [2.1%, 4.6%] 4
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-2.0% [-2.0%, -2.0%] 1
All ❌✅ (primary) 3.3% [3.3%, 3.3%] 1

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 773.547s -> 773.604s (0.01%)
Artifact size: 342.00 MiB -> 341.98 MiB (-0.00%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Oct 3, 2024
@petrochenkov
Copy link
Contributor

r? compiler
cc @nikic maybe

@rustbot rustbot assigned TaKO8Ki and unassigned petrochenkov Oct 4, 2024
Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks generally reasonable to me. It's always better to specify the correct ABI on the Rust side rather than make LLVM lower it. Apart from the benefit to Cranelift, this may also enable more optimization in LLVM, e.g. by allowing copies to be optimized away (because they are now explicit, not implicit).

My only question here is whether it would make sense to try and preserve the behavior you mention where we pass the result in up to three registers rather than two? Or does Cranelift not support that variant?

@bjorn3
Copy link
Member Author

bjorn3 commented Oct 10, 2024

My only question here is whether it would make sense to try and preserve the behavior you mention where we pass the result in up to three registers rather than two? Or does Cranelift not support that variant?

LLVM only supports this on some architectures afaik and conditionally doing this depending on the architecture is harder than never doing it at all. In any case Cranelift only supports it on x86_64. Also due to the interaction of PassMode::Pair with padding, it depends on the exact way rust types are lowered to llvm types if some types would be considered 3 or 4 registers big. The LLVM and Cranelift backends currently disagree about this for some types including the aformentioned Option<u128>. The LLVM backend lowers it as { i32, i128 } without padding which fits in 3 registers, while the Cranelift backend takes the total type size including padding (32 bytes) which requires 4 registers.

@bjorn3 bjorn3 force-pushed the rust_abi_follow_c_rules branch from 34210ca to b36fbc7 Compare October 10, 2024 14:03
@rust-log-analyzer

This comment has been minimized.

LLVM and Cranelift disagree about how to return values that don't fit
in the registers designated for return values. LLVM will force the
entire return value to be passed by return area pointer, while
Cranelift will look at each IR level return value independently and
decide to pass it in a register or not, which would result in the
return value being passed partially in registers and partially through
a return area pointer.

While Cranelift may need to be fixed as the LLVM behavior is generally
more correct with respect to the surface language, forcing this
behavior in rustc itself makes it easier for other backends to conform
to the Rust ABI and for the C ABI rustc already handles this behavior
anyway.

In addition LLVM's decision to pass the return value in registers or
using a return area pointer depends on how exactly the return type is
lowered to an LLVM IR type. For example `Option<u128>` can be lowered
as `{ i128, i128 }` in which case the x86_64 backend would use a return
area pointer, or it could be passed as `{ i32, i128 }` in which case
the x86_64 backend would pass it in registers by taking advantage of an
LLVM ABI extension that allows using 3 registers for the x86_64 sysv
call conv rather than the officially specified 2 registers.

This adjustment is only necessary for the Rust ABI as for other ABI's
the calling convention implementations in rustc_target already ensure
any return value which doesn't fit in the available amount of return
registers is passed in the right way for the current target.
@bjorn3 bjorn3 force-pushed the rust_abi_follow_c_rules branch from b36fbc7 to ccd1bc2 Compare October 10, 2024 14:24
@bors bors added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Oct 14, 2024
@bjorn3
Copy link
Member Author

bjorn3 commented Oct 17, 2024

I've now tested this on i686-unknown-linux-gnu, x86_64-unknown-linux-gnu and wasm32-wasip1.

Copy link
Member

@jieyouxu jieyouxu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICT the test changes look reasonable. You can r= nikic and me after PR CI is green.

@bjorn3
Copy link
Member Author

bjorn3 commented Oct 17, 2024

@bors r=nikic,jieyouxu

@bors
Copy link
Contributor

bors commented Oct 17, 2024

📌 Commit cab206c has been approved by nikic,jieyouxu

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 17, 2024
bors added a commit to rust-lang-ci/rust that referenced this pull request Oct 17, 2024
…kic,jieyouxu

Return values larger than 2 registers using a return area pointer

LLVM and Cranelift disagree about how to return values that don't fit in the registers designated for return values. LLVM will force the entire return value to be passed by return area pointer, while Cranelift will look at each IR level return value independently and decide to pass it in a register or not, which would result in the return value being passed partially in registers and partially through a return area pointer.

While Cranelift may need to be fixed as the LLVM behavior is generally more correct with respect to the surface language, forcing this behavior in rustc itself makes it easier for other backends to conform to the Rust ABI and for the C ABI rustc already handles this behavior anyway.

In addition LLVM's decision to pass the return value in registers or using a return area pointer depends on how exactly the return type is lowered to an LLVM IR type. For example `Option<u128>` can be lowered as `{ i128, i128 }` in which case the x86_64 backend would use a return area pointer, or it could be passed as `{ i32, i128 }` in which case the x86_64 backend would pass it in registers by taking advantage of an LLVM ABI extension that allows using 3 registers for the x86_64 sysv call conv rather than the officially specified 2 registers.

This adjustment is only necessary for the Rust ABI as for other ABI's the calling convention implementations in rustc_target already ensure any return value which doesn't fit in the available amount of return registers is passed in the right way for the current target.

Helps with rust-lang/rustc_codegen_cranelift#1525
cc bytecodealliance/wasmtime#9250
@bors
Copy link
Contributor

bors commented Oct 17, 2024

⌛ Testing commit cab206c with merge f075de0...

@rust-log-analyzer

This comment has been minimized.

@bors
Copy link
Contributor

bors commented Oct 17, 2024

💔 Test failed - checks-actions

@bors bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Oct 17, 2024
@rust-cloud-vms rust-cloud-vms bot force-pushed the rust_abi_follow_c_rules branch from cab206c to cc7044b Compare October 19, 2024 13:09
@bjorn3
Copy link
Member Author

bjorn3 commented Oct 19, 2024

An assembly test needed to be updated as LLVM decided to reorder an argument load instruction.

@bors r=nikic,jieyouxu

@bors
Copy link
Contributor

bors commented Oct 19, 2024

📌 Commit cc7044b has been approved by nikic,jieyouxu

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 19, 2024
@bors
Copy link
Contributor

bors commented Oct 19, 2024

⌛ Testing commit cc7044b with merge a2a1206...

@bors
Copy link
Contributor

bors commented Oct 19, 2024

☀️ Test successful - checks-actions
Approved by: nikic,jieyouxu
Pushing a2a1206 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Oct 19, 2024
@bors bors merged commit a2a1206 into rust-lang:master Oct 19, 2024
7 checks passed
@rustbot rustbot added this to the 1.84.0 milestone Oct 19, 2024
@bjorn3 bjorn3 deleted the rust_abi_follow_c_rules branch October 19, 2024 16:45
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (a2a1206): comparison URL.

Overall result: no relevant changes - no action needed

@rustbot label: -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results (secondary 3.4%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
3.4% [3.4%, 3.4%] 1
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) - - 0

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 780.796s -> 782.005s (0.15%)
Artifact size: 333.73 MiB -> 333.81 MiB (0.02%)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-ABI Area: Concerning the application binary interface (ABI) A-cranelift Things relevant to the [future] cranelift backend merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.