
Avoid `alloca`s in codegen for simple `mir::Aggregate` statements #123886

Merged: 3 commits merged into rust-lang:master on May 10, 2024


scottmcm
Member

@scottmcm scottmcm commented Apr 13, 2024

The core idea here is to remove the abstraction penalty of simple newtypes in codegen.

Even something simple like constructing a

```rust
#[repr(transparent)] struct Foo(u32);
```

forces an `alloca` to be generated on nightly right now.

Certainly LLVM can optimize that away, but it would be nice if it didn't have to.

Quick example:

```rust
#[repr(transparent)]
pub struct Transparent32(u32);

#[no_mangle]
pub fn make_transparent(x: u32) -> Transparent32 {
    let a = Transparent32(x);
    a
}
```

On nightly we produce (<https://rust.godbolt.org/z/zcvoM79ae>):

```llvm
define noundef i32 @make_transparent(i32 noundef %x) unnamed_addr #0 {
  %a = alloca i32, align 4
  store i32 %x, ptr %a, align 4
  %0 = load i32, ptr %a, align 4, !noundef !3
  ret i32 %0
}
```

but after this PR we produce

```llvm
define noundef i32 @make_transparent(i32 noundef %x) unnamed_addr #0 {
start:
  ret i32 %x
}
```

(even before the optimizer runs).
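A deliberately simplified toy model of the idea may help. Everything in it is hypothetical: this is not how `rustc_codegen_ssa` is structured, and the category names are only loosely borrowed from rustc's `Scalar`/`ScalarPair` layout terms. The sketch assumes that when an aggregate's layout is a single scalar or a scalar pair, its non-zero-sized field values can simply become the aggregate's value, and only other layouts fall back to a stack slot.

```rust
// Hypothetical toy model only; not rustc's actual data structures or logic.

/// How a (toy) type is represented in the backend.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Repr {
    /// Zero-sized, e.g. `()`; per the code comment quoted in review, these
    /// aggregates could already be operands before this PR.
    Zst,
    /// One SSA value, e.g. `u32` or `#[repr(transparent)] struct Foo(u32)`.
    Scalar,
    /// Two SSA values, e.g. a struct holding a `bool` and a `u16`.
    ScalarPair,
    /// Anything else; must live in memory.
    Memory,
}

/// Outcome of lowering an aggregate construction such as `Foo(x)`.
#[derive(Debug, PartialEq)]
enum Lowered {
    /// The aggregate is just its field values: no `alloca` needed.
    Immediate(Vec<u64>),
    /// Fallback: allocate a stack slot and store each field into it.
    Alloca,
}

/// `fields` has one entry per field; `None` marks a zero-sized field
/// (such as `PhantomData`), which contributes no SSA value.
fn lower_aggregate(repr: Repr, fields: &[Option<u64>]) -> Lowered {
    // Keep only the values of non-zero-sized fields.
    let values: Vec<u64> = fields.iter().flatten().copied().collect();
    match (repr, values.len()) {
        (Repr::Zst, 0) | (Repr::Scalar, 1) | (Repr::ScalarPair, 2) => Lowered::Immediate(values),
        _ => Lowered::Alloca,
    }
}
```

In this model, `Transparent32(x)` corresponds to `lower_aggregate(Repr::Scalar, &[Some(x)])`, which yields `Immediate(vec![x])`, in the spirit of the `ret i32 %x` output above; a struct in `Repr::Memory` still gets an `Alloca`.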

@rustbot
Collaborator

rustbot commented Apr 13, 2024

r? @fee1-dead

rustbot has assigned @fee1-dead.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Apr 13, 2024
@scottmcm
Member Author

@bors try @rust-timer queue


@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Apr 13, 2024
bors added a commit to rust-lang-ci/rust that referenced this pull request Apr 13, 2024
Avoid `alloca`s in codegen for simple pairs and simple transparent structs

@bors
Contributor

bors commented Apr 13, 2024

⌛ Trying commit dd8ffd0 with merge 8dfd58a...


@scottmcm
Member Author

@bors try @rust-timer queue


@bors
Contributor

bors commented Apr 13, 2024

⌛ Trying commit 7cb89e3 with merge 1d0d3d3...

bors added a commit to rust-lang-ci/rust that referenced this pull request Apr 13, 2024
Avoid `alloca`s in codegen for simple pairs and simple transparent structs

@fee1-dead
Member

r? compiler

@rustbot rustbot assigned TaKO8Ki and unassigned fee1-dead Apr 13, 2024
@bors
Contributor

bors commented Apr 13, 2024

☀️ Try build successful - checks-actions
Build commit: 1d0d3d3 (1d0d3d3e219daac686e2fd9110e09772b6d18dfc)


```rust
// According to `rvalue_creates_operand`, only ZST
// aggregate rvalues are allowed to be operands.
// repat rvalues are allowed to be operands.
```
Contributor

repeat

Comment on lines 731 to 729
```rust
let ty = self.monomorphize(ty);
let layout = self.cx.layout_of(self.monomorphize(ty));
```
Contributor

Is second call to monomorphize required?


@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Apr 13, 2024
@lqd
Member

lqd commented Apr 13, 2024

(I've seen regex and wg-grammar being noisy in other PRs)


@scottmcm
Member Author

@bors try @rust-timer queue

@scottmcm
Member Author

scottmcm commented May 9, 2024

@matthewjasper just wanted to make sure you have a notification for this review since the r? might not have sent you one.

@matthewjasper
Contributor

@bors r+

@bors
Contributor

bors commented May 10, 2024

📌 Commit c38f75c has been approved by matthewjasper

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 10, 2024
@bors
Contributor

bors commented May 10, 2024

⌛ Testing commit c38f75c with merge 6e1d947...

@bors
Contributor

bors commented May 10, 2024

☀️ Test successful - checks-actions
Approved by: matthewjasper
Pushing 6e1d947 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label May 10, 2024
@bors bors merged commit 6e1d947 into rust-lang:master May 10, 2024
7 checks passed
@rustbot rustbot added this to the 1.80.0 milestone May 10, 2024
@rust-timer
Collaborator

Finished benchmarking commit (6e1d947): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged label to this PR.

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean | range | count |
|:---|---:|---:|---:|
| Regressions ❌ (primary) | 0.2% | [0.2%, 0.3%] | 6 |
| Regressions ❌ (secondary) | 0.5% | [0.2%, 1.3%] | 4 |
| Improvements ✅ (primary) | -0.6% | [-1.0%, -0.3%] | 6 |
| Improvements ✅ (secondary) | -0.7% | [-1.1%, -0.4%] | 2 |
| All ❌✅ (primary) | -0.2% | [-1.0%, 0.3%] | 12 |

Max RSS (memory usage)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|:---|---:|---:|---:|
| Regressions ❌ (primary) | 3.4% | [3.4%, 3.4%] | 1 |
| Regressions ❌ (secondary) | 4.8% | [4.8%, 4.8%] | 1 |
| Improvements ✅ (primary) | -1.8% | [-3.0%, -0.6%] | 4 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -0.8% | [-3.0%, 3.4%] | 5 |

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|:---|---:|---:|---:|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.7% | [-1.6%, -0.0%] | 73 |
| Improvements ✅ (secondary) | -0.5% | [-1.6%, -0.0%] | 11 |
| All ❌✅ (primary) | -0.7% | [-1.6%, -0.0%] | 73 |

Bootstrap: 673.906s -> 673.586s (-0.05%)
Artifact size: 315.77 MiB -> 315.84 MiB (0.02%)
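As a quick sanity check on the summary rows: the "All" means are consistent, up to the one-decimal rounding of the displayed values, with a count-weighted average of the primary regression and improvement rows, and the bootstrap and artifact percentages follow from the raw numbers. The sketch below assumes that weighting; it is not a description of how rustc-perf actually computes its summaries.

```rust
/// Count-weighted mean of per-bucket means: sum(mean_i * count_i) / sum(count_i).
/// Assumption: this is only used to cross-check the rounded summary rows.
fn weighted_mean(buckets: &[(f64, u32)]) -> f64 {
    let total: u32 = buckets.iter().map(|&(_, n)| n).sum();
    let weighted: f64 = buckets.iter().map(|&(m, n)| m * f64::from(n)).sum();
    weighted / f64::from(total)
}

/// Relative change from `before` to `after`, in percent.
fn percent_change(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}
```

For instruction count, `weighted_mean(&[(0.2, 6), (-0.6, 6)])` is about -0.2, and for max RSS, `weighted_mean(&[(3.4, 1), (-1.8, 4)])` is -0.76, which rounds to the reported -0.8.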

@scottmcm scottmcm deleted the more-rvalue-operands branch May 11, 2024 01:12
@nnethercote
Contributor

Nice binary size reductions. They are almost entirely in debug builds, indicating that "Certainly LLVM can optimize that away" is correct but that only happens in opt builds :)

@scottmcm
Member Author

Yeah; I was hoping that LLVM not needing to optimize them away would be more of a perf win in opt, but I'll take instruction-neutral improvements to debug build perf too :)

```rust
std::cell::Cell::new(b)
}

// CHECK-LABLE: { i8, i16 } @make_cell_of_bool_and_short(i1 noundef zeroext %b, i16 noundef %s)
```


Typo here: "CHECK-LABLE" should be "CHECK-LABEL". The test is presumably not working properly.

Member Author

Good eye; thank you. I've added a fix for that to #124999

bors added a commit to rust-lang-ci/rust that referenced this pull request May 11, 2024
fix few typos in filecheck annotations

Inspired by rust-lang#123886 (comment)

`rg -g '*.rs' '//\s+?[\w-]+(-[\w]+):' -r '$1' -oNI | sort -u`

Should https://llvm.org/docs/CommandGuide/FileCheck.html#cmdoption-FileCheck-ignore-case be used for case-insensitive matching in FileCheck?
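The `rg` pipeline above prints each distinct suffix captured from `// PREFIX-SUFFIX:` comments, so a misspelling like `LABLE` stands out in the sorted list. A rough Rust equivalent is sketched below. It is not the tooling used in that commit: it only looks at comments that begin a line, and the "known" list is an assumption drawn from FileCheck's documented directives (`-NEXT`, `-SAME`, `-NOT`, `-DAG`, `-LABEL`, `-EMPTY`, plus `-COUNT-<n>`).

```rust
use std::collections::BTreeSet;

/// Directive suffixes treated as valid (assumed list; `COUNT-<n>` is handled
/// separately because its captured suffix is the number).
const KNOWN_SUFFIXES: &[&str] = &["NEXT", "SAME", "NOT", "DAG", "LABEL", "EMPTY"];

/// Collects the distinct `SUFFIX` parts of `// PREFIX-SUFFIX:` line comments,
/// roughly like `rg '//\s+?[\w-]+(-[\w]+):' -r '$1' -oNI | sort -u`.
fn directive_suffixes(source: &str) -> BTreeSet<String> {
    let mut suffixes = BTreeSet::new();
    for line in source.lines() {
        // Only comments that begin the (trimmed) line are considered.
        let Some(comment) = line.trim_start().strip_prefix("//") else { continue };
        // The candidate directive is everything before the first ':'.
        let Some((head, _)) = comment.trim_start().split_once(':') else { continue };
        // Mirror `[\w-]+`: letters, digits, '_' and '-' only.
        if !head.chars().all(|c| c.is_alphanumeric() || c == '_' || c == '-') {
            continue;
        }
        // Mirror `(-[\w]+)`: the last '-'-separated piece, with a non-empty prefix.
        if let Some((prefix, suffix)) = head.rsplit_once('-') {
            if !prefix.is_empty() && !suffix.is_empty() {
                suffixes.insert(suffix.to_string());
            }
        }
    }
    suffixes
}

/// Suffixes that are neither known directives nor a `COUNT-<n>` repeat count.
fn suspicious(suffixes: &BTreeSet<String>) -> Vec<String> {
    suffixes
        .iter()
        .filter(|s| !KNOWN_SUFFIXES.contains(&s.as_str()))
        .filter(|s| !s.chars().all(|c| c.is_ascii_digit()))
        .cloned()
        .collect()
}
```

Like the `rg` version, this also picks up ordinary comments such as `// compile-flags:` (suffix `flags`), so its output is a list to skim rather than a strict lint.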
RalfJung pushed a commit to RalfJung/miri that referenced this pull request May 12, 2024
fix few typos in filecheck annotations

bors added a commit to rust-lang-ci/rust that referenced this pull request May 13, 2024
Unify `Rvalue::Aggregate` paths in cg_ssa

In rust-lang#123840 and rust-lang#123886 I added two different codepaths for `Rvalue::Aggregate` in `cg_ssa`.

This merges them into one, since raw pointers are also immediates that can be built from the immediates of their "fields".
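The point about raw pointers can be seen at the surface level: a wide raw pointer such as `*const [u8]` is made of a thin data pointer and a length, so, like a scalar pair, it can be assembled from two immediates without going through memory. The sketch below uses the stable `std::ptr::slice_from_raw_parts`; `make_raw_slice` is a hypothetical helper, and whether a given call lowers to an `Rvalue::Aggregate` in MIR is not something this thread states. The two-`usize` size checked in the test reflects current targets rather than a language guarantee.

```rust
use std::ptr;

/// Hypothetical helper: builds a wide raw pointer `*const [T]` from its two
/// "fields", a thin data pointer and a length (the pointer's metadata).
fn make_raw_slice<T>(data: *const T, len: usize) -> *const [T] {
    ptr::slice_from_raw_parts(data, len)
}
```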
@Kobzol
Contributor

Kobzol commented May 14, 2024

Nice binary size reductions. The regressions are limited to a single benchmark (bitmaps), while the wins are across three different (primary) benchmarks.

@rustbot label: +perf-regression-triaged

Labels
merged-by-bors This PR was explicitly merged by bors. perf-regression Performance regression. perf-regression-triaged The performance regression has been triaged. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.