Stop using LLVM struct types for alloca #122053

erikdesjardins · 2024-03-06T02:05:12Z

The alloca type has no semantic meaning; only the size (and the alignment, which we specify explicitly) matters. Using [N x i8] is a more direct way to specify that we want N bytes, and avoids relying on LLVM's struct layout. It is likely that a future LLVM version will change to an untyped alloca representation.
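To make the idea concrete, here is a minimal sketch (not the actual rustc code) of describing a stack slot purely by size and alignment. The names `byte_array_alloca`, `alloca_for`, and `Pair` are hypothetical and exist only for this illustration; the emitted text follows LLVM's textual `alloca` syntax.

```rust
use std::mem::{align_of, size_of};

/// Renders a textual LLVM IR `alloca` that reserves `size` bytes as
/// `[size x i8]` with an explicit alignment. Hypothetical helper.
fn byte_array_alloca(name: &str, size: usize, align: usize) -> String {
    format!("%{name} = alloca [{size} x i8], align {align}")
}

/// Describes the stack slot for a Rust type `T` using only its size and
/// alignment, without consulting any LLVM struct type. Hypothetical helper.
fn alloca_for<T>(name: &str) -> String {
    byte_array_alloca(name, size_of::<T>(), align_of::<T>())
}

/// Example type, assumed here only for illustration. `repr(C)` pins the
/// layout: 4 bytes for `a`, 1 byte for `b`, 3 bytes of tail padding.
#[allow(dead_code)]
#[repr(C)]
struct Pair {
    a: u32,
    b: u8,
}
```

For example, `alloca_for::<Pair>("pair")` yields `%pair = alloca [8 x i8], align 4`: the padding is already counted in the size, so nothing depends on how LLVM would lay out a struct type.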

Split out from #121577.

r? @ghost

erikdesjardins · 2024-03-06T02:05:28Z

Test changes will conflict with #122050.

Additionally, I believe SROA still uses the alloca type in some cases, so this may cause perf regressions, and hence not be viable, until LLVM 19 (or later).

nikic · 2024-03-06T08:59:04Z

@bors try @rust-timer queue

bors · 2024-03-06T09:00:14Z

⌛ Trying commit 26be569 with merge 52e34ca...

bors · 2024-03-06T10:27:41Z

☀️ Try build successful - checks-actions
Build commit: 52e34ca (52e34ca98c60292a68ff4f7b01a3bb5d813cda61)

rust-timer · 2024-03-06T12:46:45Z

Finished benchmarking commit (52e34ca): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	0.2%	[0.2%, 0.2%]	2
Regressions ❌ (secondary)	0.3%	[0.2%, 0.4%]	6
Improvements ✅ (primary)	-1.6%	[-1.6%, -1.6%]	1
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-0.4%	[-1.6%, 0.2%]	3

Max RSS (memory usage)

This benchmark run did not return any relevant results for this metric.

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-1.4%	[-2.0%, -1.0%]	3
Improvements ✅ (secondary)	-2.3%	[-2.9%, -1.7%]	2
All ❌✅ (primary)	-1.4%	[-2.0%, -1.0%]	3

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 647.585s -> 646.111s (-0.23%)
Artifact size: 175.07 MiB -> 175.01 MiB (-0.03%)

compiler-errors · 2024-03-15T03:22:00Z

@bors try @rust-timer queue

bors · 2024-03-15T03:23:09Z

⌛ Trying commit 8536da4 with merge 7a9b98b...

bors · 2024-03-15T04:51:39Z

☀️ Try build successful - checks-actions
Build commit: 7a9b98b (7a9b98bb9b00c608f4174583891c151b87723e9f)

rust-timer · 2024-03-15T06:06:17Z

Finished benchmarking commit (7a9b98b): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	0.4%	[0.2%, 0.6%]	9
Regressions ❌ (secondary)	0.7%	[0.2%, 3.7%]	9
Improvements ✅ (primary)	-1.7%	[-1.7%, -1.7%]	1
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	0.1%	[-1.7%, 0.6%]	10

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-2.5%	[-2.5%, -2.5%]	1
Improvements ✅ (secondary)	-2.1%	[-2.5%, -1.7%]	2
All ❌✅ (primary)	-2.5%	[-2.5%, -2.5%]	1

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	3.8%	[3.8%, 3.9%]	2
Improvements ✅ (primary)	-1.1%	[-1.1%, -1.1%]	1
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-1.1%	[-1.1%, -1.1%]	1

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 671.201s -> 669.516s (-0.25%)
Artifact size: 311.47 MiB -> 311.44 MiB (-0.01%)

erikdesjardins · 2024-03-15T19:15:04Z

Both perf runs look somewhat consistent. In terms of instruction count:

Improvements to exa. These look legit, cachegrind diffs show LLVM does less work. The resulting exa opt binaries are the same size (within 8 bytes), so this seems to be a real improvement and not because some optimization was prevented.

Regressions on a few other benchmarks: stm32f4, cranelift-codegen, tuple-stress, and externs show up in both perf runs. These also look real. The cachegrind diffs for those four all look something like:

--------------------------------------------------------------------------------
-- Function:file summary
--------------------------------------------------------------------------------
  Ir_________  function:file

>  25,868,377  <rustc_lexer::cursor::Cursor>::advance_token:???
> -16,715,810  <hashbrown::raw::RawTable<(rustc_span::def_id::LocalDefId, rustc_hir::hir_id::ItemLocalId)>>::reserve_rehash::<hashbrown::map::make_hasher<rustc_span::def_id::LocalDefId, rustc_hir::hir_id::ItemLocalId, core::hash::BuildHasherDefault<rustc_hash::FxHasher>>::{closure#0}>:???
>  16,715,810  <hashbrown::raw::RawTable<(rustc_ast::node_id::NodeId, rustc_hir::hir_id::ItemLocalId)>>::reserve_rehash::<hashbrown::map::make_hasher<rustc_ast::node_id::NodeId, rustc_hir::hir_id::ItemLocalId, core::hash::BuildHasherDefault<rustc_hash::FxHasher>>::{closure#0}>:???
>  -4,391,303  <rustc_metadata::creader::CrateMetadataRef>::opt_item_name:???
>   4,064,505  rustc_metadata::rmeta::decoder::cstore_impl::provide_extern::associated_item:???
>   1,188,404  <rustc_span::span_encoding::Span>::to:???
>  -1,179,620  <rustc_parse::parser::Parser>::parse_expr_assoc_with:???
...

that is, a bunch of inlining noise, and a real regression in rustc_lexer::cursor::Cursor::advance_token. It seems like this change causes advance_token to consistently execute more instructions.

Looking at cycle count, while exa still has a significant improvement, the cycle count changes for stm32f4, cranelift-codegen, tuple-stress, and externs are all well below the significance threshold (some tiny improvements, some tiny regressions, looks like noise). So perhaps advance_token isn't actually slower, it just gets codegenned in a way that executes more instructions.

Looking at the asm diff for advance_token, a lot of code is moved around, but there don't seem to be any significant changes, except for this diff at an early exit near the beginning of the function (although I could have missed something--full diff here, after normalizing constants etc.):

 	mov    rdx,QWORD PTR [rsi]
 	mov    rdi,QWORD PTR [rsi+0xXXXX]
 	mov    ecx,DWORD PTR [rsi+0xXXXX]
 	add    ecx,edx
 	sub    ecx,edi
+	shl    r15,0x20
+	movzx  r8d,r13b
+	shl    r8d,0x18
+	or     r8,r15
+	movzx  r9d,r12b
+	shl    r9d,0x10
+	or     r9,r8
+	shl    r14d,0x8
+	movzx  r8d,r14w
+	or     r8,r9
+	movzx  r9d,bpl
+	or     r9,r8
 	sub    rdi,rdx
 	mov    QWORD PTR [rsi+0xXXXX],rdi
-	mov    BYTE PTR [rax],r11b
-	mov    BYTE PTR [rax+0xXXXX],bpl
-	mov    BYTE PTR [rax+0xXXXX],r14b
-	mov    BYTE PTR [rax+0xXXXX],r15b
-	mov    DWORD PTR [rax+0xXXXX],r12d
+	mov    QWORD PTR [rax],r9
 	mov    DWORD PTR [rax+0xXXXX],ecx
 	add    rsp,0x28
 	pop    rbx

This uses 7 more instructions, but it combines 5 stores into 1, which seems better.
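The trade-off in that diff can be sketched in Rust. This is an illustration only: the field names and the 8-byte layout (four single bytes followed by one 32-bit value, little-endian) are assumptions read off the asm above, not the real token layout in rustc_lexer.

```rust
/// Field-by-field version: four byte stores plus one 32-bit store,
/// mirroring the five `mov` stores removed in the diff.
fn store_fieldwise(b0: u8, b1: u8, b2: u8, b3: u8, word: u32) -> [u8; 8] {
    let mut buf = [0u8; 8];
    buf[0] = b0;
    buf[1] = b1;
    buf[2] = b2;
    buf[3] = b3;
    buf[4..8].copy_from_slice(&word.to_le_bytes());
    buf
}

/// Packed version: extra shift/or instructions build one 64-bit value,
/// which is then written with a single 8-byte store.
fn store_packed(b0: u8, b1: u8, b2: u8, b3: u8, word: u32) -> [u8; 8] {
    let packed = (u64::from(word) << 32)
        | (u64::from(b3) << 24)
        | (u64::from(b2) << 16)
        | (u64::from(b1) << 8)
        | u64::from(b0);
    packed.to_le_bytes()
}
```

Both functions produce the same bytes. The packed form spends more arithmetic instructions in registers but touches memory only once, which fits an instruction count that goes up while the cycle count stays flat.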

So, given that, and the lack of regressions in cycle count on those four benchmarks, I think this is not actually a regression, and we can go ahead with this change.

rustbot · 2024-03-15T19:15:21Z

Some changes occurred in compiler/rustc_codegen_gcc

cc @antoyo, @GuillaumeGomez

nikic · 2024-04-15T00:09:03Z

Looks like there are two extra variants of the stack protector test for Windows, which will need the same updates as the Linux test.

RalfJung · 2024-04-17T14:16:10Z

@bors r-
(PR got re-queued)

erikdesjardins · 2024-04-23T12:26:09Z

@rustbot review

nikic · 2024-04-24T00:53:17Z

@bors r+

bors · 2024-04-24T00:53:20Z

📌 Commit 6df27ef has been approved by nikic

It is now in the queue for this repository.

bors · 2024-04-24T03:00:48Z

⌛ Testing commit 6df27ef with merge 29a56a3...

bors · 2024-04-24T05:02:09Z

☀️ Test successful - checks-actions
Approved by: nikic
Pushing 29a56a3 to master...

rust-timer · 2024-04-24T06:19:38Z

Finished benchmarking commit (29a56a3): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged label to this PR.

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	0.2%	[0.2%, 0.3%]	8
Regressions ❌ (secondary)	0.4%	[0.2%, 1.1%]	17
Improvements ✅ (primary)	-1.9%	[-1.9%, -1.9%]	1
Improvements ✅ (secondary)	-0.4%	[-0.4%, -0.4%]	1
All ❌✅ (primary)	-0.0%	[-1.9%, 0.3%]	9

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-2.8%	[-2.8%, -2.8%]	1
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-2.8%	[-2.8%, -2.8%]	1

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	1.7%	[1.5%, 2.0%]	3
Improvements ✅ (primary)	-1.2%	[-1.2%, -1.2%]	1
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-1.2%	[-1.2%, -1.2%]	1

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-0.2%	[-0.2%, -0.2%]	1
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-0.2%	[-0.2%, -0.2%]	1

Bootstrap: 675.097s -> 673.216s (-0.28%)
Artifact size: 316.24 MiB -> 315.45 MiB (-0.25%)

erikdesjardins · 2024-04-24T12:37:48Z

Spot-checking a few of those instruction count regressions, e.g. unicode-normalization:

--------------------------------------------------------------------------------
-- Function:file summary
--------------------------------------------------------------------------------
  Ir_______  function:file

> 2,007,011  <rustc_lexer::cursor::Cursor>::advance_token:???
>   927,357  <rustc_parse::parser::Parser>::collect_tokens_for_expr::<<rustc_parse::parser::Parser>::parse_expr_prefix::{closure#7}>::{closure#0}:???
>  -859,667  <rustc_parse::parser::Parser>::parse_expr_prefix::{closure#7}:???
>   321,256  <rustc_span::span_encoding::Span>::to:???
>  -263,369  <rustc_parse::parser::Parser>::parse_expr_assoc_with:???
>  -127,459  <rustc_parse::lexer::tokentrees::TokenTreesReader>::bump:???
>   -66,640  <rustc_parse::parser::Parser>::parse_expr_prefix:???
>    43,746  <rustc_parse::parser::Parser>::parse_expr_dot_or_call:???
...

this looks like the same thing I investigated above (#122053 (comment)), where we merge 5 stores into 1 in advance_token, which is better, but involves using more instructions.

So given the relative lack of cycle count changes (a small improvement on one benchmark, and a small regression on another benchmark), I think this change doesn't have any significant perf impact.

Mark-Simulacrum · 2024-04-29T13:10:28Z

Edit: Deleting comment, meant to leave it on another PR.

rustbot added the S-waiting-on-review and T-compiler labels Mar 6, 2024

rustbot added the S-waiting-on-perf label Mar 6, 2024

rustbot added the perf-regression label and removed the S-waiting-on-perf label Mar 6, 2024

erikdesjardins force-pushed the alloca branch 3 times, most recently from 8515835 to 8536da4 Compare March 15, 2024 02:36

rustbot added the S-waiting-on-perf label Mar 15, 2024

rustbot removed the S-waiting-on-perf label Mar 15, 2024

klensy mentioned this pull request Mar 15, 2024

Fill in HIR hash for associated opaque types #122517

Merged

erikdesjardins marked this pull request as ready for review March 15, 2024 19:15

bors added the S-waiting-on-review label and removed the S-waiting-on-bors label Apr 12, 2024

bors added the S-waiting-on-author label and removed the S-waiting-on-review label Apr 17, 2024

scottmcm mentioned this pull request Apr 19, 2024

Refactoring after the PlaceValue addition #124153

Merged

also update windows stack-protector tests

6df27ef

rustbot added the S-waiting-on-review label and removed the S-waiting-on-author label Apr 23, 2024

bors added the S-waiting-on-bors label and removed the S-waiting-on-review label Apr 24, 2024

bors added the merged-by-bors label Apr 24, 2024

bors merged commit 29a56a3 into rust-lang:master Apr 24, 2024

rustbot added this to the 1.79.0 milestone Apr 24, 2024

erikdesjardins deleted the alloca branch April 24, 2024 12:10

fangism mentioned this pull request Apr 25, 2024

Nondeterminism encountered in metadata for doc_link_resolutions #124357

Closed

Mark-Simulacrum added the perf-regression-triaged label Apr 29, 2024

eddyb mentioned this pull request Dec 3, 2024

Hack around blockers to update toolchain all the way to nightly-2024-11-22 (~1.84). Rust-GPU/rust-gpu#170

Merged

eddyb mentioned this pull request Dec 17, 2024

Tracking issue: undo rustc_codegen_ssa patching (aka pqp_cg_ssa). Rust-GPU/rust-gpu#182

Open

5 tasks

LegNeato mentioned this pull request Jan 27, 2025

Port to more current rust-nightly Rust-GPU/Rust-CUDA#98

Closed
