rustc should use llvm.lifetime intrinsics to reduce stack space #15665

zwarich · 2014-07-14T17:26:05Z

It recently came up on dev-servo that Rust-compiled code is using an egregious amount of stack space. One possible solution that has been brought up is that we should use llvm.lifetime.start and llvm.lifetime.end intrinsics to reduce the amount of stack space that LLVM uses for our functions: http://llvm.org/docs/LangRef.html#llvm-lifetime-start-intrinsic.

An incomplete attempt at this has been made in an earlier PR: #12004, and the numbers there were promising.

The text was updated successfully, but these errors were encountered:

emberian · 2014-07-14T23:30:04Z

cc @dotdash

dotdash · 2014-07-20T16:49:52Z

Started working on this again.

…eneral Lifetime intrinsics help to reduce stack usage, because LLVM can apply stack coloring to reuse the stack slots of dead allocas for new ones. For example these functions now both use the same amount of stack, while previous `bar()` used five times as much as `foo()`: ````rust fn foo() { println("{}", 5); } fn bar() { println("{}", 5); println("{}", 5); println("{}", 5); println("{}", 5); println("{}", 5); } ```` On top of that, LLVM can also optimize out certain operations when it knows that memory is dead after a certain point. For example, it can sometimes remove the zeroing used to cancel the drop glue. This is possible when the glue drop itself was already removed because the zeroing dominated the drop glue call. For example in: ````rust pub fn bar(x: (Box<int>, int)) -> (Box<int>, int) { x } ```` With optimizations, this currently results in: ````llvm define void @_ZN3bar20h330fa42547df8179niaE({ i64*, i64 }* noalias nocapture nonnull sret, { i64*, i64 }* noalias nocapture nonnull) unnamed_addr #0 { "_ZN29_$LP$Box$LT$int$GT$$C$int$RP$39glue_drop.$x22glue_drop$x22$LP$1347$RP$17h88cf42702e5a322aE.exit": %2 = bitcast { i64*, i64 }* %1 to i8* %3 = bitcast { i64*, i64 }* %0 to i8* tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %3, i8* %2, i64 16, i32 8, i1 false) tail call void @llvm.memset.p0i8.i64(i8* %2, i8 0, i64 16, i32 8, i1 false) ret void } ```` But with lifetime intrinsics we get: ````llvm define void @_ZN3bar20h330fa42547df8179niaE({ i64*, i64 }* noalias nocapture nonnull sret, { i64*, i64 }* noalias nocapture nonnull) unnamed_addr #0 { "_ZN29_$LP$Box$LT$int$GT$$C$int$RP$39glue_drop.$x22glue_drop$x22$LP$1347$RP$17h88cf42702e5a322aE.exit": %2 = bitcast { i64*, i64 }* %1 to i8* %3 = bitcast { i64*, i64 }* %0 to i8* tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %3, i8* %2, i64 16, i32 8, i1 false) tail call void @llvm.lifetime.end(i64 16, i8* %2) ret void } ```` Fixes rust-lang#15665

Lifetime intrinsics help to reduce stack usage, because LLVM can apply stack coloring to reuse the stack slots of dead allocas for new ones. For example these functions now both use the same amount of stack, while previous `bar()` used five times as much as `foo()`: ````rust fn foo() { println("{}", 5); } fn bar() { println("{}", 5); println("{}", 5); println("{}", 5); println("{}", 5); println("{}", 5); } ```` On top of that, LLVM can also optimize out certain operations when it knows that memory is dead after a certain point. For example, it can sometimes remove the zeroing used to cancel the drop glue. This is possible when the glue drop itself was already removed because the zeroing dominated the drop glue call. For example in: ````rust pub fn bar(x: (Box<int>, int)) -> (Box<int>, int) { x } ```` With optimizations, this currently results in: ````llvm define void @_ZN3bar20h330fa42547df8179niaE({ i64*, i64 }* noalias nocapture nonnull sret, { i64*, i64 }* noalias nocapture nonnull) unnamed_addr #0 { "_ZN29_$LP$Box$LT$int$GT$$C$int$RP$39glue_drop.$x22glue_drop$x22$LP$1347$RP$17h88cf42702e5a322aE.exit": %2 = bitcast { i64*, i64 }* %1 to i8* %3 = bitcast { i64*, i64 }* %0 to i8* tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %3, i8* %2, i64 16, i32 8, i1 false) tail call void @llvm.memset.p0i8.i64(i8* %2, i8 0, i64 16, i32 8, i1 false) ret void } ```` But with lifetime intrinsics we get: ````llvm define void @_ZN3bar20h330fa42547df8179niaE({ i64*, i64 }* noalias nocapture nonnull sret, { i64*, i64 }* noalias nocapture nonnull) unnamed_addr #0 { "_ZN29_$LP$Box$LT$int$GT$$C$int$RP$39glue_drop.$x22glue_drop$x22$LP$1347$RP$17h88cf42702e5a322aE.exit": %2 = bitcast { i64*, i64 }* %1 to i8* %3 = bitcast { i64*, i64 }* %0 to i8* tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %3, i8* %2, i64 16, i32 8, i1 false) tail call void @llvm.lifetime.end(i64 16, i8* %2) ret void } ```` Fixes #15665

The allocas used in match expression currently don't get good lifetime markers, in fact they only get lifetime start markers, because their lifetimes don't match to cleanup scopes. While the bindings themselves are bog standard and just need a matching pair of start and end markers, they might need them twice, once for a guard clause and once for the match body. The __llmatch alloca OTOH needs a single lifetime start marker, but when there's a guard clause, it needs two end markers, because its lifetime ends either when the guard doesn't match or after the match body. With these intrinsics in place, LLVM can now, for example, optimize code like this: ````rust enum E { A1(int), A2(int), A3(int), A4(int), } pub fn variants(x: E) { match x { A1(m) => bar(&m), A2(m) => bar(&m), A3(m) => bar(&m), A4(m) => bar(&m), } } ```` To a single call to bar, using only a single stack slot. It still fails to eliminate some of checks. ````gas .Ltmp5: .cfi_def_cfa_offset 16 movb (%rdi), %al testb %al, %al je .LBB3_5 movzbl %al, %eax cmpl $1, %eax je .LBB3_5 cmpl $2, %eax .LBB3_5: movq 8(%rdi), %rax movq %rax, (%rsp) leaq (%rsp), %rdi callq _ZN3bar20hcb7a0d8be8e17e37daaE@PLT popq %rax retq ```` Refs #15665

…cola internal: De-`unwrap` `generate_function.rs` Fixes rust-lang/rust-analyzer#15398 (comment) cc `@Inicola`

emberian added I-slow labels Jul 14, 2014

larsbergstrom mentioned this issue Jul 16, 2014

Tracking issue for Rust high-pri or blocking issues for Servo & Gecko servo/servo#2853

Closed

1 task

dotdash mentioned this issue Jul 21, 2014

Emit LLVM lifetime intrinsics to improve stack usage and codegen in general #15863

Merged

bors closed this as completed in #15863 Jul 22, 2014

dotdash mentioned this issue Jul 23, 2014

Improve usage of lifetime intrinsics in match expressions #15921

Merged

bors added a commit to rust-lang-ci/rust that referenced this issue Nov 13, 2023

Auto merge of rust-lang#15665 - Milo123459:milo/remove-unwraps, r=lni…

d3cc3bc

…cola internal: De-`unwrap` `generate_function.rs` Fixes rust-lang/rust-analyzer#15398 (comment) cc `@Inicola`

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

rustc should use llvm.lifetime intrinsics to reduce stack space #15665

rustc should use llvm.lifetime intrinsics to reduce stack space #15665

zwarich commented Jul 14, 2014

emberian commented Jul 14, 2014

dotdash commented Jul 20, 2014

rustc should use llvm.lifetime intrinsics to reduce stack space #15665

rustc should use llvm.lifetime intrinsics to reduce stack space #15665

Comments

zwarich commented Jul 14, 2014

emberian commented Jul 14, 2014

dotdash commented Jul 20, 2014