introduce a "try" ZIR and AIR instruction #11772

Closed
andrewrk opened this issue Jun 1, 2022 · 4 comments
Labels: accepted, enhancement, frontend, proposal
Milestone: 0.10.0

andrewrk (Member) commented Jun 1, 2022

Motivation

While investigating #11498 I noticed that one of the problems - perhaps the main problem - is that our stage1 LLVM lowering manages to get optimized into something like this:

    if (!init_a()) goto cleanup_a;
    if (!init_b()) goto cleanup_b;
    if (!init_c()) goto cleanup_c;
cleanup_all:
    deinit_c();
cleanup_c:
    deinit_b();
cleanup_b:
    deinit_a();
cleanup_a:
    return;

Our stage2 LLVM lowering, on the other hand, keeps the form the source was written in, which looks more like this:

    if (!init_a()) {
        return;
    }
    if (!init_b()) {
        deinit_a();
        return;
    }
    if (!init_c()) {
        deinit_b();
        deinit_a();
        return;
    }
    deinit_c();
    deinit_b();
    deinit_a();
    return;

A direct fix to this would be to implement #283 - something that I plan to investigate as well. However, this led me to notice a related difference in the input LLVM IR that gets generated in stage1 vs stage2. Here is an example:

fn doTheTest() !void {
    var list = std.ArrayList([]const u8).init(std.heap.page_allocator);
    defer list.deinit();

    var list2 = std.ArrayList([]const u8).init(std.heap.page_allocator);
    defer list2.deinit();

    try foo();
    try foo();
}

Specifically for the try foo(); line, stage1 lowers to this LLVM IR:

  %2 = call fastcc i16 @foo()
  store i16 %2, i16* %0, align 2
  %3 = icmp ne i16 %2, 0
  br i1 %3, label %ErrRetReturn, label %ErrRetContinue

ErrRetReturn:                                     ; preds = %Entry
  %4 = load i16, i16* %0, align 2
  store i16 %4, i16* %result, align 2
  call fastcc void @"std.array_list.ArrayListAligned([]const u8,null).deinit"(%"std.array_list.ArrayListAligned([]const u8,null)"* %list2)
  call fastcc void @"std.array_list.ArrayListAligned([]const u8,null).deinit"(%"std.array_list.ArrayListAligned([]const u8,null)"* %list)
  ret i16 %4

ErrRetContinue:                                   ; preds = %Entry

while stage2 lowers to this:

  %8 = call fastcc i16 @test2.foo()
  %9 = icmp eq i16 %8, 0
  br i1 %9, label %Then, label %Else

Then:                                             ; preds = %Entry
  br label %Block

Else:                                             ; preds = %Entry
  call fastcc void @"array_list.ArrayListAligned([]const u8,null).deinit"(%"array_list.ArrayListAligned([]const u8,null)"* %1)
  call fastcc void @"array_list.ArrayListAligned([]const u8,null).deinit"(%"array_list.ArrayListAligned([]const u8,null)"* %3)
  ret i16 %8

Block:                                            ; preds = %Then

Perhaps this difference is related to how readily LLVM can optimize each form. At the very least, the former LLVM IR is simpler since it has fewer basic blocks, creating fewer potential issues for LLVM in terms of both optimization and compilation speed.

So I wanted to start exploring how to emit the better code like stage1 does. The first thing I did was look at the corresponding AIR:

  %68!= block(void, {
    %74 = call(%72, [])
    %75 = is_non_err(%74)
    %103!= cond_br(%75!, {
      %83! %92!
      %76 = unwrap_errunion_payload(void, %74!)
      %77!= br(%68, %76!)
    }, {
      %79 = unwrap_errunion_err((error_set_inferred func=Module.Decl.Index(153)), %74!)
      %85 = load((struct decl=Module.Decl.Index(256)), %52)
      %88!= call(%83!, [%85!])
      %94 = load((struct decl=Module.Decl.Index(256)), %1)
      %97!= call(%92!, [%94!])
      %99 = bitcast((error_set_inferred func=Module.Decl.Index(152)), %79!)
      %101 = wrap_errunion_err((error_set_inferred func=Module.Decl.Index(152))!void, %99!)
      %102!= ret(%101!)
    })
  })

This is the AIR for the try foo(); line. After pondering a potential optimization in the LLVM backend to lower this as desired, I made the following observations:

  • This kind of optimization does not work well with the existing structure. It would require either a full recursive scan of the tree at each block, duplicating the work of the regular lowering, or noticing too late that the optimization could have happened and then rewriting the LLVM IR via LLVM's API.
  • This optimization would need to be repeated in every backend, which is inefficient from a development-effort perspective and leaves room for bugs and inconsistency in Zig's codegen quality.
  • Doing the optimization earlier in the pipeline would potentially improve performance by creating fewer IR instructions in memory across multiple phases of the pipeline.
  • try is extremely common in Zig source code.

Proposed Changes

In short: lower try more efficiently, taking advantage of a new special-purpose ZIR instruction and a new AIR instruction.

Taking the same Zig source code example from above, here is the proposed difference in ZIR:

ZIR Before

      %154 = block({
        %149 = decl_val("foo") token_offset:34:9
        %150 = dbg_stmt(8, 12)
        %151 = call(.auto, %149, []) node_offset:34:12
        %152 = is_non_err(%151) node_offset:34:5
        %153 = condbr(%152, {
          %155 = err_union_payload_unsafe(%151) node_offset:34:5
          %166 = break(%154, %155)
        }, {
          %156 = err_union_code(%151) node_offset:34:5
          %157 = dbg_stmt(6, 11)
          %158 = field_call_bind(%130, "deinit") node_offset:32:16
          %159 = dbg_stmt(6, 23)
          %160 = call(nodiscard .auto, %158, []) node_offset:32:23
          %161 = dbg_stmt(3, 11)
          %162 = field_call_bind(%111, "deinit") node_offset:29:15
          %163 = dbg_stmt(3, 22)
          %164 = call(nodiscard .auto, %162, []) node_offset:29:22
          %165 = ret_node(%156) node_offset:34:5
        }) node_offset:34:5
      }) node_offset:34:5
      %167 = ensure_result_used(%154) node_offset:34:5

ZIR After

      %149 = decl_val("foo") token_offset:34:9
      %150 = dbg_stmt(8, 12)
      %151 = call(.auto, %149, []) node_offset:34:12
      %153 = try(%151, {
        %156 = err_union_code(%151) node_offset:34:5
        %157 = dbg_stmt(6, 11)
        %158 = field_call_bind(%130, "deinit") node_offset:32:16
        %159 = dbg_stmt(6, 23)
        %160 = call(nodiscard .auto, %158, []) node_offset:32:23
        %161 = dbg_stmt(3, 11)
        %162 = field_call_bind(%111, "deinit") node_offset:29:15
        %163 = dbg_stmt(3, 22)
        %164 = call(nodiscard .auto, %162, []) node_offset:29:22
        %165 = ret_node(%156) node_offset:34:5
      }) node_offset:34:5
      %167 = ensure_result_used(%153) node_offset:34:5

ZIR Explanation

This new try instruction would implicitly check whether its operand is an error, and its result value would be the unwrapped payload. The body it provides is what executes in the "is an error" case; here you can see it running the defer expressions. If #283 is implemented in the future, this would change to simply take another operand: the block to break from in case of an error. In both cases the try body has the guarantee that no control flow is possible from inside the try body to directly after the try instruction.
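
For intuition, this mirrors how try already desugars at the language level: try expr is equivalent to expr catch |err| return err, where the return triggers the pending defer expressions. A minimal illustration in plain Zig (not compiler code):

    const std = @import("std");

    fn foo() !void {}

    fn doTheTest() !void {
        var list = std.ArrayList([]const u8).init(std.heap.page_allocator);
        defer list.deinit();

        // `try foo();` behaves like:
        foo() catch |err| {
            // returning here runs `list.deinit()` via the defer, which is
            // exactly what the error-case body of the ZIR `try` carries
            return err;
        };
    }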

Sema

Sema would have the option to lower the ZIR try instruction the same as before, using already existing AIR instructions. However, introducing a try AIR instruction as well would result in additional savings and better codegen even without optimizations.
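
As a rough sketch of what such an AIR instruction's encoding could look like, assuming it follows the existing pattern of storing a trailing body in the extra array (the field names here are hypothetical, not a final layout):

    // Hypothetical payload for a `try` AIR instruction (illustrative only).
    // The instruction would reference the error union operand plus this
    // payload; `body_len` error-case instructions trail it in `extra`.
    pub const Try = struct {
        /// Number of instructions in the error-case body.
        body_len: u32,
    };

The result of the try instruction itself would be the unwrapped payload, which is why no unwrap_errunion_payload appears on the happy path in the "AIR After" example below.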

AIR Before

This is the same snippet as in the Motivation section above, reproduced here for comparison:

  %68!= block(void, {
    %74 = call(%72, [])
    %75 = is_non_err(%74)
    %103!= cond_br(%75!, {
      %83! %92!
      %76 = unwrap_errunion_payload(void, %74!)
      %77!= br(%68, %76!)
    }, {
      %79 = unwrap_errunion_err((error_set_inferred func=Module.Decl.Index(153)), %74!)
      %85 = load((struct decl=Module.Decl.Index(256)), %52)
      %88!= call(%83!, [%85!])
      %94 = load((struct decl=Module.Decl.Index(256)), %1)
      %97!= call(%92!, [%94!])
      %99 = bitcast((error_set_inferred func=Module.Decl.Index(152)), %79!)
      %101 = wrap_errunion_err((error_set_inferred func=Module.Decl.Index(152))!void, %99!)
      %102!= ret(%101!)
    })
  })

AIR After

  %74 = call(%72, [])
  %103!= try(%74!, {
    %79 = unwrap_errunion_err((error_set_inferred func=Module.Decl.Index(153)), %74!)
    %85 = load((struct decl=Module.Decl.Index(256)), %52)
    %88!= call(%83!, [%85!])
    %94 = load((struct decl=Module.Decl.Index(256)), %1)
    %97!= call(%92!, [%94!])
    %99 = bitcast((error_set_inferred func=Module.Decl.Index(152)), %79!)
    %101 = wrap_errunion_err((error_set_inferred func=Module.Decl.Index(152))!void, %99!)
    %102!= ret(%101!)
  })

Conclusion

This AIR is trivial to lower to only 1 branch / 1 basic block rather than the two required by the block / cond_br combo.
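
As a sketch of that lowering under a hypothetical builder API (every helper name below is illustrative, not an actual backend function):

    // Lower `%res = try(%operand, body)` with a single conditional branch
    // to an out-of-line error block. No merge block is required because
    // the error body is guaranteed not to fall through.
    fn airTry(b: *Builder, operand: Value, body: []const Inst) !Value {
        const is_err = b.errUnionIsErr(operand);
        const err_block = b.makeBlock("TryError");
        const ok_block = b.makeBlock("TryContinue");
        b.condBr(is_err, err_block, ok_block);

        b.positionAtEnd(err_block);
        try b.genBody(body); // ends in `ret` (or, with #283, a break)

        b.positionAtEnd(ok_block);
        return b.errUnionPayload(operand);
    }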

This would require adding support for this new kind of control flow in every codegen backend. However, I think it will be worth it: it offers both improved compilation speed and better codegen, which is exactly the kind of tradeoff we are intentionally making for this compiler.

andrewrk added the proposal and frontend labels Jun 1, 2022
andrewrk added this to the 0.10.0 milestone Jun 1, 2022
andrewrk added the enhancement label Jun 1, 2022
kubkon (Member) commented Jun 1, 2022

This change looks good to me, and I'll be keen to implement it in the self-hosted x86_64 backend to see how it fares.

Techcable (Contributor) commented Jun 1, 2022

In general, LLVM is bad (and slow) at optimizing a lot of "guards" that exit the function.

This is a very severe issue in speculatively optimizing JIT compilers, which have a ton of branches that exit the function.

Zig has nowhere near as many "guards" as JavaScriptCore, but this appears to be a less severe case of the same issue: there are lots of branches that immediately exit the function if false.

Here is a fascinating post on why JavaScriptCore switched away from LLVM to its own B3 backend:

Each check breaks its basic block in two and then introduces another basic block for the exit. This doesn’t just increase compile times. It weakens any optimizations that become conservative around control flow. We found that this affected many things in LLVM, including instruction selection and dead store elimination.

B3 is designed to handle OSR without breaking basic blocks. We do this using the Check opcode. [...] The Check is not a basic block terminal. It encapsulates branching on a predicate, emitting an OSR exit handler for when the predicate is true, and falling through to the next instruction when the predicate is false. The generator passed to a Check emits code on an out-of-line path linked to from the Check’s branch. It’s not allowed to fall through or jump back into B3 code. B3 is allowed to assume that falling through a Check proves that the predicate was false.

We have found that Check is an essential optimization. It dramatically reduces the amount of control flow required to represent speculative operations. Like LLVM, B3 also has some optimizations that become conservative around actual control flow, but they handle Checks just fine.

I don't think you should switch away from LLVM entirely, but adding a dedicated try instruction should improve your lowering to LLVM bitcode (and even unlock new optimizations for the self-hosted backend).

andrewrk added the accepted label Jun 2, 2022
andrewrk (Member, Author) commented Jun 2, 2022

Some quick stats on the ZIR part of this:

AstGen.zig Before:

# Source bytes:       430.3486328125KiB
# Tokens:             77147 (376.7177734375KiB)
# AST Nodes:          36978 (469.564453125KiB)
# Total ZIR bytes:    1.138463020324707MiB
# Instructions:       70328 (618.1171875KiB)
# String Table Bytes: 35.1611328125KiB
# Extra Data Items:   131186 (512.4453125KiB)

AstGen.zig After:

# Source bytes:       430.3486328125KiB
# Tokens:             77147 (376.7177734375KiB)
# AST Nodes:          36978 (469.564453125KiB)
# Total ZIR bytes:    1.076176643371582MiB
# Instructions:       65816 (578.4609375KiB)
# String Table Bytes: 35.1611328125KiB
# Extra Data Items:   125010 (488.3203125KiB)

That's about a 5% reduction in total ZIR bytes (1.138 MiB → 1.076 MiB) from introducing the new try instruction.

andrewrk added a commit that referenced this issue Jun 3, 2022
This introduces two ZIR instructions:
 * `try`
 * `try_inline`

This is part of an effort to implement #11772.
andrewrk added a commit that referenced this issue Jun 3, 2022
Implements semantic analysis for the new try/try_inline ZIR
instruction. Adds the new try/try_ptr AIR instructions and implements
them for the LLVM backend.

Fixes not calling rvalue() for tryExpr in AstGen.

This is part of an effort to implement #11772.
andrewrk (Member, Author) commented Jun 6, 2022

Landed in d1bfc83.

andrewrk closed this as completed Jun 6, 2022