Generate simpler LLVM IR for shuffles that recursively become broadcasts #7902

abadams · 2023-10-13T23:42:50Z

As of a few days ago, LLVM 18 causes us to fail the performance_nested_vectorization_gemm test on arm targets without arm_dot_prod. The cause seems to be that we generate very complex chains of shuffles for the following piece of IR:

shuffle({some_u8x2}, {0, 1, 0, 1, 0, 1, 0, 1});

The current CodeGen_LLVM Shuffle handler reinterprets the u8x2 as a u16 and creates the following shuffle:

shuffle({reinterpret<u16>(some_u8x2)}, {0, 0, 0, 0});
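The rewrite above works because the u8 indices come in aligned pairs {0, 1}, so each pair names one whole u16 lane. A minimal sketch of that index transformation, assuming a single input vector; `coarsen_indices` and `factor` are hypothetical names for illustration, not the actual CodeGen_LLVM code:

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical helper (not Halide's API): try to rewrite a shuffle over
// narrow lanes as a shuffle over lanes `factor` times wider. This only
// succeeds if the indices come in aligned runs
// {f*k, f*k+1, ..., f*k+f-1}, each of which maps to wide lane k.
std::optional<std::vector<int>> coarsen_indices(const std::vector<int> &indices, int factor) {
    assert(factor > 0 && "factor must be positive");
    if (indices.size() % static_cast<std::size_t>(factor) != 0) {
        return std::nullopt;
    }
    std::vector<int> wide;
    for (std::size_t i = 0; i < indices.size(); i += factor) {
        int first = indices[i];
        // Each run must start on a wide-lane boundary...
        if (first < 0 || first % factor != 0) {
            return std::nullopt;
        }
        // ...and consist of consecutive narrow lanes.
        for (int j = 1; j < factor; j++) {
            if (indices[i + j] != first + j) {
                return std::nullopt;
            }
        }
        wide.push_back(first / factor);
    }
    return wide;
}
```

With `factor = 2`, the indices {0, 1, 0, 1, 0, 1, 0, 1} become {0, 0, 0, 0}, while something like {1, 0, 1, 0} cannot be coarsened because its runs are not aligned, ascending pairs.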

and then recursively calls the shuffle codegen visitor. That visitor has no special case for broadcasts, so it treats this as a degenerate self-interleave and produces a complex binary tree of shuffles.

This PR instead detects broadcasts of a single lane of a single vector and routes them to the existing broadcast handling.
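The check described above can be sketched as follows: a shuffle of one input vector is a broadcast exactly when every index selects the same in-range lane. This is a minimal sketch under that assumption; `broadcast_lane` and `input_lanes` are hypothetical names, and the real handler may be structured differently (for example, Halide's Shuffle node can take several input vectors, which this sketch ignores).

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical helper (not Halide's actual code): if every index of a
// shuffle over a single input vector with `input_lanes` lanes selects the
// same lane, return that lane so the caller can emit a broadcast of it
// instead of a general shuffle. Otherwise return std::nullopt.
std::optional<int> broadcast_lane(const std::vector<int> &indices, int input_lanes) {
    assert(input_lanes > 0 && "input vector must have at least one lane");
    if (indices.empty()) {
        return std::nullopt;
    }
    int lane = indices[0];
    if (lane < 0 || lane >= input_lanes) {
        return std::nullopt;
    }
    for (int i : indices) {
        if (i != lane) {
            return std::nullopt;
        }
    }
    return lane;
}
```

For the recursive shuffle in the description, the indices {0, 0, 0, 0} over the single u16 lane of `reinterpret<u16>(some_u8x2)` yield lane 0, so it can take the broadcast path rather than being treated as a self-interleave.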

abadams · 2023-10-14T03:20:03Z

Fixed the thing I was trying to fix. Failures are unrelated.

abadams · 2023-10-14T03:24:15Z

Remaining failures:

LLVM ICE on the fuzz_schedule test on wasm
LLVM ICE on correctness_mul_div and correctness_logical on cuda
numpy was missing from the venv on mac-worker-3; I believe I have fixed it now

…sts (halide#7902) * Generate simpler LLVM IR for shuffles that recursively become broadcasts * Don't re-codegen arg

abadams added 2 commits October 13, 2023 16:31

Generate simpler LLVM IR for shuffles that recursively become broadcasts

a09bcdf

Don't re-codegen arg

a7d1b8c

abadams requested a review from TH3CHARLie October 14, 2023 03:20

TH3CHARLie approved these changes Oct 14, 2023


abadams merged commit 7e35494 into main Oct 14, 2023

BrewTestBot mentioned this pull request Feb 2, 2024

halide 17.0.0 Homebrew/homebrew-core#161602

Closed
