Generate simpler LLVM IR for shuffles that recursively become broadcasts #7902

Merged: 2 commits merged into main on Oct 14, 2023

@abadams abadams commented Oct 13, 2023

LLVM 18 is causing us to fail the test performance_nested_vectorization_gemm on arm without arm_dot_prod as of a few days ago. The cause seems to be that we generate very complex chains of shuffles for the following piece of IR:

shuffle({some_u8x2}, {0, 1, 0, 1, 0, 1, 0, 1});

The current CodeGen_LLVM Shuffle handler reinterprets the u8x2 as a u16, and creates the following shuffle:

shuffle({reinterpret<u16>(some_u8x2)}, {0, 0, 0, 0});

and then recursively calls the shuffle codegen visitor. The shuffle codegen visitor doesn't have a special case for broadcasts, so it sees this as a degenerate self-interleave, and produces a complex binary tree of shuffles.

This PR instead detects broadcasts of a single lane of a single vector, and uses the existing broadcast handling.
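
As a rough illustration of the detection described above, the following standalone C++ sketch checks the two index patterns involved. It does not use Halide's actual CodeGen_LLVM code; the helper names `is_single_lane_broadcast` and `repeat_count` are hypothetical and exist only for this example, and shuffle indices are modeled as a plain `std::vector<int>`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not Halide's API): returns true when every shuffle
// index selects the same lane, i.e. the shuffle broadcasts a single lane.
bool is_single_lane_broadcast(const std::vector<int> &indices) {
    if (indices.empty()) {
        return false;
    }
    for (int idx : indices) {
        if (idx != indices[0]) {
            return false;
        }
    }
    return true;
}

// Hypothetical helper: if `indices` is the sequence 0..src_lanes-1 repeated
// a whole number of times, returns the number of repeats; otherwise returns 0.
// A repeated whole-vector pattern like {0, 1, 0, 1} over a two-lane source is
// the case where the source can be reinterpreted as one wider lane and the
// shuffle rewritten with all-zero indices.
int repeat_count(const std::vector<int> &indices, int src_lanes) {
    assert(src_lanes > 0);
    const std::size_t n = static_cast<std::size_t>(src_lanes);
    if (indices.empty() || indices.size() % n != 0) {
        return 0;
    }
    for (std::size_t i = 0; i < indices.size(); i++) {
        if (indices[i] != static_cast<int>(i % n)) {
            return 0;
        }
    }
    return static_cast<int>(indices.size() / n);
}
```

With these helpers, the indices {0, 1, 0, 1, 0, 1, 0, 1} over a two-lane source give a repeat count of 4, and the rewritten indices {0, 0, 0, 0} are recognized as a single-lane broadcast, which is the case this PR routes to the existing broadcast handling.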


abadams commented Oct 14, 2023

Fixed the thing I was trying to fix. Failures are unrelated.

@abadams abadams requested a review from TH3CHARLie October 14, 2023 03:20

abadams commented Oct 14, 2023

Remaining failures:

  • llvm ICE on fuzz_schedule test on wasm
  • llvm ICE on correctness_mul_div and correctness_logical on cuda
  • numpy was missing from the venv on mac-worker-3; I believe I have fixed it now

@abadams abadams merged commit 7e35494 into main Oct 14, 2023
ardier pushed a commit to ardier/Halide-mutation that referenced this pull request Mar 3, 2024
…sts (halide#7902)

* Generate simpler LLVM IR for shuffles that recursively become broadcasts

* Don't re-codegen arg