[RISCV] Prefer vrgatherei16 for shuffles #66291
If the data type is larger than e16, and the type requires more than an LMUL1 register class, prefer the use of vrgatherei16. This has three major benefits:
1) Less work needed to evaluate the constant for e.g. vid sequences. Remember that arithmetic generally scales linearly with LMUL.
2) Less register pressure. In particular, the source and indices registers *can* overlap so using a smaller index can significantly help at m8.
3) Smaller constants. We've got a bunch of tricks for materializing small constants, and if needed, can use an EEW=16 load.
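To make the register-pressure point concrete, here is a minimal sketch of the index operand's register-group size, assuming the usual RVV relation EMUL = (EEW / SEW) * LMUL. The helper name and the "eighths of a register" encoding are hypothetical and chosen only for illustration; they are not LLVM APIs.

```cpp
#include <cassert>

// Hypothetical helper, not part of LLVM. Register-group sizes are expressed
// in eighths of a vector register so fractional LMUL stays integral:
// mf8 -> 1, mf2 -> 4, m1 -> 8, m2 -> 16, m4 -> 32, m8 -> 64.
inline unsigned indexGroupEighths(unsigned DataSEW, unsigned DataLMULEighths,
                                  unsigned IndexEEW) {
  assert(DataSEW != 0 && "element width must be non-zero");
  // EMUL = (EEW / SEW) * LMUL for an operand whose element width differs
  // from SEW. Multiply first so the integer division does not truncate.
  return DataLMULEighths * IndexEEW / DataSEW;
}
```

Under these assumptions, an e64 shuffle at m8 needs an m8 index group when the index width matches the data width (`indexGroupEighths(64, 64, 64)` gives 64), but only an m2 group with 16-bit indices (`indexGroupEighths(64, 64, 16)` gives 16).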
@llvm/pr-subscribers-backend-risc-v

Changes

Reviewers, this is a resurrection of something I started a few months back. I just stumbled across the local branch again. I vaguely remember there being a problem with constant materialization interaction which appears resolved, but I'm a bit worried I've forgotten something here. Careful consideration appreciated.

Patch is 139.66 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/66291.diff

11 Files Affected:
<pre>
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-buildvec.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-buildvec.ll
define <4 x double> @vrgather_shuffle_vv_v4f64(<4 x double> %x, <4 x double> %y) {
define <4 x double> @vrgather_shuffle_xv_v4f64(<4 x double> %x) {

// If the mask allows, we can do all the index computation in 16 bits. This
// requires less work and less register pressure at high LMUL, and creates
// smaller constants which may be cheaper to materialize.
if (IndexVT.getScalarType().bitsGT(MVT::i16) && isUInt<16>(NumElts * 2) &&
Why NumElts * 2? Shouldn't it be NumElts - 1?
You're correct, and I will adjust. In practice, I'm not too worried about vectors with 32k elements. :)
Agreed, I don't think MVT goes that high yet.
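For readers following the thread, a small sketch of the two bounds being compared. It assumes gather indices select within a single source of NumElts elements, so the largest index is NumElts - 1; both function names are hypothetical stand-ins and are not the code in the patch.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical stand-in for the check as originally posted:
// require NumElts * 2 to be representable in 16 bits.
inline bool conservativeFitsIn16(uint64_t NumElts) {
  return NumElts * 2 <= std::numeric_limits<uint16_t>::max();
}

// Hypothetical stand-in for the reviewer's suggestion:
// require only the largest index, NumElts - 1, to be representable.
inline bool maxIndexFitsIn16(uint64_t NumElts) {
  if (NumElts == 0)
    return true; // no indices to represent
  return NumElts - 1 <= std::numeric_limits<uint16_t>::max();
}
```

The two only disagree for vectors of 32768 to 65536 elements, which is the range the reply above is not worried about in practice.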
LGTM
Pushed as c663401.