-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AIE2P] Support Wide Load/Store #307
Conversation
abnikant
commented
Jan 23, 2025
- Refactor 1024-bit FIFO Load/Store.
- Support 1024-bits and 2048-bits (accumulator) Load/Store.
- Add tests.
; CHECK-NEXT: [[COPY2:%[0-9]+]]:vec512 = COPY [[COPY1]].sub_512_lo | ||
; CHECK-NEXT: VST_dmx_sts_x_idx_imm [[COPY2]], [[COPY]], 0 :: (store (<32 x s16>), align 128) | ||
; CHECK-NEXT: [[COPY3:%[0-9]+]]:vec512 = COPY [[COPY1]].sub_512_hi | ||
; CHECK-NEXT: VST_dmx_sts_x_idx_imm [[COPY3]], [[COPY]], 64 :: (store (<32 x s16>) into unknown-address + 64) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a clarification, into unknown-address
what does it mean.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since the ir element which point to the address is removed from the test (G_LOAD/G_STORE) to make the test simple.
; CHECK-NEXT: VST_dmx_sts_bm_idx_imm [[COPY2]], [[COPY]], 0 :: (store (<16 x s32>), align 256) | ||
; CHECK-NEXT: [[COPY3:%[0-9]+]]:acc512 = COPY [[COPY1]].sub_512_acc_hi | ||
; CHECK-NEXT: VST_dmx_sts_bm_idx_imm [[COPY3]], [[COPY]], 64 :: (store (<16 x s32>) into unknown-address + 64) | ||
; CHECK-NEXT: [[COPY4:%[0-9]+]]:acc512 = COPY [[COPY1]].sub_1024_acc_hi_then_sub_512_acc_lo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sub_1024_acc_hi_then_sub_512_acc_lo
is it legal? Does it reach till assembly generation?
; CHECK: liveins: $p0 | ||
; CHECK-NEXT: {{ $}} | ||
; CHECK-NEXT: [[COPY:%[0-9]+]]:ep = COPY $p0 | ||
; CHECK-NEXT: [[VLDA_dmx_lda_x_idx_imm:%[0-9]+]]:vec512 = VLDA_dmx_lda_x_idx_imm [[COPY]], 0 :: (load (<32 x s16>), align 128) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I understands, alignment for 512, 1024, 2048 vector load is 64, 128 and 256. Right?
@@ -103,6 +103,9 @@ class AIE2PInstructionSelector : public AIEBaseInstructionSelector { | |||
bool select1024BitG_AIE_LOAD_STORE(MachineInstr &I, LoadStoreOpcodes &LSO, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see this function has renamed, would you like to remove this?
for (unsigned i = 0; i < NumSplits; ++i) { | ||
unsigned Offset = (SrcDstTySize == 2048 && i == 0) ? 128 : 0; | ||
addSplitMemOperands( | ||
AMI.MemI, Instrs[SplitFactor - 1 - 2 * i] /*Higher MIB*/, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain this SplitFactor - 1 - 2 * i
, I see for SplitFactor = 2
, this can be even negative value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but the loop is running for NumSplits - 1 times, and NumSplits = SplitFactor / 2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will be nice to have a comment about this lambda.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
RBI.constrainGenericRegister(SrcDstReg, *&AIE2P::FIFO1024RegClass, | ||
MRI); | ||
if (!RBI.constrainGenericRegister( | ||
SrcDstReg, *(SplitFactor == 4 ? RC2048 : RC1024), MRI)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I would suggest SrcDstTySize == 2048
for more readability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
MachineInstr &I, LoadStoreOpcodes &LSO, AddressingModeInfo &AMI, | ||
MachineRegisterInfo &MRI) { | ||
LLT SrcDstTy = MRI.getType(AMI.SrcDstOp.getReg()); | ||
unsigned SrcDstTySize = SrcDstTy.getSizeInBits(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add an assert for not misusing this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's there, check , we are handling only for supported RBID else it's unreachable.
@@ -2055,6 +2069,14 @@ LoadStoreOpcodes AIE2PInstructionSelector::getLoadStoreOpcode( | |||
} | |||
llvm_unreachable("1024-bit vector type must be in AccRegBank or " | |||
"VRegBank or FifoRegBankID"); | |||
} else if (LoadStoreSize == 2048) { | |||
assert(RBID == AIE2P::AccRegBankID && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we are not handling if (PtrDef->getOpcode() == AIE2P::G_FRAME_INDEX) {
case
} | ||
if (RBID == AIE2P::VRegBankID) { | ||
return {/*ISelOpcode=*/AIE2P::VLDA_dmx_lda_x_idx_imm, | ||
/*FitsImmediateRange=*/true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: AlwaysFitsImmediateRange
024512c
to
6aac9bb
Compare
int Offset = SubReg * 64; | ||
auto Copy = MIB.buildInstr(TargetOpcode::COPY, {SubRegs[SubReg]}, {}) | ||
.addReg(AMI.SrcDstOp.getReg(), 0, | ||
SubRegIdxes[SubReg % SubRegIdxes.size()]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about the following instead?
.addReg(AMI.SrcDstOp.getReg(), 0, SubRegIdxes[SubRegIdx]);
Since SubRegIdx
is always less than SplitFactor
in the loop
Looking to the PR I see no handling for |
6aac9bb
to
d0795e2
Compare
yes, I am working on it. Will open other PR to support it. |
@andcarminati , @khallouh @niwinanto , The comments are addressed . Please check if you have any more comments. |
d0795e2
to
6f4894a
Compare
addSplitMemOperands( | ||
AMI.MemI, Instrs[SplitFactor - 1 - 2 * I] /*Higher MIB*/, | ||
Instrs[SplitFactor - 2 - 2 * I] /*Lower MIB*/, Offset, SplitFactor); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not
if (SrcDstType == 2048) {
addSplitMemOperands()
addSplitMemOperands()
} else {
addSplitMemOperands()
}
and make the comment superfluous?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, thanks for looking into it @martien-de-jong . Updated the code.
6f4894a
to
e06c996
Compare
const unsigned Offset = SubRegIdx * 64; | ||
auto Copy = MIB.buildInstr(TargetOpcode::COPY, {SubRegs[SubRegIdx]}, {}) | ||
.addReg(AMI.SrcDstOp.getReg(), 0, | ||
SubRegIdxes[SubRegIdx % SubRegIdxes.size()]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. SubRegIdx is an index into the SubRegs array, not to be confused with an index into the SubRegsIdxes array.
e06c996
to
52f324b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
52f324b
to
b93fa7b
Compare
b93fa7b
to
e2be210
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM