Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AIE2P] Support Wide Load/Store #307

Merged
merged 1 commit into from
Jan 28, 2025
Merged

Conversation

abnikant
Copy link
Collaborator

  1. Refactor 1024-bit FIFO Load/Store.
  2. Support 1024-bits and 2048-bits (accumulator) Load/Store.
  3. Add tests.

; CHECK-NEXT: [[COPY2:%[0-9]+]]:vec512 = COPY [[COPY1]].sub_512_lo
; CHECK-NEXT: VST_dmx_sts_x_idx_imm [[COPY2]], [[COPY]], 0 :: (store (<32 x s16>), align 128)
; CHECK-NEXT: [[COPY3:%[0-9]+]]:vec512 = COPY [[COPY1]].sub_512_hi
; CHECK-NEXT: VST_dmx_sts_x_idx_imm [[COPY3]], [[COPY]], 64 :: (store (<32 x s16>) into unknown-address + 64)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a clarification, into unknown-address what does it mean.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since the ir element which point to the address is removed from the test (G_LOAD/G_STORE) to make the test simple.

; CHECK-NEXT: VST_dmx_sts_bm_idx_imm [[COPY2]], [[COPY]], 0 :: (store (<16 x s32>), align 256)
; CHECK-NEXT: [[COPY3:%[0-9]+]]:acc512 = COPY [[COPY1]].sub_512_acc_hi
; CHECK-NEXT: VST_dmx_sts_bm_idx_imm [[COPY3]], [[COPY]], 64 :: (store (<16 x s32>) into unknown-address + 64)
; CHECK-NEXT: [[COPY4:%[0-9]+]]:acc512 = COPY [[COPY1]].sub_1024_acc_hi_then_sub_512_acc_lo
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sub_1024_acc_hi_then_sub_512_acc_lo is it legal? Does it reach till assembly generation?

; CHECK: liveins: $p0
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[COPY:%[0-9]+]]:ep = COPY $p0
; CHECK-NEXT: [[VLDA_dmx_lda_x_idx_imm:%[0-9]+]]:vec512 = VLDA_dmx_lda_x_idx_imm [[COPY]], 0 :: (load (<32 x s16>), align 128)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I understands, alignment for 512, 1024, 2048 vector load is 64, 128 and 256. Right?

@@ -103,6 +103,9 @@ class AIE2PInstructionSelector : public AIEBaseInstructionSelector {
bool select1024BitG_AIE_LOAD_STORE(MachineInstr &I, LoadStoreOpcodes &LSO,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see this function has renamed, would you like to remove this?

for (unsigned i = 0; i < NumSplits; ++i) {
unsigned Offset = (SrcDstTySize == 2048 && i == 0) ? 128 : 0;
addSplitMemOperands(
AMI.MemI, Instrs[SplitFactor - 1 - 2 * i] /*Higher MIB*/,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain this SplitFactor - 1 - 2 * i, I see for SplitFactor = 2, this can be even negative value.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but the loop is running for NumSplits - 1 times, and NumSplits = SplitFactor / 2.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be nice to have a comment about this lambda.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

RBI.constrainGenericRegister(SrcDstReg, *&AIE2P::FIFO1024RegClass,
MRI);
if (!RBI.constrainGenericRegister(
SrcDstReg, *(SplitFactor == 4 ? RC2048 : RC1024), MRI))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I would suggest SrcDstTySize == 2048 for more readability.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

MachineInstr &I, LoadStoreOpcodes &LSO, AddressingModeInfo &AMI,
MachineRegisterInfo &MRI) {
LLT SrcDstTy = MRI.getType(AMI.SrcDstOp.getReg());
unsigned SrcDstTySize = SrcDstTy.getSizeInBits();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add an assert for not misusing this function.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's there, check , we are handling only for supported RBID else it's unreachable.

@@ -2055,6 +2069,14 @@ LoadStoreOpcodes AIE2PInstructionSelector::getLoadStoreOpcode(
}
llvm_unreachable("1024-bit vector type must be in AccRegBank or "
"VRegBank or FifoRegBankID");
} else if (LoadStoreSize == 2048) {
assert(RBID == AIE2P::AccRegBankID &&
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we are not handling if (PtrDef->getOpcode() == AIE2P::G_FRAME_INDEX) { case

}
if (RBID == AIE2P::VRegBankID) {
return {/*ISelOpcode=*/AIE2P::VLDA_dmx_lda_x_idx_imm,
/*FitsImmediateRange=*/true,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: AlwaysFitsImmediateRange

@abnikant abnikant force-pushed the aie2p.wide.load.store branch 2 times, most recently from 024512c to 6aac9bb Compare January 24, 2025 07:29
int Offset = SubReg * 64;
auto Copy = MIB.buildInstr(TargetOpcode::COPY, {SubRegs[SubReg]}, {})
.addReg(AMI.SrcDstOp.getReg(), 0,
SubRegIdxes[SubReg % SubRegIdxes.size()]);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about the following instead?

   .addReg(AMI.SrcDstOp.getReg(), 0, SubRegIdxes[SubRegIdx]);

Since SubRegIdx is always less than SplitFactor in the loop

@andcarminati
Copy link
Collaborator

Looking to the PR I see no handling for G_AIE_OFFSET_LOAD or other postinc variants of load and stores. Is there any plans to handle those cases?

@abnikant abnikant force-pushed the aie2p.wide.load.store branch from 6aac9bb to d0795e2 Compare January 24, 2025 09:14
@abnikant
Copy link
Collaborator Author

Looking to the PR I see no handling for G_AIE_OFFSET_LOAD or other postinc variants of load and stores. Is there any plans to handle those cases?

yes, I am working on it. Will open other PR to support it.

@abnikant
Copy link
Collaborator Author

@andcarminati , @khallouh @niwinanto , The comments are addressed . Please check if you have any more comments.

@abnikant abnikant force-pushed the aie2p.wide.load.store branch from d0795e2 to 6f4894a Compare January 27, 2025 05:53
addSplitMemOperands(
AMI.MemI, Instrs[SplitFactor - 1 - 2 * I] /*Higher MIB*/,
Instrs[SplitFactor - 2 - 2 * I] /*Lower MIB*/, Offset, SplitFactor);
}
Copy link
Collaborator

@martien-de-jong martien-de-jong Jan 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not

if (SrcDstType == 2048) {
  addSplitMemOperands()
  addSplitMemOperands()
} else {
  addSplitMemOperands()
}

and make the comment superfluous?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, thanks for looking into it @martien-de-jong . Updated the code.

@abnikant abnikant force-pushed the aie2p.wide.load.store branch from 6f4894a to e06c996 Compare January 27, 2025 14:49
const unsigned Offset = SubRegIdx * 64;
auto Copy = MIB.buildInstr(TargetOpcode::COPY, {SubRegs[SubRegIdx]}, {})
.addReg(AMI.SrcDstOp.getReg(), 0,
SubRegIdxes[SubRegIdx % SubRegIdxes.size()]);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. SubRegIdx is an index into the SubRegs array, not to be confused with an index into the SubRegsIdxes array.

@abnikant abnikant force-pushed the aie2p.wide.load.store branch from e06c996 to 52f324b Compare January 27, 2025 16:04
niwinanto
niwinanto previously approved these changes Jan 28, 2025
Copy link
Collaborator

@niwinanto niwinanto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@abnikant abnikant force-pushed the aie2p.wide.load.store branch from b93fa7b to e2be210 Compare January 28, 2025 06:48
Copy link
Collaborator

@khallouh khallouh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@abnikant abnikant merged commit ae07b4b into aie-public Jan 28, 2025
8 checks passed
@SagarMaheshwari99 SagarMaheshwari99 deleted the aie2p.wide.load.store branch February 3, 2025 13:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants