Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AIE2P] Improve RegbankSelect handling for load/store offset and post-increment addressing modes #337

Open
wants to merge 1 commit into
base: aie-public
Choose a base branch
from

Conversation

abnikant
Copy link
Collaborator

@abnikant abnikant commented Feb 6, 2025

Improved register bank assignments (for accumulator and FIFO register banks) to support load/store post-increment, post-increment 2D, post-increment 3D, and offset addressing modes.


// Prepare the three mapping indices.
std::vector<AIEBaseRegisterBankInfo::PartialMappingIdx> MappingIndices = {
getPartialMappingIdx(Ty), getAccPartialMappingIdx(Ty),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just use getVecPartialMappingIdx here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

OpdsMapping[Idx] = getValueMapping(OpRegBankIdx[Idx], OpSize[Idx]);
}
}
const InstructionMapping &VRegMapping = getInstructionMapping(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I find the name VRegMapping to be a bit confusing here since this is supposed to be an alternative mapping

case AIE2P::G_AIE_POSTINC_3D_LOAD: {
const unsigned NumOperands = MI.getNumOperands();
const unsigned FirstSrcIdx = MI.getNumExplicitDefs();
unsigned MappingID = DefaultMappingID;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason you start at DefaultMappingID instead of 1 for your alternative mapping IDs? DefaultMappingID is a UINT_MAX and you are incrementing it. It also mentions:

Identifier used when the related instruction mapping instance is generated by target independent code.
Make sure not to use that identifier to avoid possible collision.


// Prepare the three mapping indices.
std::vector<AIEBaseRegisterBankInfo::PartialMappingIdx> MappingIndices = {
getPartialMappingIdx(Ty), getAccPartialMappingIdx(Ty),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as for the loads, Can't we use getVecPartialMappingIdx instead of getPartialMappingIdx from the base which still relies on the element size?

}

if ((VecRegOpIdx + 1) < NumOperands)
OpRegBankIdx[VecRegOpIdx + 1] = PMI_PTR;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we simply assert this (VecRegOpIdx + 1) < NumOperands instead and directly assign PMI_PTR? I think all of the store opcodes handled here should verify this condition.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, makes sense.

OpdsMapping[Idx] = getValueMapping(OpRegBankIdx[Idx], OpSize[Idx]);
}
}
const InstructionMapping &VRegMapping = getInstructionMapping(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as for the loads. I would use a different name since this is not always a vector bank mapping

@@ -705,7 +800,8 @@ bool AIE2PRegisterBankInfo::hasFifoInput(const MachineInstr &MI,
const Register RegOp) const {
auto *RI = static_cast<const AIEBaseRegisterInfo *>(
MI.getParent()->getParent()->getSubtarget().getRegisterInfo());
switch (MI.getOpcode()) {
const unsigned Op = MI.getOpcode();
switch (Op) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had addressed a comment on my PR to directly use switch (MI.getOpcode()) since Op for 'opcode' can be confusing as we also have a RegOp argument for 'operand' this time. Could you keep it and do the same for usesAccReg or at least change the name to avoid confusion with RegOp?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

case TargetOpcode::G_STORE:
case AIE2P::G_AIE_OFFSET_STORE:
case AIE2P::G_AIE_POSTINC_STORE: {
const unsigned OpIdx = MI.getNumExplicitDefs();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about 2D and 3D?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh! missed updating the switch case, added now.

case TargetOpcode::G_STORE:
case AIE2P::G_AIE_OFFSET_STORE:
case AIE2P::G_AIE_POSTINC_STORE: {
const unsigned OpIdx = MI.getNumExplicitDefs();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about 2D and 3D?

MachineMemOperand *MMO = *MI.memoperands_begin();
const unsigned MemSize = 8 * MMO->getSize().getValue();
// Handle the example case as below
/* %2:_(p0) = G_FRAME_INDEX %stack.0
G_STORE %1(<32 x s32>), %2(p0)
%4:_(<32 x s32>) = G_LOAD %2(p0) */
// Size of accumulator and fifo vectors on aie2p >= 512.
if (MemSize >= 512) {
Copy link
Collaborator

@khallouh khallouh Feb 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you remind me why the case above was handled according to the MemSize and why it should be greater than 512?
Edit: clarified

Copy link
Collaborator

@khallouh khallouh Feb 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I just saw the comment, this based on the size of the acc/fifo on the target. In that case, can we just return earlier for anything smaller than 512?

}

// Update mapping flags based on the refined candidate.
Copy link
Collaborator

@khallouh khallouh Feb 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if this just to address the case above, we can move it up inside the corresponding if case, or keep it here and remove it from above the if case. We don't have to update to do it twice.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

// Given an instruction that produces a pointer used in a load operation,
// return the candidate pointer register that should be traced
// for determining the register bank mapping.
const Register getUseCandidateReg(const MachineInstr &DefMI) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: getUseCandidatePointer


// Assign PMI_PTR to the next operand if within bounds
if ((VecRegOpIdx + 1) < NumOperands)
OpRegBankIdx[VecRegOpIdx + 1] = PMI_PTR;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, can we simply assert here instead of the if?

@abnikant abnikant force-pushed the aie2p.postinc.offset.ld.st.regbank branch 4 times, most recently from 635c08d to 614a302 Compare February 6, 2025 11:53
@abnikant abnikant force-pushed the aie2p.postinc.offset.ld.st.regbank branch 2 times, most recently from 09958b9 to 54ead3e Compare February 10, 2025 06:25
; FAST-NEXT: $lfl0 = COPY [[AIE_OFFSET_LOAD]](<8 x s64>)
%1:_(p0) = COPY $p0
%2:_(s20) = G_CONSTANT i20 448
%0:_(<8 x s64>) = G_AIE_OFFSET_LOAD %1:_(p0), %2:_(s20) :: (load (<8 x s64>))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a test for G_AIE_OFFSET_STORE?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added tests.

UseCandidate = DefMI->getOperand(0).getReg();
Register PtrReg = MI.getOperand(FirstSrcIdx).getReg();
MachineInstr *DefMI = MRI.getVRegDef(PtrReg);
if (DefMI) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't that guaranteed by SSA?

Copy link
Collaborator

@khallouh khallouh Feb 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which seems to make the following block above kind of dead code as we would update the use candidate every time

    if (isUseAccInsn(MRI, TRI, UseCandidate))
      isAccRegMapping = true;
    if (isUseFifoInsn(MRI, TRI, UseCandidate))
      isFifoPhysRegMapping = true;

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok so we would update the use candidate every time but we still need the first block for when the the load feeds acc/fifo instructions directly and not through the pointer.

// for MemSize < 512, fallback to base instruction mapping.
if (MemSize < 512)
return AIEBaseRegisterBankInfo::getInstrMapping(MI);

// Handle the example case as below
/* %2:_(p0) = G_FRAME_INDEX %stack.0
G_STORE %1(<32 x s32>), %2(p0)
%4:_(<32 x s32>) = G_LOAD %2(p0) */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if we need to handle this case and update the use candidate. Here is the code after instruction-select for test_v16x32_spill_reload without this change:

  bb.0.entry:
    %0:fifo512 = COPY $lfl1
    VST_dmx_sts_fifohl_spill %0, %stack.0, implicit $sp :: (store (<16 x s32>))
    %3:vec512 = VLDA_dmx_lda_x_spill %stack.0, implicit $sp :: (dereferenceable load (<16 x s32>))
    VST_dmx_sts_x_spill %3, %stack.1, implicit $sp :: (volatile store (<16 x s32>))
    %4:fifo512 = VLDA_dmx_lda_fifohl_spill %stack.1, implicit $sp :: (volatile dereferenceable load (<16 x s32>))
    $lfl0 = COPY %4
    PseudoRET implicit $lr, implicit $lfl0

We simply use a vector load and store instead of fifo load and store, no cross bank copies are introduced.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. I have removed this special case. I don't find a case where it will help in better register bank assignment. We can always add if there is a need.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I understands we are handling this

    %1:_(<16 x s32>) = COPY $lfl1
    %2:_(p0) = G_FRAME_INDEX %stack.0
    G_STORE %1(<16 x s32>), %2(p0) :: (store (<16 x s32>))
    %4:_(<16 x s32>) = G_LOAD %2(p0) :: (dereferenceable load (<16 x s32>))
..
..
uses %4

Only issue I can see is, if we want to select uses %4 instruction based on %4 we might hit vector variant of instruction always and need cross bank copy always if there are other variants we can select.

Also, if you would like to remove this special case you can remove AIE2PRegisterBankInfo::hasFifoInput and corresponding accumulator function. It seems dead code.

  case TargetOpcode::G_STORE:
  case AIE2P::G_AIE_OFFSET_STORE:
  case AIE2P::G_AIE_POSTINC_STORE:
  case AIE2P::G_AIE_POSTINC_2D_STORE:
  case AIE2P::G_AIE_POSTINC_3D_STORE: {
    const unsigned OpIdx = MI.getNumExplicitDefs();
    auto *RB = getRegBank(MI.getOperand(OpIdx).getReg(), MRI, TRI);
    if (RB == &AIE2P::FifoRegBank)
      return true;
    break;
  }

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is an example usecase

    ; FAST-LABEL: name: test_v16acc32_spill_reload
    ; FAST: [[COPY:%[0-9]+]]:accregbank(<16 x s32>) = COPY $bmll1
    ; FAST-NEXT: [[COPY1:%[0-9]+]]:gprregbank(s32) = COPY $r0
    ; FAST-NEXT: [[FRAME_INDEX:%[0-9]+]]:ptrregbank(p0) = G_FRAME_INDEX %stack.0
    ; FAST-NEXT: G_STORE [[COPY]](<16 x s32>), [[FRAME_INDEX]](p0) :: (store (<16 x s32>))
    ; FAST-NEXT: [[LOAD:%[0-9]+]]:vregbank(<16 x s32>) = G_LOAD [[FRAME_INDEX]](p0) :: (dereferenceable load (<16 x s32>))
    ; FAST-NEXT: [[ADD:%[0-9]+]]:vregbank(<16 x s32>) = G_ADD [[LOAD]], [[LOAD]]
    ; FAST-NEXT: [[COPY2:%[0-9]+]]:accregbank(<16 x s32>) = COPY [[ADD]](<16 x s32>)
    ; FAST-NEXT: G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.aie2p.mcd.write.acc32), [[COPY2]](<16 x s32>), [[COPY1]](s32)
    ; FAST-NEXT: PseudoRET implicit $lr
    %1:_(<16 x s32>) = COPY $bmll1
    %3:gprregbank(s32) = COPY $r0
    %2:_(p0) = G_FRAME_INDEX %stack.0
    G_STORE %1(<16 x s32>), %2(p0) :: (store (<16 x s32>))
    %4:_(<16 x s32>) = G_LOAD %2(p0) :: (dereferenceable load (<16 x s32>))
    %5:_(<16 x s32>) = G_ADD %4, %4
    G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.aie2p.mcd.write.acc32), %5(<16 x s32>), %3(s32)
    PseudoRET implicit $lr

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is an example usecase

    ; FAST-LABEL: name: test_v16acc32_spill_reload
    ; FAST: [[COPY:%[0-9]+]]:accregbank(<16 x s32>) = COPY $bmll1
    ; FAST-NEXT: [[COPY1:%[0-9]+]]:gprregbank(s32) = COPY $r0
    ; FAST-NEXT: [[FRAME_INDEX:%[0-9]+]]:ptrregbank(p0) = G_FRAME_INDEX %stack.0
    ; FAST-NEXT: G_STORE [[COPY]](<16 x s32>), [[FRAME_INDEX]](p0) :: (store (<16 x s32>))
    ; FAST-NEXT: [[LOAD:%[0-9]+]]:vregbank(<16 x s32>) = G_LOAD [[FRAME_INDEX]](p0) :: (dereferenceable load (<16 x s32>))
    ; FAST-NEXT: [[ADD:%[0-9]+]]:vregbank(<16 x s32>) = G_ADD [[LOAD]], [[LOAD]]
    ; FAST-NEXT: [[COPY2:%[0-9]+]]:accregbank(<16 x s32>) = COPY [[ADD]](<16 x s32>)
    ; FAST-NEXT: G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.aie2p.mcd.write.acc32), [[COPY2]](<16 x s32>), [[COPY1]](s32)
    ; FAST-NEXT: PseudoRET implicit $lr
    %1:_(<16 x s32>) = COPY $bmll1
    %3:gprregbank(s32) = COPY $r0
    %2:_(p0) = G_FRAME_INDEX %stack.0
    G_STORE %1(<16 x s32>), %2(p0) :: (store (<16 x s32>))
    %4:_(<16 x s32>) = G_LOAD %2(p0) :: (dereferenceable load (<16 x s32>))
    %5:_(<16 x s32>) = G_ADD %4, %4
    G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.aie2p.mcd.write.acc32), %5(<16 x s32>), %3(s32)
    PseudoRET implicit $lr

yes, we have some cases in mllib too, I noticed that while running mllib. I will bring back this code. I am already working on to refine this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great if you could also include a reduced test case from mllib @abnikant

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't reproduce with the mllib tests I tried. I think we can drop it now and take it later if there is a need?

@@ -251,32 +251,125 @@ AIE2PRegisterBankInfo::getInstrAlternativeMappings(
const MachineRegisterInfo &MRI = MF.getRegInfo();
switch (MI.getOpcode()) {
case TargetOpcode::G_LOAD:
case TargetOpcode::G_STORE: {
case AIE2P::G_AIE_OFFSET_LOAD:
case AIE2P::G_AIE_POSTINC_LOAD:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about G_AIE_OFFSET_ZEXTLOAD and G_AIE_OFFSET_SEXTLOAD?

case TargetOpcode::G_LOAD: {
case TargetOpcode::G_LOAD:
case AIE2P::G_AIE_OFFSET_LOAD:
case AIE2P::G_AIE_POSTINC_LOAD:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about G_AIE_OFFSET_ZEXTLOAD and G_AIE_OFFSET_SEXTLOAD?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are falling back to Base Implementation for Size < 512.

@@ -229,6 +229,13 @@ unsigned AIE2InstrInfo::getOffsetMemOpcode(unsigned BaseMemOpcode) const {
llvm_unreachable("not a generic load/store");
}

bool AIE2InstrInfo::isGenericOffsetMemOpcode(unsigned Opcode) const {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say that all G_AIE* are target specific. I feel that Generic (besides of G) can be a bit confusing

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in other PR, we need to update other hooks as well to remove 'Generic'. Possibly we can do it in other/separate PR?

Copy link
Collaborator

@khallouh khallouh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you add the regbankselect support for AIE's load and store opcodes in general. It would be nice to test the other assignments not just fifo512 and acc512.
For scalars offset and postinc, there is these 2 tests for AIE2 which you can extend for AIE2P
regbankselect-offset-postinc-store.mir and regbankselect-offset-postinc-load.mir. I couldn't find any for vectors or 2D/3D

@khallouh
Copy link
Collaborator

khallouh commented Feb 10, 2025

Note: all my review comments are concerning the 2nd commit regarding RegBankSelect. For the 1st commit (enabling combines), there is another PR #323

// Select the operand index for VecReg.
const unsigned VecRegOpIdx = MI.getNumExplicitDefs();
Register VecReg = MI.getOperand(VecRegOpIdx).getReg();
if (!VecReg)
break;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not return the base implementation?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, can we return earlier for any type smaller than 512 like we do for the loads? since we are trying to assign either the accumulator or fifo bank.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, now we return early for Size < 512.

@abnikant abnikant force-pushed the aie2p.postinc.offset.ld.st.regbank branch from 54ead3e to e9dcf85 Compare February 11, 2025 11:24
@abnikant
Copy link
Collaborator Author

Since you add the regbankselect support for AIE's load and store opcodes in general. It would be nice to test the other assignments not just fifo512 and acc512. For scalars offset and postinc, there is these 2 tests for AIE2 which you can extend for AIE2P regbankselect-offset-postinc-store.mir and regbankselect-offset-postinc-load.mir. I couldn't find any for vectors or 2D/3D

enabled these for aie2p and also added tests for vector. 2D/3D tests were already added for accumulator and fifo registers.

@abnikant abnikant force-pushed the aie2p.postinc.offset.ld.st.regbank branch from e9dcf85 to f75e1d3 Compare February 11, 2025 17:33
@@ -205,7 +205,7 @@ body: |
; FAST-NEXT: [[FRAME_INDEX:%[0-9]+]]:ptrregbank(p0) = G_FRAME_INDEX %stack.0
; FAST-NEXT: [[FRAME_INDEX1:%[0-9]+]]:ptrregbank(p0) = G_FRAME_INDEX %stack.1
; FAST-NEXT: G_STORE [[COPY]](<16 x s32>), [[FRAME_INDEX]](p0) :: (store (<16 x s32>))
; FAST-NEXT: [[LOAD:%[0-9]+]]:accregbank(<16 x s32>) = G_LOAD [[FRAME_INDEX]](p0) :: (dereferenceable load (<16 x s32>))
; FAST-NEXT: [[LOAD:%[0-9]+]]:vregbank(<16 x s32>) = G_LOAD [[FRAME_INDEX]](p0) :: (dereferenceable load (<16 x s32>))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we regressing here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment #337 (comment)

// Base Alternative Mapping for size <= 256.
if (Size <= 256)
break;
const unsigned Size = getSizeInBits(DstReg, MRI, TRI);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: DstSize

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: DstRegTy

AltMappings.push_back(&AccRegMapping);
Register VecReg = MI.getOperand(VecRegOpIdx).getReg();
const LLT Ty = MRI.getType(VecReg);
const unsigned Size = getSizeInBits(VecReg, MRI, TRI);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: VecRegTy and VecRegSize

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

// If the register already belongs to a known bank, just return.
if (getRegBank(UseCandidate, MRI, TRI) == &AIE2P::AccRegBank ||
getRegBank(UseCandidate, MRI, TRI) == &AIE2P::FifoRegBank)
return AIEBaseRegisterBankInfo::getInstrMapping(MI);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also return if we know it is in the vector bank?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes.

Register VecReg = MI.getOperand(VecRegOpIdx).getReg();
// Size of accumulator and fifo vectors on aie2p >= 512.
// for VecSize < 512, fallback to base instruction mapping.
if (!VecReg || MRI.getType(VecReg).getSizeInBits() < 512)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need to check if(!VecReg)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not needed. dropped.

auto *DefRegBank = getRegBank(VecReg, MRI, TRI);
// Only process if VecReg is defined in Accumulator or Fifo register banks.
if (DefRegBank != &AIE2P::AccRegBank && DefRegBank != &AIE2P::FifoRegBank)
return AIEBaseRegisterBankInfo::getInstrMapping(MI);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not process the vector bank ones as well?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we do the analysis to assign acc and fifo regbanks.

@@ -255,32 +255,125 @@ AIE2PRegisterBankInfo::getInstrAlternativeMappings(
const MachineRegisterInfo &MRI = MF.getRegInfo();
switch (MI.getOpcode()) {
case TargetOpcode::G_LOAD:
case TargetOpcode::G_STORE: {
case AIE2P::G_AIE_OFFSET_LOAD:
Copy link
Collaborator

@niwinanto niwinanto Feb 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really care about the alternative mapping? Since the cost is same, I think we always stick to the instruction mapping. May be you can check with --check-prefix=COMMON,GREEDY in the test. I believe both fast and greedy produces the same result.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, we do need in many cases, and also cost might change later.

Copy link
Collaborator

@niwinanto niwinanto Feb 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you update the tests removing --check-prefix=GREEDY and --check-prefix=GREEDY , I do not see any tests using alternative mappings.

case AIE2P::G_AIE_POSTINC_2D_STORE:
case AIE2P::G_AIE_POSTINC_3D_STORE: {
const unsigned OpIdx = MI.getNumExplicitDefs();
auto *RB = getRegBank(MI.getOperand(OpIdx).getReg(), MRI, TRI);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure OpIdx is the right value. As I understand, we would like to check whether the vector to be store is a FifoReg. Now we look for the pointer and it will always return false.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why it is a pointer operand ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This Idx should point to the first source operand.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are right. I got confused between LOAD.

// For 2048-bit sizes, mapping is restricted to the accumulator bank.
// Also, if the instruction has any implicit defs/uses, leave it alone.
if (Size <= 256 || Size == 2048 ||
MI.getNumOperands() != MI.getNumExplicitOperands())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any valid case of MI.getNumOperands() != MI.getNumExplicitOperands() considering loads?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not, needed. I dropped this check.

}

// Assign PMI_PTR to the next operand if within bounds
assert(((VecRegOpIdx + 1) < NumOperands) && "RegOpIdx + 1 out of bound");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: VecRegOpIdx + 1 out of bounds

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@abnikant abnikant force-pushed the aie2p.postinc.offset.ld.st.regbank branch from f75e1d3 to 6e26d97 Compare February 14, 2025 15:50
@abnikant
Copy link
Collaborator Author

Also, I ran mllib 3 days back, there was no extra failure. I have fired the run again, will post the result once done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants