-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AIE2P] Improve RegbankSelect
handling for load/store offset and post-increment addressing modes
#337
base: aie-public
Are you sure you want to change the base?
Conversation
123a229
to
57ae156
Compare
|
||
// Prepare the three mapping indices. | ||
std::vector<AIEBaseRegisterBankInfo::PartialMappingIdx> MappingIndices = { | ||
getPartialMappingIdx(Ty), getAccPartialMappingIdx(Ty), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just use getVecPartialMappingIdx
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
OpdsMapping[Idx] = getValueMapping(OpRegBankIdx[Idx], OpSize[Idx]); | ||
} | ||
} | ||
const InstructionMapping &VRegMapping = getInstructionMapping( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I find the name VRegMapping
to be a bit confusing here since this is supposed to be an alternative mapping
case AIE2P::G_AIE_POSTINC_3D_LOAD: { | ||
const unsigned NumOperands = MI.getNumOperands(); | ||
const unsigned FirstSrcIdx = MI.getNumExplicitDefs(); | ||
unsigned MappingID = DefaultMappingID; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason you start at DefaultMappingID
instead of 1 for your alternative mapping IDs? DefaultMappingID is a UINT_MAX and you are incrementing it. It also mentions:
Identifier used when the related instruction mapping instance is generated by target independent code.
Make sure not to use that identifier to avoid possible collision.
|
||
// Prepare the three mapping indices. | ||
std::vector<AIEBaseRegisterBankInfo::PartialMappingIdx> MappingIndices = { | ||
getPartialMappingIdx(Ty), getAccPartialMappingIdx(Ty), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as for the loads, Can't we use getVecPartialMappingIdx
instead of getPartialMappingIdx
from the base which still relies on the element size?
} | ||
|
||
if ((VecRegOpIdx + 1) < NumOperands) | ||
OpRegBankIdx[VecRegOpIdx + 1] = PMI_PTR; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we simply assert this (VecRegOpIdx + 1) < NumOperands
instead and directly assign PMI_PTR
? I think all of the store opcodes handled here should verify this condition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, makes sense.
OpdsMapping[Idx] = getValueMapping(OpRegBankIdx[Idx], OpSize[Idx]); | ||
} | ||
} | ||
const InstructionMapping &VRegMapping = getInstructionMapping( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as for the loads. I would use a different name since this is not always a vector bank mapping
@@ -705,7 +800,8 @@ bool AIE2PRegisterBankInfo::hasFifoInput(const MachineInstr &MI, | |||
const Register RegOp) const { | |||
auto *RI = static_cast<const AIEBaseRegisterInfo *>( | |||
MI.getParent()->getParent()->getSubtarget().getRegisterInfo()); | |||
switch (MI.getOpcode()) { | |||
const unsigned Op = MI.getOpcode(); | |||
switch (Op) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had addressed a comment on my PR to directly use switch (MI.getOpcode())
since Op
for 'opcode' can be confusing as we also have a RegOp
argument for 'operand' this time. Could you keep it and do the same for usesAccReg
or at least change the name to avoid confusion with RegOp?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
case TargetOpcode::G_STORE: | ||
case AIE2P::G_AIE_OFFSET_STORE: | ||
case AIE2P::G_AIE_POSTINC_STORE: { | ||
const unsigned OpIdx = MI.getNumExplicitDefs(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about 2D and 3D?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh! missed updating the switch case, added now.
case TargetOpcode::G_STORE: | ||
case AIE2P::G_AIE_OFFSET_STORE: | ||
case AIE2P::G_AIE_POSTINC_STORE: { | ||
const unsigned OpIdx = MI.getNumExplicitDefs(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about 2D and 3D?
MachineMemOperand *MMO = *MI.memoperands_begin(); | ||
const unsigned MemSize = 8 * MMO->getSize().getValue(); | ||
// Handle the example case as below | ||
/* %2:_(p0) = G_FRAME_INDEX %stack.0 | ||
G_STORE %1(<32 x s32>), %2(p0) | ||
%4:_(<32 x s32>) = G_LOAD %2(p0) */ | ||
// Size of accumulator and fifo vectors on aie2p >= 512. | ||
if (MemSize >= 512) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you remind me why the case above was handled according to the MemSize
and why it should be greater than 512?
Edit: clarified
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I just saw the comment, this based on the size of the acc/fifo on the target. In that case, can we just return earlier for anything smaller than 512?
} | ||
|
||
// Update mapping flags based on the refined candidate. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if this just to address the case above, we can move it up inside the corresponding if case, or keep it here and remove it from above the if case. We don't have to update to do it twice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
// Given an instruction that produces a pointer used in a load operation, | ||
// return the candidate pointer register that should be traced | ||
// for determining the register bank mapping. | ||
const Register getUseCandidateReg(const MachineInstr &DefMI) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: getUseCandidatePointer
|
||
// Assign PMI_PTR to the next operand if within bounds | ||
if ((VecRegOpIdx + 1) < NumOperands) | ||
OpRegBankIdx[VecRegOpIdx + 1] = PMI_PTR; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above, can we simply assert here instead of the if
?
635c08d
to
614a302
Compare
09958b9
to
54ead3e
Compare
; FAST-NEXT: $lfl0 = COPY [[AIE_OFFSET_LOAD]](<8 x s64>) | ||
%1:_(p0) = COPY $p0 | ||
%2:_(s20) = G_CONSTANT i20 448 | ||
%0:_(<8 x s64>) = G_AIE_OFFSET_LOAD %1:_(p0), %2:_(s20) :: (load (<8 x s64>)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a test for G_AIE_OFFSET_STORE?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added tests.
UseCandidate = DefMI->getOperand(0).getReg(); | ||
Register PtrReg = MI.getOperand(FirstSrcIdx).getReg(); | ||
MachineInstr *DefMI = MRI.getVRegDef(PtrReg); | ||
if (DefMI) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't that guaranteed by SSA?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which seems to make the following block above kind of dead code as we would update the use candidate every time
if (isUseAccInsn(MRI, TRI, UseCandidate))
isAccRegMapping = true;
if (isUseFifoInsn(MRI, TRI, UseCandidate))
isFifoPhysRegMapping = true;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah ok so we would update the use candidate every time but we still need the first block for when the the load feeds acc/fifo instructions directly and not through the pointer.
// for MemSize < 512, fallback to base instruction mapping. | ||
if (MemSize < 512) | ||
return AIEBaseRegisterBankInfo::getInstrMapping(MI); | ||
|
||
// Handle the example case as below | ||
/* %2:_(p0) = G_FRAME_INDEX %stack.0 | ||
G_STORE %1(<32 x s32>), %2(p0) | ||
%4:_(<32 x s32>) = G_LOAD %2(p0) */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if we need to handle this case and update the use candidate. Here is the code after instruction-select for test_v16x32_spill_reload
without this change:
bb.0.entry:
%0:fifo512 = COPY $lfl1
VST_dmx_sts_fifohl_spill %0, %stack.0, implicit $sp :: (store (<16 x s32>))
%3:vec512 = VLDA_dmx_lda_x_spill %stack.0, implicit $sp :: (dereferenceable load (<16 x s32>))
VST_dmx_sts_x_spill %3, %stack.1, implicit $sp :: (volatile store (<16 x s32>))
%4:fifo512 = VLDA_dmx_lda_fifohl_spill %stack.1, implicit $sp :: (volatile dereferenceable load (<16 x s32>))
$lfl0 = COPY %4
PseudoRET implicit $lr, implicit $lfl0
We simply use a vector load and store instead of fifo load and store, no cross bank copies are introduced.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed. I have removed this special case. I don't find a case where it will help in better register bank assignment. We can always add if there is a need.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I understands we are handling this
%1:_(<16 x s32>) = COPY $lfl1
%2:_(p0) = G_FRAME_INDEX %stack.0
G_STORE %1(<16 x s32>), %2(p0) :: (store (<16 x s32>))
%4:_(<16 x s32>) = G_LOAD %2(p0) :: (dereferenceable load (<16 x s32>))
..
..
uses %4
Only issue I can see is, if we want to select uses %4
instruction based on %4
we might hit vector variant of instruction always and need cross bank copy always if there are other variants we can select.
Also, if you would like to remove this special case you can remove AIE2PRegisterBankInfo::hasFifoInput
and corresponding accumulator function. It seems dead code.
case TargetOpcode::G_STORE:
case AIE2P::G_AIE_OFFSET_STORE:
case AIE2P::G_AIE_POSTINC_STORE:
case AIE2P::G_AIE_POSTINC_2D_STORE:
case AIE2P::G_AIE_POSTINC_3D_STORE: {
const unsigned OpIdx = MI.getNumExplicitDefs();
auto *RB = getRegBank(MI.getOperand(OpIdx).getReg(), MRI, TRI);
if (RB == &AIE2P::FifoRegBank)
return true;
break;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is an example usecase
; FAST-LABEL: name: test_v16acc32_spill_reload
; FAST: [[COPY:%[0-9]+]]:accregbank(<16 x s32>) = COPY $bmll1
; FAST-NEXT: [[COPY1:%[0-9]+]]:gprregbank(s32) = COPY $r0
; FAST-NEXT: [[FRAME_INDEX:%[0-9]+]]:ptrregbank(p0) = G_FRAME_INDEX %stack.0
; FAST-NEXT: G_STORE [[COPY]](<16 x s32>), [[FRAME_INDEX]](p0) :: (store (<16 x s32>))
; FAST-NEXT: [[LOAD:%[0-9]+]]:vregbank(<16 x s32>) = G_LOAD [[FRAME_INDEX]](p0) :: (dereferenceable load (<16 x s32>))
; FAST-NEXT: [[ADD:%[0-9]+]]:vregbank(<16 x s32>) = G_ADD [[LOAD]], [[LOAD]]
; FAST-NEXT: [[COPY2:%[0-9]+]]:accregbank(<16 x s32>) = COPY [[ADD]](<16 x s32>)
; FAST-NEXT: G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.aie2p.mcd.write.acc32), [[COPY2]](<16 x s32>), [[COPY1]](s32)
; FAST-NEXT: PseudoRET implicit $lr
%1:_(<16 x s32>) = COPY $bmll1
%3:gprregbank(s32) = COPY $r0
%2:_(p0) = G_FRAME_INDEX %stack.0
G_STORE %1(<16 x s32>), %2(p0) :: (store (<16 x s32>))
%4:_(<16 x s32>) = G_LOAD %2(p0) :: (dereferenceable load (<16 x s32>))
%5:_(<16 x s32>) = G_ADD %4, %4
G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.aie2p.mcd.write.acc32), %5(<16 x s32>), %3(s32)
PseudoRET implicit $lr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is an example usecase
; FAST-LABEL: name: test_v16acc32_spill_reload ; FAST: [[COPY:%[0-9]+]]:accregbank(<16 x s32>) = COPY $bmll1 ; FAST-NEXT: [[COPY1:%[0-9]+]]:gprregbank(s32) = COPY $r0 ; FAST-NEXT: [[FRAME_INDEX:%[0-9]+]]:ptrregbank(p0) = G_FRAME_INDEX %stack.0 ; FAST-NEXT: G_STORE [[COPY]](<16 x s32>), [[FRAME_INDEX]](p0) :: (store (<16 x s32>)) ; FAST-NEXT: [[LOAD:%[0-9]+]]:vregbank(<16 x s32>) = G_LOAD [[FRAME_INDEX]](p0) :: (dereferenceable load (<16 x s32>)) ; FAST-NEXT: [[ADD:%[0-9]+]]:vregbank(<16 x s32>) = G_ADD [[LOAD]], [[LOAD]] ; FAST-NEXT: [[COPY2:%[0-9]+]]:accregbank(<16 x s32>) = COPY [[ADD]](<16 x s32>) ; FAST-NEXT: G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.aie2p.mcd.write.acc32), [[COPY2]](<16 x s32>), [[COPY1]](s32) ; FAST-NEXT: PseudoRET implicit $lr %1:_(<16 x s32>) = COPY $bmll1 %3:gprregbank(s32) = COPY $r0 %2:_(p0) = G_FRAME_INDEX %stack.0 G_STORE %1(<16 x s32>), %2(p0) :: (store (<16 x s32>)) %4:_(<16 x s32>) = G_LOAD %2(p0) :: (dereferenceable load (<16 x s32>)) %5:_(<16 x s32>) = G_ADD %4, %4 G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.aie2p.mcd.write.acc32), %5(<16 x s32>), %3(s32) PseudoRET implicit $lr
yes, we have some cases in mllib too, I noticed that while running mllib. I will bring back this code. I am already working on to refine this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be great if you could also include a reduced test case from mllib @abnikant
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't reproduce with the mllib tests I tried. I think we can drop it now and take it later if there is a need?
@@ -251,32 +251,125 @@ AIE2PRegisterBankInfo::getInstrAlternativeMappings( | |||
const MachineRegisterInfo &MRI = MF.getRegInfo(); | |||
switch (MI.getOpcode()) { | |||
case TargetOpcode::G_LOAD: | |||
case TargetOpcode::G_STORE: { | |||
case AIE2P::G_AIE_OFFSET_LOAD: | |||
case AIE2P::G_AIE_POSTINC_LOAD: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about G_AIE_OFFSET_ZEXTLOAD
and G_AIE_OFFSET_SEXTLOAD
?
case TargetOpcode::G_LOAD: { | ||
case TargetOpcode::G_LOAD: | ||
case AIE2P::G_AIE_OFFSET_LOAD: | ||
case AIE2P::G_AIE_POSTINC_LOAD: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about G_AIE_OFFSET_ZEXTLOAD
and G_AIE_OFFSET_SEXTLOAD
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are falling back to Base Implementation for Size < 512.
@@ -229,6 +229,13 @@ unsigned AIE2InstrInfo::getOffsetMemOpcode(unsigned BaseMemOpcode) const { | |||
llvm_unreachable("not a generic load/store"); | |||
} | |||
|
|||
bool AIE2InstrInfo::isGenericOffsetMemOpcode(unsigned Opcode) const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say that all G_AIE*
are target specific. I feel that Generic
(besides of G
) can be a bit confusing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned in other PR, we need to update other hooks as well to remove 'Generic'. Possibly we can do it in other/separate PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you add the regbankselect support for AIE's load and store opcodes in general. It would be nice to test the other assignments not just fifo512 and acc512.
For scalars offset and postinc, there is these 2 tests for AIE2 which you can extend for AIE2P
regbankselect-offset-postinc-store.mir
and regbankselect-offset-postinc-load.mir
. I couldn't find any for vectors or 2D/3D
Note: all my review comments are concerning the 2nd commit regarding RegBankSelect. For the 1st commit (enabling combines), there is another PR #323 |
// Select the operand index for VecReg. | ||
const unsigned VecRegOpIdx = MI.getNumExplicitDefs(); | ||
Register VecReg = MI.getOperand(VecRegOpIdx).getReg(); | ||
if (!VecReg) | ||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not return the base implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, can we return earlier for any type smaller than 512 like we do for the loads? since we are trying to assign either the accumulator or fifo bank.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, now we return early for Size < 512.
54ead3e
to
e9dcf85
Compare
enabled these for aie2p and also added tests for vector. 2D/3D tests were already added for accumulator and fifo registers. |
e9dcf85
to
f75e1d3
Compare
@@ -205,7 +205,7 @@ body: | | |||
; FAST-NEXT: [[FRAME_INDEX:%[0-9]+]]:ptrregbank(p0) = G_FRAME_INDEX %stack.0 | |||
; FAST-NEXT: [[FRAME_INDEX1:%[0-9]+]]:ptrregbank(p0) = G_FRAME_INDEX %stack.1 | |||
; FAST-NEXT: G_STORE [[COPY]](<16 x s32>), [[FRAME_INDEX]](p0) :: (store (<16 x s32>)) | |||
; FAST-NEXT: [[LOAD:%[0-9]+]]:accregbank(<16 x s32>) = G_LOAD [[FRAME_INDEX]](p0) :: (dereferenceable load (<16 x s32>)) | |||
; FAST-NEXT: [[LOAD:%[0-9]+]]:vregbank(<16 x s32>) = G_LOAD [[FRAME_INDEX]](p0) :: (dereferenceable load (<16 x s32>)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we regressing here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my comment #337 (comment)
// Base Alternative Mapping for size <= 256. | ||
if (Size <= 256) | ||
break; | ||
const unsigned Size = getSizeInBits(DstReg, MRI, TRI); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: DstSize
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: DstRegTy
AltMappings.push_back(&AccRegMapping); | ||
Register VecReg = MI.getOperand(VecRegOpIdx).getReg(); | ||
const LLT Ty = MRI.getType(VecReg); | ||
const unsigned Size = getSizeInBits(VecReg, MRI, TRI); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: VecRegTy and VecRegSize
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
// If the register already belongs to a known bank, just return. | ||
if (getRegBank(UseCandidate, MRI, TRI) == &AIE2P::AccRegBank || | ||
getRegBank(UseCandidate, MRI, TRI) == &AIE2P::FifoRegBank) | ||
return AIEBaseRegisterBankInfo::getInstrMapping(MI); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also return if we know it is in the vector bank?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes.
Register VecReg = MI.getOperand(VecRegOpIdx).getReg(); | ||
// Size of accumulator and fifo vectors on aie2p >= 512. | ||
// for VecSize < 512, fallback to base instruction mapping. | ||
if (!VecReg || MRI.getType(VecReg).getSizeInBits() < 512) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need to check if(!VecReg)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not needed. dropped.
auto *DefRegBank = getRegBank(VecReg, MRI, TRI); | ||
// Only process if VecReg is defined in Accumulator or Fifo register banks. | ||
if (DefRegBank != &AIE2P::AccRegBank && DefRegBank != &AIE2P::FifoRegBank) | ||
return AIEBaseRegisterBankInfo::getInstrMapping(MI); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not process the vector bank ones as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we do the analysis to assign acc and fifo regbanks.
@@ -255,32 +255,125 @@ AIE2PRegisterBankInfo::getInstrAlternativeMappings( | |||
const MachineRegisterInfo &MRI = MF.getRegInfo(); | |||
switch (MI.getOpcode()) { | |||
case TargetOpcode::G_LOAD: | |||
case TargetOpcode::G_STORE: { | |||
case AIE2P::G_AIE_OFFSET_LOAD: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really care about the alternative mapping? Since the cost is same, I think we always stick to the instruction mapping. May be you can check with --check-prefix=COMMON,GREEDY
in the test. I believe both fast and greedy produces the same result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, we do need in many cases, and also cost might change later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you update the tests removing --check-prefix=GREEDY
and --check-prefix=GREEDY
, I do not see any tests using alternative mappings.
case AIE2P::G_AIE_POSTINC_2D_STORE: | ||
case AIE2P::G_AIE_POSTINC_3D_STORE: { | ||
const unsigned OpIdx = MI.getNumExplicitDefs(); | ||
auto *RB = getRegBank(MI.getOperand(OpIdx).getReg(), MRI, TRI); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure OpIdx
is the right value. As I understand, we would like to check whether the vector to be store is a FifoReg
. Now we look for the pointer and it will always return false.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why it is a pointer operand ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This Idx should point to the first source operand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you are right. I got confused between LOAD.
// For 2048-bit sizes, mapping is restricted to the accumulator bank. | ||
// Also, if the instruction has any implicit defs/uses, leave it alone. | ||
if (Size <= 256 || Size == 2048 || | ||
MI.getNumOperands() != MI.getNumExplicitOperands()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any valid case of MI.getNumOperands() != MI.getNumExplicitOperands()
considering loads?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not, needed. I dropped this check.
} | ||
|
||
// Assign PMI_PTR to the next operand if within bounds | ||
assert(((VecRegOpIdx + 1) < NumOperands) && "RegOpIdx + 1 out of bound"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: VecRegOpIdx + 1 out of bounds
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
…st-increment addressing modes
f75e1d3
to
6e26d97
Compare
Also, I ran mllib 3 days back, there was no extra failure. I have fired the run again, will post the result once done. |
Improved register bank assignments (for accumulator and FIFO register banks) to support load/store post-increment, post-increment 2D, post-increment 3D, and offset addressing modes.