-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARM64: Fix prolog/epilog #3903
ARM64: Fix prolog/epilog #3903
Conversation
@aaronsgiles please look at commit d7de8f7 (Fix BLR and RET Encoding) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely please undo the copied code from ARM32 as it is wrong. The check for the encoding needs to be revisited in general. Short-term, maybe add an assert to catch if the stack adjustment load is generating more than 1 instruction?
lib/Backend/arm64/EncoderMD.cpp
Outdated
@@ -1086,6 +1086,87 @@ EncoderMD::Encode(IR::Instr *instr, BYTE *pc, BYTE* beginCodeAddress) | |||
} | |||
|
|||
bool | |||
EncoderMD::CanEncodeModConst12(DWORD constant) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is wrong for ARM64. Totally different encoding already implemented via CanEncodeLogicalConstant.
@@ -998,7 +998,7 @@ EmitBlr( | |||
Arm64SimpleRegisterParam Reg | |||
) | |||
{ | |||
return Emitter.EmitFourBytes(0xd65f0000 | (Reg.RawRegister() << 5)); | |||
return Emitter.EmitFourBytes(0xd63f0000 | (Reg.RawRegister() << 5)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What was the verification for this? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For BLR opcode disassembler is showing a RET and PC+4 is not being saved in X30/LR. After swapping the encodings I see BLR and PC+4 is being saved correctly. So I made the change and wanted to verify with you that its correct. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I've confirmed it from the ARMARM. Not sure how they got swapped. #Resolved
lib/Backend/arm64/EncoderMD.h
Outdated
@@ -188,6 +189,7 @@ class EncoderMD | |||
|
|||
void AddLabelReloc(BYTE* relocAddress); | |||
|
|||
static bool CanEncodeModConst12(DWORD constant); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove this, it doesn't apply to ARM64. The equivalent constants are encoded on ARM64 by CanEncodeLogicalConstant below.
@@ -998,7 +998,7 @@ EmitBlr( | |||
Arm64SimpleRegisterParam Reg | |||
) | |||
{ | |||
return Emitter.EmitFourBytes(0xd65f0000 | (Reg.RawRegister() << 5)); | |||
return Emitter.EmitFourBytes(0xd63f0000 | (Reg.RawRegister() << 5)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I've confirmed it from the ARMARM. Not sure how they got swapped. #Resolved
lib/Backend/arm64/LowerMD.cpp
Outdated
@@ -803,7 +798,7 @@ LowererMD::GenerateStackProbe(IR::Instr *insertInstr, bool afterProlog) | |||
this->CreateAssign(scratchOpnd, IR::IndirOpnd::New(scratchOpnd, 0, TyMachReg, this->m_func), insertInstr); | |||
} | |||
|
|||
if (EncoderMD::CanEncodeModConst12(frameSize)) | |||
if (EncoderMD::CanEncodeLogicalConst(frameSize, 4)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is still wrong, however. Now that I have more context, this code is checking whether it can do the ADD in one instruction. CanEncodeLogicalConst determines whether something for a logical operation (AND/ORR/EOR) can be encoded. To check the range for ADD, it must be a 12-bit unsigned value in order to fit in one instruction, so I'd use IS_CONST_00000FFF.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, watch out below. Use ADD instead of ADDS. On ARM32, ADDS was used because it encoded smaller (2 bytes versus 4 bytes) but set the flags, which creates unnecessary dependencies in the CPU pipeline. Whenever you see ADDS or SUBS or anything that sets the flags, re-evaluate whether they are necessary, and remove the 'S' if they are not. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lib/Backend/arm64/LowerMD.cpp
Outdated
(m_func->GetJITFunctionBody()->DoInterruptProbe() || !m_func->GetThreadContextInfo()->IsThreadBound()) && | ||
!EncoderMD::CanEncodeModConst12(stackProbeStackHeight + Js::Constants::MinStackJIT);*/ | ||
!EncoderMD::CanEncodeLogicalConst(stackProbeStackHeight + Js::Constants::MinStackJIT, 4); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This probably needs a change as well if the code is trying to anticipate what GenerateStackProbe is going to do (though it's really ugly to have the logic in two places and a dependency between them).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If my understanding is right, we should create a function like StackProbeCanEncodeOffset() or something and call it from both places.
lib/Backend/arm64/LowerMD.cpp
Outdated
//bit mask for these registers. | ||
|
||
if (!IsSmallStack(stackAdjust) || useDynamicStackProbe) | ||
{ | ||
unwindInfo->SetSavedScratchReg(true); | ||
if (!usedRegs.Test(RegEncode[SP_ALLOC_SCRATCH_REG])) //If its a large stack and RegR4 is not already saved. | ||
if (!usedRegs.Test(RegEncode[SP_ALLOC_SCRATCH_REG])) //If its a large stack and RegR18 is not already saved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really hope the RegR18 comment should be RegR17. x18 is reserved for the TEB and is off limits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed SP_ALLOC_SCRATCH_REG to RegR19 as RegR17 is used as SCRATCH_REG.
I will revisit this whole SCRATCH_REG and GenerateStackAllocation logic as it seems on AMR64 _chkstk doesn't have the requirement to pass stack size in R4.
In reply to: 143537349 [](ancestors = 143537349)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, just remember that x19 is callee-save, so you probably can't use it in the prolog. x16/x17 are probably better choices. On ARM64, _chkstk is called with the amount to allocate divided by 16 in x15.
; The typical calling sequence from the prologue is:
;
; mov x15, #immed ; set requested stack frame size
; bl __chkstk ; verify stack
; sub sp, sp, x15, lsl #4 ; apply stack adjust
I'll need to add some way to encode that SP subtract.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New opcode SUB_LSL4 now implemented in commit 764546b
// Zero-initialize dedicated arguments slot | ||
if (hasCalls) | ||
{ | ||
//R17 acts a dummy zero register which we push to arguments slot | ||
//mov r17, 0 | ||
Assert(r17Opnd == nullptr); | ||
r17Opnd = IR::RegOpnd::New(nullptr, SCRATCH_REG, TyMachReg, this->m_func); | ||
IR::Instr * instrMov = IR::Instr::New(Js::OpCode::MOV, r17Opnd, IR::IntConstOpnd::New(0, TyMachReg, this->m_func), this->m_func); | ||
IR::Instr * instrMov = IR::Instr::New(Js::OpCode::MOVZ, r17Opnd, IR::IntConstOpnd::New(0, TyMachReg, this->m_func), this->m_func); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note MOVZ has a limit of 16 bit immediates. Hopefully that is guaranteed to be enough, but an assert and/or comment about that is warranted here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is OK as well since it's a constant 0.
lib/Backend/arm64/LowerMD.cpp
Outdated
instrPush->SetSrc1(IR::RegBVOpnd::New(frameRegs, TyMachReg, this->m_func)); | ||
insertInstr->InsertBefore(instrPush); | ||
// Generate STP {fp,lr} | ||
IR::Instr * instrStp = IR::Instr::New(Js::OpCode::STP_PRE, this->m_func); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The limit of pre-indexed STP is -512, so unless you can guarantee that, you will need to add fallback logic to do a manual stack allocation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are only pushing R0-R7, fp, lr so its 10*8 which is < 512. Added an assert for now.
In reply to: 143538688 [](ancestors = 143538688)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As long as you can guarantee it, you're good. Just wanting to be careful.
instrPush->SetDst(IR::IndirOpnd::New(IR::RegOpnd::New(nullptr, RegSP, TyMachReg, this->m_func), (int32)0, TyMachReg, this->m_func)); | ||
instrPush->SetSrc1(IR::RegBVOpnd::New(usedDoubleRegs, TyMachReg, this->m_func)); | ||
insertInstr->InsertBefore(instrPush); | ||
for (RegNum reg = LAST_CALLEE_SAVED_DBL_REG; reg >= FIRST_CALLEE_SAVED_DBL_REG; reg = (RegNum)(reg - 1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to use FSTR here. Also floating point registers can be stored in pairs as well, so pairwise logic should eventually be added to use FSTP. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// mov r12, 0 | ||
IR::Instr * instrMov = IR::Instr::New(Js::OpCode::MOV, r17Opnd, IR::IntConstOpnd::New(0, TyMachReg, this->m_func), this->m_func); | ||
// mov r17, 0 | ||
IR::Instr * instrMov = IR::Instr::New(Js::OpCode::MOVZ, r17Opnd, IR::IntConstOpnd::New(0, TyMachReg, this->m_func), this->m_func); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, watch the limits of MOVZ (0-65535). You should at least have an Assert in place here to catch it in case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's fine. Sorry still getting used to reading the syntax of these and missed that it was a constant 0.
lib/Backend/arm64/LowerMD.cpp
Outdated
instrVPop->SetDst(IR::RegBVOpnd::New(savedDoubleRegs, TyMachReg, this->m_func)); | ||
instrVPop->SetSrc1(IR::IndirOpnd::New(IR::RegOpnd::New(nullptr, RegSP,TyMachReg, this->m_func), (int32)0, TyMachReg, this->m_func)); | ||
exitInstr->InsertBefore(instrVPop); | ||
for (RegNum reg = FIRST_CALLEE_SAVED_DBL_REG; reg <= LAST_CALLEE_SAVED_DBL_REG; reg = (RegNum)(reg + 1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as above, use LDP to load in pairs for efficiency/code size. It can be a TODO for now, but should be notated. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also since this is the floating point case, use FLDR instead of LDR. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(homedParams + 1) * MachRegInt, | ||
TyMachPtr, this->m_func); | ||
// Generate LDP {fp,lr} | ||
IR::Instr * instrStp = IR::Instr::New(Js::OpCode::LDP_POST, this->m_func); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, LDP is limited to +512 byte offset, so you will need a fallback case for bigger allocations. TODO item for sure. #ByDesign
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry you're right, it is only a fixed amount. My bad. #Closed
f4d4da2
to
18edd79
Compare
Merge pull request #3903 from agarwal-sandeep:armwork Fix BLR and RET opcode encoding. Seems like they were reversed. Fix prolog and epilog. There is still some work left to optimize STR/LDR to STP/LDP. Also need to make sure stack probe code works correctly.
Fix BLR and RET opcode encoding. Seems like they were reversed.
Fix prolog and epilog. There is still some work left to optimize STR/LDR to STP/LDP. Also need to make sure stack probe code works correctly.