Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARM64: Fix prolog/epilog #3903

Merged
merged 2 commits into from
Oct 10, 2017
Merged

Conversation

agarwal-sandeep
Copy link
Collaborator

Fix BLR and RET opcode encoding. Seems like they were reversed.

Fix prolog and epilog. There is still some work left to optimize STR/LDR to STP/LDP. Also need to make sure stack probe code works correctly.

@agarwal-sandeep
Copy link
Collaborator Author

@aaronsgiles please look at commit d7de8f7 (Fix BLR and RET Encoding)

Copy link
Contributor

@aaronsgiles aaronsgiles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely please undo the copied code from ARM32 as it is wrong. The check for the encoding needs to be revisited in general. Short-term, maybe add an assert to catch if the stack adjustment load is generating more than 1 instruction?

@@ -1086,6 +1086,87 @@ EncoderMD::Encode(IR::Instr *instr, BYTE *pc, BYTE* beginCodeAddress)
}

bool
EncoderMD::CanEncodeModConst12(DWORD constant)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is wrong for ARM64. Totally different encoding already implemented via CanEncodeLogicalConstant.

@@ -998,7 +998,7 @@ EmitBlr(
Arm64SimpleRegisterParam Reg
)
{
return Emitter.EmitFourBytes(0xd65f0000 | (Reg.RawRegister() << 5));
return Emitter.EmitFourBytes(0xd63f0000 | (Reg.RawRegister() << 5));
Copy link
Contributor

@aaronsgiles aaronsgiles Oct 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was the verification for this? #Resolved

Copy link
Collaborator Author

@agarwal-sandeep agarwal-sandeep Oct 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For BLR opcode disassembler is showing a RET and PC+4 is not being saved in X30/LR. After swapping the encodings I see BLR and PC+4 is being saved correctly. So I made the change and wanted to verify with you that its correct. #Resolved

Copy link
Contributor

@aaronsgiles aaronsgiles Oct 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I've confirmed it from the ARMARM. Not sure how they got swapped. #Resolved

@@ -188,6 +189,7 @@ class EncoderMD

void AddLabelReloc(BYTE* relocAddress);

static bool CanEncodeModConst12(DWORD constant);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove this, it doesn't apply to ARM64. The equivalent constants are encoded on ARM64 by CanEncodeLogicalConstant below.

@@ -998,7 +998,7 @@ EmitBlr(
Arm64SimpleRegisterParam Reg
)
{
return Emitter.EmitFourBytes(0xd65f0000 | (Reg.RawRegister() << 5));
return Emitter.EmitFourBytes(0xd63f0000 | (Reg.RawRegister() << 5));
Copy link
Contributor

@aaronsgiles aaronsgiles Oct 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I've confirmed it from the ARMARM. Not sure how they got swapped. #Resolved

@@ -803,7 +798,7 @@ LowererMD::GenerateStackProbe(IR::Instr *insertInstr, bool afterProlog)
this->CreateAssign(scratchOpnd, IR::IndirOpnd::New(scratchOpnd, 0, TyMachReg, this->m_func), insertInstr);
}

if (EncoderMD::CanEncodeModConst12(frameSize))
if (EncoderMD::CanEncodeLogicalConst(frameSize, 4))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still wrong, however. Now that I have more context, this code is checking whether it can do the ADD in one instruction. CanEncodeLogicalConst determines whether something for a logical operation (AND/ORR/EOR) can be encoded. To check the range for ADD, it must be a 12-bit unsigned value in order to fit in one instruction, so I'd use IS_CONST_00000FFF.

Copy link
Contributor

@aaronsgiles aaronsgiles Oct 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, watch out below. Use ADD instead of ADDS. On ARM32, ADDS was used because it encoded smaller (2 bytes versus 4 bytes) but set the flags, which creates unnecessary dependencies in the CPU pipeline. Whenever you see ADDS or SUBS or anything that sets the flags, re-evaluate whether they are necessary, and remove the 'S' if they are not. #Resolved

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to ADD


In reply to: 143535757 [](ancestors = 143535757)

(m_func->GetJITFunctionBody()->DoInterruptProbe() || !m_func->GetThreadContextInfo()->IsThreadBound()) &&
!EncoderMD::CanEncodeModConst12(stackProbeStackHeight + Js::Constants::MinStackJIT);*/
!EncoderMD::CanEncodeLogicalConst(stackProbeStackHeight + Js::Constants::MinStackJIT, 4);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably needs a change as well if the code is trying to anticipate what GenerateStackProbe is going to do (though it's really ugly to have the logic in two places and a dependency between them).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If my understanding is right, we should create a function like StackProbeCanEncodeOffset() or something and call it from both places.

//bit mask for these registers.

if (!IsSmallStack(stackAdjust) || useDynamicStackProbe)
{
unwindInfo->SetSavedScratchReg(true);
if (!usedRegs.Test(RegEncode[SP_ALLOC_SCRATCH_REG])) //If its a large stack and RegR4 is not already saved.
if (!usedRegs.Test(RegEncode[SP_ALLOC_SCRATCH_REG])) //If its a large stack and RegR18 is not already saved.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really hope the RegR18 comment should be RegR17. x18 is reserved for the TEB and is off limits.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed SP_ALLOC_SCRATCH_REG to RegR19 as RegR17 is used as SCRATCH_REG.

I will revisit this whole SCRATCH_REG and GenerateStackAllocation logic as it seems on AMR64 _chkstk doesn't have the requirement to pass stack size in R4.


In reply to: 143537349 [](ancestors = 143537349)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, just remember that x19 is callee-save, so you probably can't use it in the prolog. x16/x17 are probably better choices. On ARM64, _chkstk is called with the amount to allocate divided by 16 in x15.

; The typical calling sequence from the prologue is:
;
; mov x15, #immed ; set requested stack frame size
; bl __chkstk ; verify stack
; sub sp, sp, x15, lsl #4 ; apply stack adjust

I'll need to add some way to encode that SP subtract.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New opcode SUB_LSL4 now implemented in commit 764546b

// Zero-initialize dedicated arguments slot
if (hasCalls)
{
//R17 acts a dummy zero register which we push to arguments slot
//mov r17, 0
Assert(r17Opnd == nullptr);
r17Opnd = IR::RegOpnd::New(nullptr, SCRATCH_REG, TyMachReg, this->m_func);
IR::Instr * instrMov = IR::Instr::New(Js::OpCode::MOV, r17Opnd, IR::IntConstOpnd::New(0, TyMachReg, this->m_func), this->m_func);
IR::Instr * instrMov = IR::Instr::New(Js::OpCode::MOVZ, r17Opnd, IR::IntConstOpnd::New(0, TyMachReg, this->m_func), this->m_func);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note MOVZ has a limit of 16 bit immediates. Hopefully that is guaranteed to be enough, but an assert and/or comment about that is warranted here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is OK as well since it's a constant 0.

instrPush->SetSrc1(IR::RegBVOpnd::New(frameRegs, TyMachReg, this->m_func));
insertInstr->InsertBefore(instrPush);
// Generate STP {fp,lr}
IR::Instr * instrStp = IR::Instr::New(Js::OpCode::STP_PRE, this->m_func);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The limit of pre-indexed STP is -512, so unless you can guarantee that, you will need to add fallback logic to do a manual stack allocation.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are only pushing R0-R7, fp, lr so its 10*8 which is < 512. Added an assert for now.


In reply to: 143538688 [](ancestors = 143538688)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as you can guarantee it, you're good. Just wanting to be careful.

instrPush->SetDst(IR::IndirOpnd::New(IR::RegOpnd::New(nullptr, RegSP, TyMachReg, this->m_func), (int32)0, TyMachReg, this->m_func));
instrPush->SetSrc1(IR::RegBVOpnd::New(usedDoubleRegs, TyMachReg, this->m_func));
insertInstr->InsertBefore(instrPush);
for (RegNum reg = LAST_CALLEE_SAVED_DBL_REG; reg >= FIRST_CALLEE_SAVED_DBL_REG; reg = (RegNum)(reg - 1))
Copy link
Contributor

@aaronsgiles aaronsgiles Oct 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to use FSTR here. Also floating point registers can be stored in pairs as well, so pairwise logic should eventually be added to use FSTP. #Resolved

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to FSTR


In reply to: 143538894 [](ancestors = 143538894)

// mov r12, 0
IR::Instr * instrMov = IR::Instr::New(Js::OpCode::MOV, r17Opnd, IR::IntConstOpnd::New(0, TyMachReg, this->m_func), this->m_func);
// mov r17, 0
IR::Instr * instrMov = IR::Instr::New(Js::OpCode::MOVZ, r17Opnd, IR::IntConstOpnd::New(0, TyMachReg, this->m_func), this->m_func);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, watch the limits of MOVZ (0-65535). You should at least have an Assert in place here to catch it in case.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only moving 0, should be ok?


In reply to: 143539114 [](ancestors = 143539114)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's fine. Sorry still getting used to reading the syntax of these and missed that it was a constant 0.

instrVPop->SetDst(IR::RegBVOpnd::New(savedDoubleRegs, TyMachReg, this->m_func));
instrVPop->SetSrc1(IR::IndirOpnd::New(IR::RegOpnd::New(nullptr, RegSP,TyMachReg, this->m_func), (int32)0, TyMachReg, this->m_func));
exitInstr->InsertBefore(instrVPop);
for (RegNum reg = FIRST_CALLEE_SAVED_DBL_REG; reg <= LAST_CALLEE_SAVED_DBL_REG; reg = (RegNum)(reg + 1))
Copy link
Contributor

@aaronsgiles aaronsgiles Oct 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above, use LDP to load in pairs for efficiency/code size. It can be a TODO for now, but should be notated. #Resolved

Copy link
Contributor

@aaronsgiles aaronsgiles Oct 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also since this is the floating point case, use FLDR instead of LDR. #Resolved

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added ToDo comment


In reply to: 143539271 [](ancestors = 143539271)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed


In reply to: 143539446 [](ancestors = 143539446)

(homedParams + 1) * MachRegInt,
TyMachPtr, this->m_func);
// Generate LDP {fp,lr}
IR::Instr * instrStp = IR::Instr::New(Js::OpCode::LDP_POST, this->m_func);
Copy link
Contributor

@aaronsgiles aaronsgiles Oct 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, LDP is limited to +512 byte offset, so you will need a fallback case for bigger allocations. TODO item for sure. #ByDesign

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Relative to SP? The offset here is only 16


In reply to: 143539590 [](ancestors = 143539590)

Copy link
Contributor

@aaronsgiles aaronsgiles Oct 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry you're right, it is only a fixed amount. My bad. #Closed

@chakrabot chakrabot merged commit 18edd79 into chakra-core:master Oct 10, 2017
chakrabot pushed a commit that referenced this pull request Oct 10, 2017
Merge pull request #3903 from agarwal-sandeep:armwork

Fix BLR and RET opcode encoding. Seems like they were reversed.

Fix prolog and epilog. There is still some work left to optimize STR/LDR to STP/LDP. Also need to make sure stack probe code works correctly.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants