ARM64: Fix prolog/epilog #3903

agarwal-sandeep · 2017-10-08T21:38:51Z

Fix BLR and RET opcode encoding. Seems like they were reversed.

Fix prolog and epilog. There is still some work left to optimize STR/LDR to STP/LDP. Also need to make sure stack probe code works correctly.

agarwal-sandeep · 2017-10-08T21:54:18Z

@aaronsgiles please look at commit d7de8f7 (Fix BLR and RET Encoding)

aaronsgiles

Definitely please undo the copied code from ARM32 as it is wrong. The check for the encoding needs to be revisited in general. Short-term, maybe add an assert to catch if the stack adjustment load is generating more than 1 instruction?

aaronsgiles · 2017-10-08T22:06:31Z

lib/Backend/arm64/EncoderMD.cpp

@@ -1086,6 +1086,87 @@ EncoderMD::Encode(IR::Instr *instr, BYTE *pc, BYTE* beginCodeAddress)
 }

 bool
+EncoderMD::CanEncodeModConst12(DWORD constant)


This is wrong for ARM64. Totally different encoding already implemented via CanEncodeLogicalConstant.

aaronsgiles · 2017-10-08T22:18:20Z

lib/Backend/arm64/ARM64Encoder.h

@@ -998,7 +998,7 @@ EmitBlr(
    Arm64SimpleRegisterParam Reg
    )
 {
-    return Emitter.EmitFourBytes(0xd65f0000 | (Reg.RawRegister() << 5));
+    return Emitter.EmitFourBytes(0xd63f0000 | (Reg.RawRegister() << 5));


What was the verification for this? #Resolved

For BLR opcode disassembler is showing a RET and PC+4 is not being saved in X30/LR. After swapping the encodings I see BLR and PC+4 is being saved correctly. So I made the change and wanted to verify with you that its correct. #Resolved

Ok, I've confirmed it from the ARMARM. Not sure how they got swapped. #Resolved

aaronsgiles · 2017-10-08T22:23:05Z

lib/Backend/arm64/EncoderMD.h

@@ -188,6 +189,7 @@ class EncoderMD

    void            AddLabelReloc(BYTE* relocAddress);

+    static bool     CanEncodeModConst12(DWORD constant);


Please remove this, it doesn't apply to ARM64. The equivalent constants are encoded on ARM64 by CanEncodeLogicalConstant below.

aaronsgiles · 2017-10-09T17:44:05Z

lib/Backend/arm64/ARM64Encoder.h

@@ -998,7 +998,7 @@ EmitBlr(
    Arm64SimpleRegisterParam Reg
    )
 {
-    return Emitter.EmitFourBytes(0xd65f0000 | (Reg.RawRegister() << 5));
+    return Emitter.EmitFourBytes(0xd63f0000 | (Reg.RawRegister() << 5));


Ok, I've confirmed it from the ARMARM. Not sure how they got swapped. #Resolved

aaronsgiles · 2017-10-09T17:48:55Z

lib/Backend/arm64/LowerMD.cpp

@@ -803,7 +798,7 @@ LowererMD::GenerateStackProbe(IR::Instr *insertInstr, bool afterProlog)
            this->CreateAssign(scratchOpnd, IR::IndirOpnd::New(scratchOpnd, 0, TyMachReg, this->m_func), insertInstr);
        }

-        if (EncoderMD::CanEncodeModConst12(frameSize))
+        if (EncoderMD::CanEncodeLogicalConst(frameSize, 4))


This is still wrong, however. Now that I have more context, this code is checking whether it can do the ADD in one instruction. CanEncodeLogicalConst determines whether something for a logical operation (AND/ORR/EOR) can be encoded. To check the range for ADD, it must be a 12-bit unsigned value in order to fit in one instruction, so I'd use IS_CONST_00000FFF.

Also, watch out below. Use ADD instead of ADDS. On ARM32, ADDS was used because it encoded smaller (2 bytes versus 4 bytes) but set the flags, which creates unnecessary dependencies in the CPU pipeline. Whenever you see ADDS or SUBS or anything that sets the flags, re-evaluate whether they are necessary, and remove the 'S' if they are not. #Resolved

Changed to ADD

In reply to: 143535757 [](ancestors = 143535757)

aaronsgiles · 2017-10-09T17:55:33Z

lib/Backend/arm64/LowerMD.cpp

        (m_func->GetJITFunctionBody()->DoInterruptProbe() || !m_func->GetThreadContextInfo()->IsThreadBound()) &&
-        !EncoderMD::CanEncodeModConst12(stackProbeStackHeight + Js::Constants::MinStackJIT);*/
+        !EncoderMD::CanEncodeLogicalConst(stackProbeStackHeight + Js::Constants::MinStackJIT, 4);


This probably needs a change as well if the code is trying to anticipate what GenerateStackProbe is going to do (though it's really ugly to have the logic in two places and a dependency between them).

If my understanding is right, we should create a function like StackProbeCanEncodeOffset() or something and call it from both places.

aaronsgiles · 2017-10-09T17:56:55Z

lib/Backend/arm64/LowerMD.cpp

        //bit mask for these registers.

        if (!IsSmallStack(stackAdjust) || useDynamicStackProbe)
        {
            unwindInfo->SetSavedScratchReg(true);
-            if (!usedRegs.Test(RegEncode[SP_ALLOC_SCRATCH_REG])) //If its a large stack and RegR4 is not already saved.
+            if (!usedRegs.Test(RegEncode[SP_ALLOC_SCRATCH_REG])) //If its a large stack and RegR18 is not already saved.


I really hope the RegR18 comment should be RegR17. x18 is reserved for the TEB and is off limits.

Changed SP_ALLOC_SCRATCH_REG to RegR19 as RegR17 is used as SCRATCH_REG.

I will revisit this whole SCRATCH_REG and GenerateStackAllocation logic as it seems on AMR64 _chkstk doesn't have the requirement to pass stack size in R4.

In reply to: 143537349 [](ancestors = 143537349)

Ok, just remember that x19 is callee-save, so you probably can't use it in the prolog. x16/x17 are probably better choices. On ARM64, _chkstk is called with the amount to allocate divided by 16 in x15.

; The typical calling sequence from the prologue is:
;
; mov x15, #immed ; set requested stack frame size
; bl __chkstk ; verify stack
; sub sp, sp, x15, lsl #4 ; apply stack adjust

I'll need to add some way to encode that SP subtract.

New opcode SUB_LSL4 now implemented in commit 764546b

aaronsgiles · 2017-10-09T17:59:11Z

lib/Backend/arm64/LowerMD.cpp

    // Zero-initialize dedicated arguments slot
    if (hasCalls)
    {
        //R17 acts a dummy zero register which we push to arguments slot
        //mov r17, 0
        Assert(r17Opnd == nullptr);
        r17Opnd = IR::RegOpnd::New(nullptr, SCRATCH_REG, TyMachReg, this->m_func);
-        IR::Instr * instrMov = IR::Instr::New(Js::OpCode::MOV, r17Opnd, IR::IntConstOpnd::New(0, TyMachReg, this->m_func), this->m_func);
+        IR::Instr * instrMov = IR::Instr::New(Js::OpCode::MOVZ, r17Opnd, IR::IntConstOpnd::New(0, TyMachReg, this->m_func), this->m_func);


Note MOVZ has a limit of 16 bit immediates. Hopefully that is guaranteed to be enough, but an assert and/or comment about that is warranted here.

This is OK as well since it's a constant 0.

aaronsgiles · 2017-10-09T18:02:50Z

lib/Backend/arm64/LowerMD.cpp

-        instrPush->SetSrc1(IR::RegBVOpnd::New(frameRegs, TyMachReg, this->m_func));
-        insertInstr->InsertBefore(instrPush);
+        // Generate STP {fp,lr}
+        IR::Instr * instrStp = IR::Instr::New(Js::OpCode::STP_PRE, this->m_func);


The limit of pre-indexed STP is -512, so unless you can guarantee that, you will need to add fallback logic to do a manual stack allocation.

We are only pushing R0-R7, fp, lr so its 10*8 which is < 512. Added an assert for now.

In reply to: 143538688 [](ancestors = 143538688)

As long as you can guarantee it, you're good. Just wanting to be careful.

aaronsgiles · 2017-10-09T18:03:48Z

lib/Backend/arm64/LowerMD.cpp

-        instrPush->SetDst(IR::IndirOpnd::New(IR::RegOpnd::New(nullptr, RegSP, TyMachReg, this->m_func), (int32)0, TyMachReg, this->m_func));
-        instrPush->SetSrc1(IR::RegBVOpnd::New(usedDoubleRegs, TyMachReg, this->m_func));
-        insertInstr->InsertBefore(instrPush);
+        for (RegNum reg = LAST_CALLEE_SAVED_DBL_REG; reg >= FIRST_CALLEE_SAVED_DBL_REG; reg = (RegNum)(reg - 1))


You need to use FSTR here. Also floating point registers can be stored in pairs as well, so pairwise logic should eventually be added to use FSTP. #Resolved

Changed to FSTR

In reply to: 143538894 [](ancestors = 143538894)

aaronsgiles · 2017-10-09T18:04:50Z

lib/Backend/arm64/LowerMD.cpp

-            // mov r12, 0
-            IR::Instr * instrMov = IR::Instr::New(Js::OpCode::MOV, r17Opnd, IR::IntConstOpnd::New(0, TyMachReg, this->m_func), this->m_func);
+            // mov r17, 0
+            IR::Instr * instrMov = IR::Instr::New(Js::OpCode::MOVZ, r17Opnd, IR::IntConstOpnd::New(0, TyMachReg, this->m_func), this->m_func);


Again, watch the limits of MOVZ (0-65535). You should at least have an Assert in place here to catch it in case.

Only moving 0, should be ok?

In reply to: 143539114 [](ancestors = 143539114)

Yes, that's fine. Sorry still getting used to reading the syntax of these and missed that it was a constant 0.

aaronsgiles · 2017-10-09T18:05:37Z

lib/Backend/arm64/LowerMD.cpp

-        instrVPop->SetDst(IR::RegBVOpnd::New(savedDoubleRegs, TyMachReg, this->m_func));
-        instrVPop->SetSrc1(IR::IndirOpnd::New(IR::RegOpnd::New(nullptr, RegSP,TyMachReg, this->m_func), (int32)0, TyMachReg, this->m_func));
-        exitInstr->InsertBefore(instrVPop);
+        for (RegNum reg = FIRST_CALLEE_SAVED_DBL_REG; reg <= LAST_CALLEE_SAVED_DBL_REG; reg = (RegNum)(reg + 1))


Same comment as above, use LDP to load in pairs for efficiency/code size. It can be a TODO for now, but should be notated. #Resolved

Also since this is the floating point case, use FLDR instead of LDR. #Resolved

Added ToDo comment

In reply to: 143539271 [](ancestors = 143539271)

Changed

In reply to: 143539446 [](ancestors = 143539446)

aaronsgiles · 2017-10-09T18:07:09Z

lib/Backend/arm64/LowerMD.cpp

-             (homedParams + 1) * MachRegInt,
-            TyMachPtr, this->m_func);
+    // Generate LDP {fp,lr}
+    IR::Instr * instrStp = IR::Instr::New(Js::OpCode::LDP_POST, this->m_func);


Again, LDP is limited to +512 byte offset, so you will need a fallback case for bigger allocations. TODO item for sure. #ByDesign

Relative to SP? The offset here is only 16

In reply to: 143539590 [](ancestors = 143539590)

Sorry you're right, it is only a fixed amount. My bad. #Closed

Merge pull request #3903 from agarwal-sandeep:armwork Fix BLR and RET opcode encoding. Seems like they were reversed. Fix prolog and epilog. There is still some work left to optimize STR/LDR to STP/LDP. Also need to make sure stack probe code works correctly.

aaronsgiles suggested changes Oct 9, 2017

View reviewed changes

aaronsgiles approved these changes Oct 10, 2017

View reviewed changes

agarwal-sandeep added 2 commits October 9, 2017 17:59

ARM64: Fix BLR and RET Encoding

e973e9f

ARM64: Fix prolog/epilog

18edd79

agarwal-sandeep force-pushed the armwork branch from f4d4da2 to 18edd79 Compare October 10, 2017 01:05

chakrabot merged commit 18edd79 into chakra-core:master Oct 10, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ARM64: Fix prolog/epilog #3903

ARM64: Fix prolog/epilog #3903

agarwal-sandeep commented Oct 8, 2017

agarwal-sandeep commented Oct 8, 2017

aaronsgiles left a comment

aaronsgiles Oct 8, 2017

aaronsgiles Oct 8, 2017 •

edited by agarwal-sandeep

Loading

agarwal-sandeep Oct 9, 2017 •

edited

Loading

aaronsgiles Oct 9, 2017 •

edited by agarwal-sandeep

Loading

aaronsgiles Oct 8, 2017

aaronsgiles Oct 9, 2017 •

edited by agarwal-sandeep

Loading

aaronsgiles Oct 9, 2017

aaronsgiles Oct 9, 2017 •

edited by agarwal-sandeep

Loading

agarwal-sandeep Oct 9, 2017

aaronsgiles Oct 9, 2017

aaronsgiles Oct 9, 2017

aaronsgiles Oct 9, 2017

agarwal-sandeep Oct 9, 2017

aaronsgiles Oct 9, 2017

aaronsgiles Oct 9, 2017

aaronsgiles Oct 9, 2017

aaronsgiles Oct 9, 2017

aaronsgiles Oct 9, 2017

agarwal-sandeep Oct 9, 2017

aaronsgiles Oct 9, 2017

aaronsgiles Oct 9, 2017 •

edited by agarwal-sandeep

Loading

agarwal-sandeep Oct 9, 2017

aaronsgiles Oct 9, 2017

agarwal-sandeep Oct 9, 2017

aaronsgiles Oct 9, 2017

aaronsgiles Oct 9, 2017 •

edited by agarwal-sandeep

Loading

aaronsgiles Oct 9, 2017 •

edited by agarwal-sandeep

Loading

agarwal-sandeep Oct 9, 2017

agarwal-sandeep Oct 9, 2017

aaronsgiles Oct 9, 2017 •

edited by agarwal-sandeep

Loading

agarwal-sandeep Oct 9, 2017

aaronsgiles Oct 9, 2017 •

edited by agarwal-sandeep

Loading

		@@ -188,6 +189,7 @@ class EncoderMD

		void AddLabelReloc(BYTE* relocAddress);

		static bool CanEncodeModConst12(DWORD constant);

ARM64: Fix prolog/epilog #3903

ARM64: Fix prolog/epilog #3903

Conversation

agarwal-sandeep commented Oct 8, 2017

agarwal-sandeep commented Oct 8, 2017

aaronsgiles left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

aaronsgiles Oct 8, 2017 • edited by agarwal-sandeep Loading

Choose a reason for hiding this comment

agarwal-sandeep Oct 9, 2017 • edited Loading

Choose a reason for hiding this comment

aaronsgiles Oct 9, 2017 • edited by agarwal-sandeep Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

aaronsgiles Oct 9, 2017 • edited by agarwal-sandeep Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

aaronsgiles Oct 9, 2017 • edited by agarwal-sandeep Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

aaronsgiles Oct 9, 2017 • edited by agarwal-sandeep Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

aaronsgiles Oct 9, 2017 • edited by agarwal-sandeep Loading

Choose a reason for hiding this comment

aaronsgiles Oct 9, 2017 • edited by agarwal-sandeep Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

aaronsgiles Oct 9, 2017 • edited by agarwal-sandeep Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

aaronsgiles Oct 9, 2017 • edited by agarwal-sandeep Loading

Choose a reason for hiding this comment

aaronsgiles Oct 8, 2017 •

edited by agarwal-sandeep

Loading

agarwal-sandeep Oct 9, 2017 •

edited

Loading

aaronsgiles Oct 9, 2017 •

edited by agarwal-sandeep

Loading

aaronsgiles Oct 9, 2017 •

edited by agarwal-sandeep

Loading

aaronsgiles Oct 9, 2017 •

edited by agarwal-sandeep

Loading

aaronsgiles Oct 9, 2017 •

edited by agarwal-sandeep

Loading

aaronsgiles Oct 9, 2017 •

edited by agarwal-sandeep

Loading

aaronsgiles Oct 9, 2017 •

edited by agarwal-sandeep

Loading

aaronsgiles Oct 9, 2017 •

edited by agarwal-sandeep

Loading

aaronsgiles Oct 9, 2017 •

edited by agarwal-sandeep

Loading