Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

[Arm64/Windows] Backport #9500 changes #10190

Merged
merged 5 commits into from
Apr 4, 2017

Conversation

sdmaclea
Copy link

Fixes #9526

@sdmaclea
Copy link
Author

@dotnet-bot test Windows_NT arm64 Cross Debug Build

@sdmaclea
Copy link
Author

@RussKeldorph dotnet bot seems to ignore my requests to test this for windows arm64. Can I get permission to trigger windows CI arm64 builds.

@dotnet/arm64-contrib @janvorli @jashook PTAL

These represent changes that were added to the [Arm64/Unix] platform code to fix failing tests. It seems they should also be in the Windows platform.

As I cannot trigger a CI build or compile locally...

  • The assembler syntax might have errors.... (yet)
  • There has been no testing (yet)

@RussKeldorph
Copy link

@sdmaclea Best we can do is have you mention @dotnet/arm64-contrib to have someone review and request on your behalf.

@dotnet-bot test Windows_NT arm64 Cross Debug Build

@sdmaclea
Copy link
Author

@RussKeldorph Thanks

Build is failing, but console output is insufficient to see why/which assert is firing. I do not think I can fix the patch without being able to build locally.

@dotnet/arm64-contrib PTAL

@@ -76,6 +77,8 @@ LNoFloatingPoint
ldr x9, [x19,#CallDescrData__pTarget]
blr x9

ldr x19, [fp, 24] ; Fixup corrupted X19 callee preserved register
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

x19 is callee saved so it should not be trashed on return from call. Can you explain who/when is x19 getting corrupted?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this is a workaround for unexpected x19 corruption. I had tried to debug this failure by storing the correct value for x19 on the stack above and adding a conditional break, but was unable to do so. The test seemed to fail to make forward progress.

Let me try to reproduce. I may have better luck as the GC is more stable now.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After removing the equivalent of this line, there are 43 failures on [Arm64/Unix] tip.

./JIT.Directed.Misc.function_pointer.MutualThdRecur-fptr.MutualThdRecur-fptr.sh
./JIT.Directed.tailcall.tailcall.tailcall.sh
./JIT.Methodical.Boxing.misc._dbgconcurgc_il._dbgconcurgc_il.sh
./JIT.Methodical.Boxing.misc._relconcurgc_il._relconcurgc_il.sh
./JIT.Methodical.switch.switch10.switch10.sh
./JIT.Methodical.switch.switch11.switch11.sh
./JIT.Methodical.switch.switch1.switch1.sh
./JIT.Methodical.switch.switch2.switch2.sh
./JIT.Methodical.switch.switch3.switch3.sh
./JIT.Methodical.switch.switch4.switch4.sh
./JIT.Methodical.switch.switch5.switch5.sh
./JIT.Methodical.switch.switch6.switch6.sh
./JIT.Methodical.switch.switch7.switch7.sh
./JIT.Methodical.switch.switch8.switch8.sh
./JIT.Methodical.switch.switch9.switch9.sh
./JIT.Methodical.tailcall._il_dbgrecurse_ep_void._il_dbgrecurse_ep_void.sh
./JIT.Methodical.tailcall._il_dbgtest_2a._il_dbgtest_2a.sh
./JIT.Methodical.tailcall._il_dbgtest_2b._il_dbgtest_2b.sh
./JIT.Methodical.tailcall._il_dbgtest_2c._il_dbgtest_2c.sh
./JIT.Methodical.tailcall._il_dbgtest_implicit._il_dbgtest_implicit.sh
./JIT.Methodical.tailcall._il_dbgtest_mutual_rec._il_dbgtest_mutual_rec.sh
./JIT.Methodical.tailcall._il_dbgtest_switch._il_dbgtest_switch.sh
./JIT.Methodical.tailcall._il_dbgtest_virt._il_dbgtest_virt.sh
./JIT.Methodical.tailcall._il_dbgtest_void._il_dbgtest_void.sh
./JIT.Methodical.tailcall._il_relrecurse_ep_void._il_relrecurse_ep_void.sh
./JIT.Methodical.tailcall._il_reltest_2a._il_reltest_2a.sh
./JIT.Methodical.tailcall._il_reltest_2b._il_reltest_2b.sh
./JIT.Methodical.tailcall._il_reltest_2c._il_reltest_2c.sh
./JIT.Methodical.tailcall._il_reltest_implicit._il_reltest_implicit.sh
./JIT.Methodical.tailcall._il_reltest_mutual_rec._il_reltest_mutual_rec.sh
./JIT.Methodical.tailcall._il_reltest_switch._il_reltest_switch.sh
./JIT.Methodical.tailcall._il_reltest_virt._il_reltest_virt.sh
./JIT.Methodical.tailcall._il_reltest_void._il_reltest_void.sh
./JIT.Regression.CLR-x86-JIT.V1.1-M1-Beta1.b143840.b143840.b143840.sh
./JIT.Regression.CLR-x86-JIT.V1-M12-Beta2.b65423.b65423.b65423.sh
./JIT.Regression.CLR-x86-JIT.V1-QFE.b151440.static-none.static-none.sh
./JIT.Regression.CLR-x86-JIT.V2.0-Beta2.b426654.b426654.b426654.sh
./JIT.Regression.Dev11.dev11_10427.conv_ovf_i4.conv_ovf_i4.sh
./JIT.Regression.VS-ia64-JIT.V1.2-M02.b14364.b14364.b14364.sh
./Loader.classloader.TypeInitialization.CircularCctors.CircularCctorFourThreadsBFI.CircularCctorFourThreadsBFI.sh
./Loader.classloader.TypeInitialization.CircularCctors.CircularCctorThreeThreads01BFI.CircularCctorThreeThreads01BFI.sh
./Loader.classloader.TypeInitialization.CircularCctors.CircularCctorThreeThreads02BFI.CircularCctorThreeThreads02BFI.sh
./Loader.classloader.TypeInitialization.CircularCctors.CircularCctorThreeThreads03BFI.CircularCctorThreeThreads03BFI.sh

The failure signatures are not exactly what I remember. I will attempt to debug a shorter one to see if I can root cause the issue.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the failures are a regression on the tip . When the regression is fixed, looks like I may be able to remove this workaround from the [Arm64/Unix] assembly.

EPILOG_RESTORE_REG_PAIR x21, x22, 32
EPILOG_RESTORE_REG_PAIR x23, x24, 48
EPILOG_RESTORE_REG_PAIR x25, x26, 64
EPILOG_RESTORE_REG_PAIR x27, x28, 80
EPILOG_RESTORE_REG_PAIR fp, lr, #16!
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be EPILOG_RESTORE_REG_PAIR fp, lr, #96!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rahku Yes you are right. Show how dependent I am on compile test workflow....

@rahku
Copy link

rahku commented Mar 17, 2017

@dotnet-bot test Windows_NT arm64 checked

@rahku
Copy link

rahku commented Mar 17, 2017

LGTM. Thanks

@danmoseley
Copy link
Member

@rahku what remains here? the CI shows an AV but with a bad stack.

@sdmaclea
Copy link
Author

@danmosemsft I do not have the ability to compile/test for arm64 windows. CI shows some error, but the logs are not sufficient for me to infer the root cause of the error. Someone with arm64 windows build/test will need to triage this and fix.

@danmoseley
Copy link
Member

Hopefully either @rahku or @dotnet/arm64-contrib can share the information.

Is cross-building arm64 on Windows not supported locally? I didn't try this.

@rahku
Copy link

rahku commented Mar 23, 2017

I will take a look today.

; sp in fp. If sp is saved in fp in prolog then it is not expected that fp can change in the body
; of method. However, this method needs to be able to change fp before calling funclet.
; This is required to access locals in funclet.
PROLOG_SAVE_REG_PAIR x29,lr, #-96!
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Failing test is causing an exception to happen in a finalizer (being executed as part of unwind). At that point OS is not able to unwind correctly out of CallEHFunction. This is because CallEHFunction modifies fp outside of the prolog of the function. CallEHFunction needs to be able to modify fp outside of the prolog to set the correct fp in order to get the locals for the finalizer. In order to make that happen prolog must not assume that fp can be used to unwind out of the function.
This line translates to
stp x29,lr [sp,#-0x60]!
mov fp,sp
So generated unwind codes assume that fp can be used to unwind out of the function. Therefor using x29/fp in this line is not correct. If you revert this line back to what it was previously then things work fine
PROLOG_SAVE_REG_PAIR x19,x20, #-96!
PROLOG_SAVE_REG fp, #0
PROLOG_SAVE_REG lr, #8

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. Using x29 instead of FP was able to trick the unixmacro preprocessor to avoid that unwinding code. I do not see the Windows version of the macros, they must be part of the ksarm64.h header.

@rahku Do you want me to push the patch?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes please.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Windows has another macro called PROLOG_SAVE_REG_PAIR_NO_FP that could be used here instead of doing this trick. @rahku do you think it would make sense to use it instead?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that works too. I just verified that. I think this macro was not present when I did this work initially. @sdmaclea can you update this PR

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

@rahku
Copy link

rahku commented Mar 27, 2017

@dotnet-bot test Windows_NT arm64 Checked

@rahku
Copy link

rahku commented Mar 27, 2017

@jashook arm64 test runs are failing

@rahku
Copy link

rahku commented Mar 27, 2017

@dotnet-bot test OSX10.12 x64 Checked Build and Test

@rahku
Copy link

rahku commented Mar 27, 2017

@jashook you need to recompile test binaries as certain types are now internal in corelib and hence no longer accessible.

@jashook
Copy link

jashook commented Mar 27, 2017

@rahku yeah, I have a pr open addressing it; however, have been caught up with other things.

@jashook
Copy link

jashook commented Apr 3, 2017

@dotnet-bot test Windows_NT arm64 Checked

@jashook
Copy link

jashook commented Apr 4, 2017

ping @rahku

@rahku rahku merged commit 7f296fe into dotnet:master Apr 4, 2017
@rahku
Copy link

rahku commented Apr 4, 2017

Thanks for fixing this.

@sdmaclea sdmaclea deleted the PR-ARM64-BACKPORT-9500 branch April 4, 2017 22:38
hadibrais pushed a commit to hadibrais/coreclr that referenced this pull request Apr 7, 2017
@karelz karelz modified the milestone: 2.0.0 Aug 28, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants