-
Notifications
You must be signed in to change notification settings - Fork 2.7k
[Arm64/Windows] Backport #9500 changes #10190
Conversation
Fixes #9526
@dotnet-bot test Windows_NT arm64 Cross Debug Build |
@RussKeldorph dotnet bot seems to ignore my requests to test this for windows arm64. Can I get permission to trigger windows CI arm64 builds. @dotnet/arm64-contrib @janvorli @jashook PTAL These represent changes that were added to the [Arm64/Unix] platform code to fix failing tests. It seems they should also be in the Windows platform. As I cannot trigger a CI build or compile locally...
|
@sdmaclea Best we can do is have you mention @dotnet/arm64-contrib to have someone review and request on your behalf. @dotnet-bot test Windows_NT arm64 Cross Debug Build |
@RussKeldorph Thanks Build is failing, but console output is insufficient to see why/which assert is firing. I do not think I can fix the patch without being able to build locally. @dotnet/arm64-contrib PTAL |
@@ -76,6 +77,8 @@ LNoFloatingPoint | |||
ldr x9, [x19,#CallDescrData__pTarget] | |||
blr x9 | |||
|
|||
ldr x19, [fp, 24] ; Fixup corrupted X19 callee preserved register |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
x19 is callee saved so it should not be trashed on return from call. Can you explain who/when is x19 getting corrupted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes this is a workaround for unexpected x19 corruption. I had tried to debug this failure by storing the correct value for x19 on the stack above and adding a conditional break, but was unable to do so. The test seemed to fail to make forward progress.
Let me try to reproduce. I may have better luck as the GC is more stable now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After removing the equivalent of this line, there are 43 failures on [Arm64/Unix] tip.
./JIT.Directed.Misc.function_pointer.MutualThdRecur-fptr.MutualThdRecur-fptr.sh
./JIT.Directed.tailcall.tailcall.tailcall.sh
./JIT.Methodical.Boxing.misc._dbgconcurgc_il._dbgconcurgc_il.sh
./JIT.Methodical.Boxing.misc._relconcurgc_il._relconcurgc_il.sh
./JIT.Methodical.switch.switch10.switch10.sh
./JIT.Methodical.switch.switch11.switch11.sh
./JIT.Methodical.switch.switch1.switch1.sh
./JIT.Methodical.switch.switch2.switch2.sh
./JIT.Methodical.switch.switch3.switch3.sh
./JIT.Methodical.switch.switch4.switch4.sh
./JIT.Methodical.switch.switch5.switch5.sh
./JIT.Methodical.switch.switch6.switch6.sh
./JIT.Methodical.switch.switch7.switch7.sh
./JIT.Methodical.switch.switch8.switch8.sh
./JIT.Methodical.switch.switch9.switch9.sh
./JIT.Methodical.tailcall._il_dbgrecurse_ep_void._il_dbgrecurse_ep_void.sh
./JIT.Methodical.tailcall._il_dbgtest_2a._il_dbgtest_2a.sh
./JIT.Methodical.tailcall._il_dbgtest_2b._il_dbgtest_2b.sh
./JIT.Methodical.tailcall._il_dbgtest_2c._il_dbgtest_2c.sh
./JIT.Methodical.tailcall._il_dbgtest_implicit._il_dbgtest_implicit.sh
./JIT.Methodical.tailcall._il_dbgtest_mutual_rec._il_dbgtest_mutual_rec.sh
./JIT.Methodical.tailcall._il_dbgtest_switch._il_dbgtest_switch.sh
./JIT.Methodical.tailcall._il_dbgtest_virt._il_dbgtest_virt.sh
./JIT.Methodical.tailcall._il_dbgtest_void._il_dbgtest_void.sh
./JIT.Methodical.tailcall._il_relrecurse_ep_void._il_relrecurse_ep_void.sh
./JIT.Methodical.tailcall._il_reltest_2a._il_reltest_2a.sh
./JIT.Methodical.tailcall._il_reltest_2b._il_reltest_2b.sh
./JIT.Methodical.tailcall._il_reltest_2c._il_reltest_2c.sh
./JIT.Methodical.tailcall._il_reltest_implicit._il_reltest_implicit.sh
./JIT.Methodical.tailcall._il_reltest_mutual_rec._il_reltest_mutual_rec.sh
./JIT.Methodical.tailcall._il_reltest_switch._il_reltest_switch.sh
./JIT.Methodical.tailcall._il_reltest_virt._il_reltest_virt.sh
./JIT.Methodical.tailcall._il_reltest_void._il_reltest_void.sh
./JIT.Regression.CLR-x86-JIT.V1.1-M1-Beta1.b143840.b143840.b143840.sh
./JIT.Regression.CLR-x86-JIT.V1-M12-Beta2.b65423.b65423.b65423.sh
./JIT.Regression.CLR-x86-JIT.V1-QFE.b151440.static-none.static-none.sh
./JIT.Regression.CLR-x86-JIT.V2.0-Beta2.b426654.b426654.b426654.sh
./JIT.Regression.Dev11.dev11_10427.conv_ovf_i4.conv_ovf_i4.sh
./JIT.Regression.VS-ia64-JIT.V1.2-M02.b14364.b14364.b14364.sh
./Loader.classloader.TypeInitialization.CircularCctors.CircularCctorFourThreadsBFI.CircularCctorFourThreadsBFI.sh
./Loader.classloader.TypeInitialization.CircularCctors.CircularCctorThreeThreads01BFI.CircularCctorThreeThreads01BFI.sh
./Loader.classloader.TypeInitialization.CircularCctors.CircularCctorThreeThreads02BFI.CircularCctorThreeThreads02BFI.sh
./Loader.classloader.TypeInitialization.CircularCctors.CircularCctorThreeThreads03BFI.CircularCctorThreeThreads03BFI.sh
The failure signatures are not exactly what I remember. I will attempt to debug a shorter one to see if I can root cause the issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like the failures are a regression on the tip . When the regression is fixed, looks like I may be able to remove this workaround from the [Arm64/Unix] assembly.
src/vm/arm64/asmhelpers.asm
Outdated
EPILOG_RESTORE_REG_PAIR x21, x22, 32 | ||
EPILOG_RESTORE_REG_PAIR x23, x24, 48 | ||
EPILOG_RESTORE_REG_PAIR x25, x26, 64 | ||
EPILOG_RESTORE_REG_PAIR x27, x28, 80 | ||
EPILOG_RESTORE_REG_PAIR fp, lr, #16! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be EPILOG_RESTORE_REG_PAIR fp, lr, #96!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rahku Yes you are right. Show how dependent I am on compile test workflow....
@dotnet-bot test Windows_NT arm64 checked |
LGTM. Thanks |
@rahku what remains here? the CI shows an AV but with a bad stack. |
@danmosemsft I do not have the ability to compile/test for arm64 windows. CI shows some error, but the logs are not sufficient for me to infer the root cause of the error. Someone with arm64 windows build/test will need to triage this and fix. |
Hopefully either @rahku or @dotnet/arm64-contrib can share the information. Is cross-building arm64 on Windows not supported locally? I didn't try this. |
I will take a look today. |
src/vm/arm64/asmhelpers.asm
Outdated
; sp in fp. If sp is saved in fp in prolog then it is not expected that fp can change in the body | ||
; of method. However, this method needs to be able to change fp before calling funclet. | ||
; This is required to access locals in funclet. | ||
PROLOG_SAVE_REG_PAIR x29,lr, #-96! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Failing test is causing an exception to happen in a finalizer (being executed as part of unwind). At that point OS is not able to unwind correctly out of CallEHFunction. This is because CallEHFunction modifies fp outside of the prolog of the function. CallEHFunction needs to be able to modify fp outside of the prolog to set the correct fp in order to get the locals for the finalizer. In order to make that happen prolog must not assume that fp can be used to unwind out of the function.
This line translates to
stp x29,lr [sp,#-0x60]!
mov fp,sp
So generated unwind codes assume that fp can be used to unwind out of the function. Therefor using x29/fp in this line is not correct. If you revert this line back to what it was previously then things work fine
PROLOG_SAVE_REG_PAIR x19,x20, #-96!
PROLOG_SAVE_REG fp, #0
PROLOG_SAVE_REG lr, #8
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. Using x29 instead of FP was able to trick the unixmacro preprocessor to avoid that unwinding code. I do not see the Windows version of the macros, they must be part of the ksarm64.h header.
@rahku Do you want me to push the patch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes please.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Windows has another macro called PROLOG_SAVE_REG_PAIR_NO_FP that could be used here instead of doing this trick. @rahku do you think it would make sense to use it instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that works too. I just verified that. I think this macro was not present when I did this work initially. @sdmaclea can you update this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure
Debugged by Rahul Kumar (Thanks)
@dotnet-bot test Windows_NT arm64 Checked |
@jashook arm64 test runs are failing |
@dotnet-bot test OSX10.12 x64 Checked Build and Test |
@jashook you need to recompile test binaries as certain types are now internal in corelib and hence no longer accessible. |
@rahku yeah, I have a pr open addressing it; however, have been caught up with other things. |
@dotnet-bot test Windows_NT arm64 Checked |
ping @rahku |
Thanks for fixing this. |
[Arm64/Windows] Backport dotnet#9500 changes
Fixes #9526