arm64 KVM problem when SCS is enabled #1096
Comments
Sounds like there are functions used in EL2 that are missing the |
Yes, that sounds about right. I reverted 9654736 and replaced it with the v6 version then everything works fine so it seems like some function that runs at EL2 is missing
Yes, it does. I just need to get a serial to USB cable. |
Well, the serial debugging cable I got does not appear to work (or I am holding it wrong), but I did some good old "disable it for this translation unit" debugging in the meantime and narrowed it down to switch.c being the problematic file. Everything is fine with this diff:
diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
index 2ca7ba69c318..b131b08cd63a 100644
--- a/arch/arm64/configs/defconfig
+++ b/arch/arm64/configs/defconfig
@@ -999,3 +999,4 @@ CONFIG_DEBUG_KERNEL=y
# CONFIG_DEBUG_PREEMPT is not set
# CONFIG_FTRACE is not set
CONFIG_MEMTEST=y
+CONFIG_SHADOW_CALL_STACK=y
diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
index 8c9880783839..d3acd087fb07 100644
--- a/arch/arm64/kvm/hyp/Makefile
+++ b/arch/arm64/kvm/hyp/Makefile
@@ -11,6 +11,8 @@ obj-$(CONFIG_KVM) += hyp.o
hyp-y := vgic-v3-sr.o timer-sr.o aarch32.o vgic-v2-cpuif-proxy.o sysreg-sr.o \
debug-sr.o entry.o switch.o fpsimd.o tlb.o hyp-entry.o
+CFLAGS_REMOVE_switch.o := $(CC_FLAGS_SCS)
+
# KVM code is run at a different exception code with a different map, so
# compiler instrumentation that inserts callbacks or checks into the code may
# cause crashes. Just disable it.
Unfortunately, I tried adding |
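The "disable it for this translation unit" approach above can be automated as a bisection over the object files in hyp-y. The following is only a sketch, not necessarily how the search was done in this thread. The `works_without_scs` callback is hypothetical: it stands in for "rebuild with a CFLAGS_REMOVE_<object> line for each listed object, boot, and check whether the guest starts". The sketch also assumes exactly one object is responsible.

```python
from typing import Callable, List, Sequence


def find_culprit(
    objects: Sequence[str],
    works_without_scs: Callable[[List[str]], bool],
) -> str:
    """Bisect `objects` down to the one whose SCS instrumentation breaks things.

    `works_without_scs(subset)` is a hypothetical callback that returns True
    when removing the SCS flags from every object in `subset` makes the
    failure go away. Assumes a single culprit.
    """
    candidates = list(objects)
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        if works_without_scs(half):
            # The failure disappeared, so the culprit is in this half.
            candidates = half
        else:
            # Still failing, so the culprit is in the other half.
            candidates = candidates[len(half):]
    return candidates[0]


if __name__ == "__main__":
    # Object list taken from the hyp-y line in the Makefile diff above.
    hyp_objects = [
        "vgic-v3-sr.o", "timer-sr.o", "aarch32.o", "vgic-v2-cpuif-proxy.o",
        "sysreg-sr.o", "debug-sr.o", "entry.o", "switch.o", "fpsimd.o",
        "tlb.o", "hyp-entry.o",
    ]
    # Stand-in for a real build-and-boot test: pretend switch.o is the culprit.
    print(find_culprit(hyp_objects, lambda subset: "switch.o" in subset))
```

With eleven objects this needs about four build-and-boot cycles rather than up to eleven, which matters when each cycle involves a reboot of the board.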
For the serial debug cable, assuming you have it plugged into the right pins (and IIRC on the Pi there's a step you need to do to enable serial debug, test on a working kernel), then the host should see a new /dev/ttyUSB. I use |
Yeah you have to add |
Can you boot your host off a USB live image of linux? |
That's a good idea, I will try that soon. |
Unfortunately, same deal even with live Linux; I see |
The section on "UARTs and Device Tree" https://www.raspberrypi.org/documentation/configuration/uart.md makes it sound like bluetooth might have to be disabled. |
So I have |
Can you see if
$ llvm-readelf -s arch/arm64/kvm/hyp/switch.o | grep FUNC
is different with and without |
You can also run |
comparing w/ and w/o SCS
may also be of interest. |
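One way to make the with/without comparison less error-prone is to diff only the FUNC symbol names rather than the raw `llvm-readelf -s` text. The helper below is a minimal sketch that assumes the usual GNU-style column layout (Num, Value, Size, Type, Bind, Vis, Ndx, Name) and that the output of each build has been saved to a text file. The function names and the command-line usage are made up for illustration.

```python
from __future__ import annotations

import sys


def func_symbols(readelf_output: str) -> set[str]:
    """Collect the names of FUNC symbols from saved `llvm-readelf -s` output."""
    names = set()
    for line in readelf_output.splitlines():
        fields = line.split()
        # Symbol rows: Num: Value Size Type Bind Vis Ndx Name
        if len(fields) >= 8 and fields[3] == "FUNC":
            names.add(fields[7])
    return names


def compare_func_symbols(
    without_scs: str, with_scs: str
) -> tuple[list[str], list[str]]:
    """Return (only present with SCS, only present without SCS), both sorted."""
    before = func_symbols(without_scs)
    after = func_symbols(with_scs)
    return sorted(after - before), sorted(before - after)


if __name__ == "__main__":
    # Assumed usage: python compare_syms.py noscs.txt scs.txt
    with open(sys.argv[1]) as f_without, open(sys.argv[2]) as f_with:
        added, removed = compare_func_symbols(f_without.read(), f_with.read())
    for name in added:
        print(f"+{name}")
    for name in removed:
        print(f"-{name}")
```

Functions that show up only in the SCS build are ones the compiler stopped inlining, so they are the natural candidates to inspect first.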
I see |
|
$ rm -f arch/arm64/kvm/hyp/switch.o
$ ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- make LLVM=1 -j71 arch/arm64/kvm/hyp/switch.o KCFLAGS="-Rpass=inline" 2>&1 | grep kvm_skip_instr
might be able to tell you more about the decision to inline or not. I would check I don't see |
Ugh, I am sorry, that's my fault. I missed 5c37f1a in my backport to 5.4. However, picking that does not solve anything, which lines up with the initial report, which was arm64 defconfig +
Looks like there is now a 5.8 branch with all of the out-of-tree dtb stuff that allows me to disable Bluetooth easily and reclaim the primary UART for the serial console. I did manage to get the serial console "working" with a pure upstream kernel, but it uses the mini UART, which I could not get to output anything other than garbage:
I will probably reach out on the Raspberry Pi kernel mailing list to see if I can get some help with that. I did manage to get the panic information via serial console this time around:
Not super descriptive... but better than nothing I suppose. Here is that same information that Nick requested on mainline:
Here is the content of
Without SCS: https://gist.github.com/a930e624d11ba94fd8f4f5f24542fd67
With SCS: https://gist.github.com/ea63c1dffd584618487442f4df970919 |
That's pretty common when the baud rate of the client is wrong. The client starts interpreting signals at the wrong rate, and thus interprets an otherwise valid signal as garbage.
Comparing the list of defined symbols, I see:
--- /noscs.txt 2020-07-21 16:14:52.746914000 -0700
+++ /scs.txt 2020-07-21 16:14:52.746914000 -0700
@@ -1,14 +1,27 @@
+__activate_traps
+__activate_traps_common
+__activate_traps_fpsimd32
__activate_traps_nvhe
activate_traps_vhe
activate_traps_vhe_load
+__activate_vm
__deactivate_traps
+__deactivate_traps_common
+deactivate_traps_vhe
deactivate_traps_vhe_put
fixup_guest_exit
+__fpsimd_save_fpexc32
__hyp_call_panic_nvhe
__hyp_call_panic_vhe
__hyp_handle_fpsimd
-__hyp_handle_ptrauth
hyp_panic
__kvm_vcpu_run_nvhe
-__kvm_vcpu_run_vhe
kvm_vcpu_run_vhe
+__set_guest_arch_workaround_state
+__set_host_arch_workaround_state
+sve_ffr_offset
+sve_pffr
+system_supports_address_auth
+system_supports_generic_auth
+update_fp_enabled
+vcpu_ptrauth_enable
I wonder if all the |
Careful here -- anything that is "VHE only" (where the entire kernel runs with hypervisor privileges at EL2) doesn't need the |
I think I nailed this one with https://lore.kernel.org/kvm/20200722162231.3689767-1-maz@kernel.org/ |
I built
I will try this with mainline later, thanks for the fix! |
As I replied on the list, Marc's patch against mainline resolves the issue as well. |
Marc's patch is now in the KVM tree: https://git.kernel.org/pub/scm/virt/kvm/kvm.git/commit/?id=bf4086b1a1efa3d3a2c17582e00bbd2176dfe177 |
This made it into 5.8: https://git.kernel.org/linus/bf4086b1a1efa3d3a2c17582e00bbd2176dfe177 |
Thanks to the hard work of upstream developers, the Raspberry Pi 4 can be easily booted on mainline, which is rather neat since I now have an actual piece of hardware that I can use to run mainline kernels on :)
One of the things I wanted to try was spawning a guest with KVM with a clang built kernel, as we have received a report of it not working when BTI was enabled: https://lore.kernel.org/linux-arm-kernel/20200615105524.GA2694@willie-the-truck/
It works fine when just building defconfig (which is how I verified ClangBuiltLinux/boot-utils#23):
However, as soon as I enable CONFIG_SHADOW_CALL_STACK, attempting to spawn a KVM guest kills the machine; I see the qemu-system-aarch64 but no other output, then my mosh session disconnects and the green light on the Pi stops flashing. I am unsure of how to get a previous kernel log on "regular" Linux (I know that Android has pstore), so I am not sure how to further debug this.
I am going to do some research to see if this is a clang issue or more rooted in the kernel. Attempting to bisect probably won't prove fruitful for two reasons: SCS was only merged in 5.8-rc1 and Raspberry Pi 4 support has only been good for the past couple of kernel versions.
cc @samitolvanen