PVM host kernel crash when start PVM VM on AMD virtual machine #4
Comments
Thank you for debugging and fixing the issue. Your fix is correct. We apologize for not testing PVM on AMD for the new design before sending the RFC patches.
…g when PCID is not supported Similar to the check performed in pvm_set_host_cr3_for_guest_with_host_pcid(), the HPA of the SPT page table for direct switching should also be verified in pvm_set_host_cr3_for_guest_without_host_pcid(). Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com> Link: #4
Hi @bysui, thanks for confirming! Would you please add my sign-off to the patch? Thanks! Signed-off-by: Yong He <alexyonghe@tencent.com>
…g when PCID is not supported Similar to the check performed in pvm_set_host_cr3_for_guest_with_host_pcid(), the HPA of the SPT page table for direct switching should also be verified in pvm_set_host_cr3_for_guest_without_host_pcid(). Signed-off-by: Yong He <alexyonghe@tencent.com> Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com> Link: #4
Hi @zhuangel, thank you for your fix. I apologize for forgetting to include your sign-off in the fix patch. I have updated the pvm branch to include it. However, our work is still in development. When we send out the new version of the patchset, the fix will be squashed into the earlier patches, and we will make sure your sign-off is included in the resulting patch. We appreciate your contribution.
Description
I tried to run a PVM VM on an AMD Zen 2 virtual machine (on which PCID is disabled). When the PVM VM boots into init, the AMD virtual machine panics.
[34657.105528] general protection fault: 0000 [#1] PREEMPT SMP NOPTI
[34657.106109] CPU: 7 PID: 954129 Comm: vcpu0 Kdump: loaded Not tainted 6.7.0-rc6-pvm-alex+ #7
[34657.106730] Hardware name: <>, BIOS <>
[34657.107360] RIP: 0010:entry_SYSRETQ_switcher_unsafe_stack+0x55/0x134
[34657.107886] Code: 81 7f 40 33 00 2b 00 0f 85 a9 00 00 00 48 8b 4c 24 20 65 48 89 0d 27 27 20 7e 65 80 35 e7 26 20 7e 03 65 48 8b 0d ef 26 20 7e <0f> 22 d9 0f 01 f8 48 8b 4f 50 48 c1 e1 10 48 c1 f9 10 f3 48 0f ae
[34657.109266] RSP: 0018:fffffe000019ffd0 EFLAGS: 00010002
[34657.109694] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffffffffff
[34657.110224] RDX: 0000000000000001 RSI: 000000000005b000 RDI: ffff888207e02000
[34657.110752] RBP: 00007ffe8c827500 R08: 00007f5f0bf3b650 R09: 00007f5f0bfbae30
[34657.111279] R10: 00007f5f0b8f8000 R11: 0000000000000286 R12: 0000000000000000
[34657.111807] R13: 00007f5f0b903428 R14: 00007f5f0bee99a0 R15: 00007f5f0bee99a0
[34657.112339] FS: 00007f5f0b734400(0000) GS:ffff88881fbc0000(0000) knlGS:fffff0003ec00000
[34657.112930] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[34657.113362] CR2: 00007f5f0b92a05e CR3: 0000000163c35000 CR4: 00000000003506f0
[34657.113889] Call Trace:
[34657.114087] <ENTRY_TRAMPOLINE>
[34657.114337] ? show_regs+0x65/0x70
[34657.114608] ? __die_body+0x23/0x70
[34657.114881] ? die_addr+0x41/0x70
[34657.115146] ? exc_general_protection+0x1fd/0x440
[34657.115508] ? asm_exc_general_protection+0x36/0x60
[34657.115880] ? entry_SYSRETQ_switcher_unsafe_stack+0x55/0x134
[34657.116315] </ENTRY_TRAMPOLINE>
Steps to reproduce
Build PVM host kernel and PVM guest kernel
Following the guide pvm-get-started-with-kata.md, install the PVM host kernel in the AMD Zen 2 virtual machine.
PVM VM resource from Guide
cloud-hypervisor v37
VM image from Guide
Start PVM VM
Start PVM VM on AMD Zen 2 virtual machine
cloud-hypervisor.v37 \
--api-socket ch.sock \
--log-file vmm.log \
--cpus boot=1 \
--kernel vmlinux.virt-pvm-guest \
--cmdline 'console=ttyS0 root=/dev/vda1 rw clocksource=kvm-clock pti=off' \
--memory size=1G,hugepages=off,shared=false,prefault=off \
--disk id=disk_0,path=ubuntu-22.04-pvm-kata.raw \
-v --console off --serial tty
The AMD Zen 2 virtual machine panics
The crash happens because the switcher attempts a fast switch with an invalid UMOD CR3 (RCX: ffffffffffffffff). The debug messages show that the panic occurs after a PVM_HC_TLB_FLUSH_CURRENT hypercall, which frees prev_roots. The fast-switch path does not check the state of prev_roots, so it sets an invalid UMOD CR3, and the AMD Zen 2 virtual machine panics in the PVM switcher code.
[34657.105488] kvm_pvm: vcpu 0 reason 0x20000 rip 0xffffd97f818efcd9 info1 0x0000000000000400 info2 0x0000000000000000 intr_info 0x00000400 error_code 0x00000000
[34657.105490] kvm_pvm: handle_exit_syscall 2112 17088204
[34657.105518] kvm_pvm: entry smod_cr3 163c35000 umod_cr3 ffffffffffffffff
[34657.105528] general protection fault: 0000 [#1] PREEMPT SMP NOPTI
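The umod_cr3 value in the log above is all ones, which is the value KVM uses for INVALID_PAGE. The following is a minimal userspace sketch of why loading such a value into CR3 faults. The mask is a simplifying assumption (only bits 63:52 are treated as reserved, with PCID not in use), and cr3_hits_reserved_bits is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* KVM's marker for "no root page": every bit set. */
#define INVALID_PAGE (~(uint64_t)0)

/* The marker equals the umod_cr3 value printed in the log. */
static_assert(INVALID_PAGE == UINT64_MAX, "INVALID_PAGE is all ones");

/*
 * Simplifying assumption: with PCID not in use, bits 63:52 of CR3 must be
 * zero. Real hardware also reserves the bits between MAXPHYADDR and 51.
 */
#define CR3_HIGH_RESERVED_MASK (0xfffULL << 52)

/* Hypothetical helper: true if the value sets any of the bits above. */
bool cr3_hits_reserved_bits(uint64_t cr3)
{
	return (cr3 & CR3_HIGH_RESERVED_MASK) != 0;
}
```

Under this model the smod_cr3 value from the log (163c35000) passes the check, while the all-ones umod_cr3 does not, which is consistent with the general protection fault at the CR3 write in the switcher.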
Possible fix
There should be a validity check for prev_root, similar to check_switch_cr3. My fix is shown below; with it, the PVM VM boots successfully on the AMD virtual machine.
--- a/arch/x86/kvm/pvm/pvm.c
+++ b/arch/x86/kvm/pvm/pvm.c
@@ -765,8 +765,9 @@ static void pvm_set_host_cr3_for_guest_without_host_pcid(struct vcpu_pvm *pvm)
 {
 	u64 root_hpa = pvm->vcpu.arch.mmu->root.hpa;
 	u64 switch_root = 0;
+	u64 prev_root_hpa = pvm->vcpu.arch.mmu->prev_roots[0].hpa;
 
-	if (pvm->vcpu.arch.mmu->prev_roots[0].pgd == pvm->msr_switch_cr3) {
+	if (VALID_PAGE(prev_root_hpa) && (pvm->vcpu.arch.mmu->prev_roots[0].pgd == pvm->msr_switch_cr3)) {
 		switch_root = pvm->vcpu.arch.mmu->prev_roots[0].hpa;
 		pvm->switch_flags &= ~SWITCH_FLAGS_NO_DS_CR3;
 	} else {