Merge branches 'fixes', 'generic', 'misc', 'mmu', 'mtrrs', 'pmu', 'selftests' and 'svm'

* fixes:
  KVM: Fix a data race on last_boosted_vcpu in kvm_vcpu_on_spin()
  KVM: selftests: x86: Prioritize getting max_gfn from GuestPhysBits
  KVM: selftests: Fix shift of 32 bit unsigned int more than 32 bits

* generic:
  KVM: X86: improve documentation for KVM_CAP_X86_BUS_LOCK_EXIT
  KVM: fix documentation rendering for KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM
  Revert "KVM: async_pf: avoid recursive flushing of work items"
  KVM: Update halt polling documentation to note that KVM has 4 module params
  KVM: Enable halt polling shrink parameter by default
  KVM: Unexport kvm_debugfs_dir

* misc:
  KVM: x86: Keep consistent naming for APICv/AVIC inhibit reasons
  KVM: x86: Print names of apicv inhibit reasons in traces
  KVM: x86: Add a capability to configure bus frequency for APIC timer
  KVM: x86: Make nanoseconds per APIC bus cycle a VM variable
  KVM: x86: hyper-v: Calculate APIC bus frequency for Hyper-V
  KVM: x86: Move shadow_phys_bits into "kvm_host", as "maxphyaddr"
  KVM: x86/mmu: Snapshot shadow_phys_bits when kvm.ko is loaded
  KVM: SVM: Use KVM's snapshot of the host's XCR0 for SEV-ES host state
  KVM: x86: Add a struct to consolidate host values, e.g. EFER, XCR0, etc...

* mmu:
  KVM: x86/mmu: Only allocate shadowed translation cache for sp->role.level <= KVM_MAX_HUGEPAGE_LEVEL
  KVM: x86: invalid_list not used anymore in mmu_shrink_scan

* mtrrs:
  KVM: VMX: Always honor guest PAT on CPUs that support self-snoop
  KVM: x86: Ensure a full memory barrier is emitted in the VM-Exit path
  srcu: Add an API for a memory barrier after SRCU read lock
  KVM: VMX: Drop support for forcing UC memory when guest CR0.CD=1
  KVM: x86: Remove VMX support for virtualizing guest MTRR memtypes

* pmu:
  KVM: x86/pmu: Manipulate FIXED_CTR_CTRL MSR with macros
  KVM: x86/pmu: Change ambiguous _mask suffix to _rsvd in kvm_pmu
  KVM: VMX: Switch to new Intel CPU model infrastructure
  KVM: x86/pmu: Switch to new Intel CPU model defines
  KVM: x86: Remove IA32_PERF_GLOBAL_OVF_CTRL from KVM_GET_MSR_INDEX_LIST

* selftests:
  KVM: selftests: remove unused struct 'memslot_antagonist_args'

* svm:
  KVM: SVM: Consider NUMA affinity when allocating per-CPU save_area
  KVM: SVM: not account memory allocation for per-CPU svm_data
  KVM: SVM: remove useless input parameter in snp_safe_alloc_page
sean-jc committed Jun 5, 2024
9 parents f99b052 + 49f683b + d3f673c + f992572 + 9ecc1c1 + 95200f2 + 75430c4 + f626279 + 99a4909 commit af0903a
Showing 36 changed files with 412 additions and 929 deletions.
75 changes: 49 additions & 26 deletions Documentation/virt/kvm/api.rst
@@ -6416,9 +6416,9 @@ More architecture-specific flags detailing state of the VCPU that may
affect the device's behavior. Current defined flags::

/* x86, set if the VCPU is in system management mode */
#define KVM_RUN_X86_SMM (1 << 0)
#define KVM_RUN_X86_SMM (1 << 0)
/* x86, set if bus lock detected in VM */
#define KVM_RUN_BUS_LOCK (1 << 1)
#define KVM_RUN_X86_BUS_LOCK (1 << 1)
/* arm64, set for KVM_EXIT_DEBUG */
#define KVM_DEBUG_ARCH_HSR_HIGH_VALID (1 << 0)

@@ -7764,29 +7764,31 @@ Valid bits in args[0] are::
#define KVM_BUS_LOCK_DETECTION_OFF (1 << 0)
#define KVM_BUS_LOCK_DETECTION_EXIT (1 << 1)

Enabling this capability on a VM provides userspace with a way to select
a policy to handle the bus locks detected in guest. Userspace can obtain
the supported modes from the result of KVM_CHECK_EXTENSION and define it
through the KVM_ENABLE_CAP.
Enabling this capability on a VM provides userspace with a way to select a
policy to handle the bus locks detected in guest. Userspace can obtain the
supported modes from the result of KVM_CHECK_EXTENSION and define it through
the KVM_ENABLE_CAP. The supported modes are mutually-exclusive.

KVM_BUS_LOCK_DETECTION_OFF and KVM_BUS_LOCK_DETECTION_EXIT are supported
currently and mutually exclusive with each other. More bits can be added in
the future.
This capability allows userspace to force VM exits on bus locks detected in the
guest, irrespective whether or not the host has enabled split-lock detection
(which triggers an #AC exception that KVM intercepts). This capability is
intended to mitigate attacks where a malicious/buggy guest can exploit bus
locks to degrade the performance of the whole system.

With KVM_BUS_LOCK_DETECTION_OFF set, bus locks in guest will not cause vm exits
so that no additional actions are needed. This is the default mode.
If KVM_BUS_LOCK_DETECTION_OFF is set, KVM doesn't force guest bus locks to VM
exit, although the host kernel's split-lock #AC detection still applies, if
enabled.

With KVM_BUS_LOCK_DETECTION_EXIT set, vm exits happen when bus lock detected
in VM. KVM just exits to userspace when handling them. Userspace can enforce
its own throttling or other policy based mitigations.
If KVM_BUS_LOCK_DETECTION_EXIT is set, KVM enables a CPU feature that ensures
bus locks in the guest trigger a VM exit, and KVM exits to userspace for all
such VM exits, e.g. to allow userspace to throttle the offending guest and/or
apply some other policy-based mitigation. When exiting to userspace, KVM sets
KVM_RUN_X86_BUS_LOCK in vcpu-run->flags, and conditionally sets the exit_reason
to KVM_EXIT_X86_BUS_LOCK.

This capability is aimed to address the thread that VM can exploit bus locks to
degree the performance of the whole system. Once the userspace enable this
capability and select the KVM_BUS_LOCK_DETECTION_EXIT mode, KVM will set the
KVM_RUN_BUS_LOCK flag in vcpu-run->flags field and exit to userspace. Concerning
the bus lock vm exit can be preempted by a higher priority VM exit, the exit
notifications to userspace can be KVM_EXIT_BUS_LOCK or other reasons.
KVM_RUN_BUS_LOCK flag is used to distinguish between them.
Note! Detected bus locks may be coincident with other exits to userspace, i.e.
KVM_RUN_X86_BUS_LOCK should be checked regardless of the primary exit reason if
userspace wants to take action on all detected bus locks.
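
The note above implies that userspace should test the flag rather than rely on the exit reason alone. A minimal sketch of that check follows. The flag values are copied from the documentation hunk earlier in this file; `struct run_snapshot` and `bus_lock_detected()` are hypothetical names used only for illustration, standing in for the real `struct kvm_run` that a VMM obtains by mmap()ing the vCPU fd.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Flag values as listed in the api.rst hunk above. */
#define KVM_RUN_X86_SMM      (1 << 0)
#define KVM_RUN_X86_BUS_LOCK (1 << 1)

/*
 * Hypothetical stand-in for the two fields of struct kvm_run that matter
 * here. Real code would read them from the mmap()ed struct kvm_run.
 */
struct run_snapshot {
	uint32_t exit_reason;
	uint16_t flags;
};

/*
 * Returns true if a bus lock was detected during the last run, no matter
 * which exit reason KVM reported as the primary one.
 */
static bool bus_lock_detected(const struct run_snapshot *run)
{
	assert(run != NULL);
	return (run->flags & KVM_RUN_X86_BUS_LOCK) != 0;
}
```

A VMM run loop would presumably call such a helper after every return from KVM_RUN, before dispatching on the exit reason, and apply its throttling policy when it returns true.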

7.23 KVM_CAP_PPC_DAWR1
----------------------
@@ -7902,10 +7904,10 @@ perform a bulk copy of tags to/from the guest.
7.29 KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM
-------------------------------------

Architectures: x86 SEV enabled
Type: vm
Parameters: args[0] is the fd of the source vm
Returns: 0 on success
:Architectures: x86 SEV enabled
:Type: vm
:Parameters: args[0] is the fd of the source vm
:Returns: 0 on success

This capability enables userspace to migrate the encryption context from the VM
indicated by the fd to the VM this is called on.
@@ -7953,7 +7955,11 @@ The valid bits in cap.args[0] are:
When this quirk is disabled, the reset value
is 0x10000 (APIC_LVT_MASKED).

KVM_X86_QUIRK_CD_NW_CLEARED By default, KVM clears CR0.CD and CR0.NW.
KVM_X86_QUIRK_CD_NW_CLEARED By default, KVM clears CR0.CD and CR0.NW on
AMD CPUs to workaround buggy guest firmware
that runs in perpetuity with CR0.CD, i.e.
with caches in "no fill" mode.

When this quirk is disabled, KVM does not
change the value of CR0.CD and CR0.NW.

@@ -8070,6 +8076,23 @@ error/annotated fault.

See KVM_EXIT_MEMORY_FAULT for more information.

7.35 KVM_CAP_X86_APIC_BUS_CYCLES_NS
-----------------------------------

:Architectures: x86
:Target: VM
:Parameters: args[0] is the desired APIC bus clock rate, in nanoseconds
:Returns: 0 on success, -EINVAL if args[0] contains an invalid value for the
frequency or if any vCPUs have been created, -ENXIO if a virtual
local APIC has not been created using KVM_CREATE_IRQCHIP.

This capability sets the VM's APIC bus clock frequency, used by KVM's in-kernel
virtual APIC when emulating APIC timers. KVM's default value can be retrieved
by KVM_CHECK_EXTENSION.

Note: Userspace is responsible for correctly configuring CPUID 0x15, a.k.a. the
core crystal clock frequency, if a non-zero CPUID 0x15 is exposed to the guest.
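
The relationship between the per-VM cycle length and the values derived from it can be sketched in plain C. The arithmetic mirrors the hyperv.c and lapic.c hunks of this commit (frequency is 10^9 divided by the cycle length, and a timer period is the initial count times the cycle length times the divide value). The function names and `NSEC_PER_SEC_U64` are hypothetical, and the code is a userspace illustration, not the kernel implementation.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC_U64 1000000000ULL

/* APIC bus frequency in Hz for a cycle length given in nanoseconds. */
static uint64_t apic_bus_frequency_hz(uint64_t cycle_ns)
{
	assert(cycle_ns != 0); /* a zero cycle length would divide by zero */
	return NSEC_PER_SEC_U64 / cycle_ns;
}

/* Timer period in nanoseconds for an initial count and divide value. */
static uint64_t apic_timer_period_ns(uint32_t tmict, uint64_t cycle_ns,
				     uint32_t divide_count)
{
	return (uint64_t)tmict * cycle_ns * (uint64_t)divide_count;
}
```

With the default of 1 ns per cycle this gives a 1 GHz bus. A cycle length of 40 ns would give 25 MHz, and an initial count of 1000 with a divide value of 16 would then expire after 640000 ns.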

8. Other capabilities.
======================

12 changes: 6 additions & 6 deletions Documentation/virt/kvm/halt-polling.rst
@@ -79,11 +79,11 @@ adjustment of the polling interval.
Module Parameters
=================

The kvm module has 3 tuneable module parameters to adjust the global max
polling interval as well as the rate at which the polling interval is grown and
shrunk. These variables are defined in include/linux/kvm_host.h and as module
parameters in virt/kvm/kvm_main.c, or arch/powerpc/kvm/book3s_hv.c in the
powerpc kvm-hv case.
The kvm module has 4 tunable module parameters to adjust the global max polling
interval, the initial value (to grow from 0), and the rate at which the polling
interval is grown and shrunk. These variables are defined in
include/linux/kvm_host.h and as module parameters in virt/kvm/kvm_main.c, or
arch/powerpc/kvm/book3s_hv.c in the powerpc kvm-hv case.

+-----------------------+---------------------------+-------------------------+
|Module Parameter | Description | Default Value |
@@ -105,7 +105,7 @@ powerpc kvm-hv case.
| | grow_halt_poll_ns() | |
| | function. | |
+-----------------------+---------------------------+-------------------------+
|halt_poll_ns_shrink | The value by which the | 0 |
|halt_poll_ns_shrink | The value by which the | 2 |
| | halt polling interval is | |
| | divided in the | |
| | shrink_halt_poll_ns() | |
18 changes: 18 additions & 0 deletions Documentation/virt/kvm/x86/errata.rst
@@ -48,3 +48,21 @@ have the same physical APIC ID, KVM will deliver events targeting that APIC ID
only to the vCPU with the lowest vCPU ID. If KVM_X2APIC_API_USE_32BIT_IDS is
not enabled, KVM follows x86 architecture when processing interrupts (all vCPUs
matching the target APIC ID receive the interrupt).

MTRRs
-----
KVM does not virtualize guest MTRR memory types. KVM emulates accesses to MTRR
MSRs, i.e. {RD,WR}MSR in the guest will behave as expected, but KVM does not
honor guest MTRRs when determining the effective memory type, and instead
treats all of guest memory as having Writeback (WB) MTRRs.

CR0.CD
------
KVM does not virtualize CR0.CD on Intel CPUs. Similar to MTRR MSRs, KVM
emulates CR0.CD accesses so that loads and stores from/to CR0 behave as
expected, but setting CR0.CD=1 has no impact on the cacheability of guest
memory.

Note, this erratum does not affect AMD CPUs, which fully virtualize CR0.CD in
hardware, i.e. put the CPU caches into "no fill" mode when CR0.CD=1, even when
running in the guest.
48 changes: 30 additions & 18 deletions arch/x86/include/asm/kvm_host.h
@@ -160,7 +160,6 @@
#define KVM_MIN_FREE_MMU_PAGES 5
#define KVM_REFILL_PAGES 25
#define KVM_MAX_CPUID_ENTRIES 256
#define KVM_NR_FIXED_MTRR_REGION 88
#define KVM_NR_VAR_MTRR 8

#define ASYNC_PF_PER_VCPU 64
@@ -547,12 +546,12 @@ struct kvm_pmu {
unsigned nr_arch_fixed_counters;
unsigned available_event_types;
u64 fixed_ctr_ctrl;
u64 fixed_ctr_ctrl_mask;
u64 fixed_ctr_ctrl_rsvd;
u64 global_ctrl;
u64 global_status;
u64 counter_bitmask[2];
u64 global_ctrl_mask;
u64 global_status_mask;
u64 global_ctrl_rsvd;
u64 global_status_rsvd;
u64 reserved_bits;
u64 raw_event_mask;
struct kvm_pmc gp_counters[KVM_INTEL_PMC_MAX_GENERIC];
@@ -572,9 +571,9 @@ struct kvm_pmu {

u64 ds_area;
u64 pebs_enable;
u64 pebs_enable_mask;
u64 pebs_enable_rsvd;
u64 pebs_data_cfg;
u64 pebs_data_cfg_mask;
u64 pebs_data_cfg_rsvd;

/*
* If a guest counter is cross-mapped to host counter with different
@@ -605,18 +604,12 @@ enum {
KVM_DEBUGREG_WONT_EXIT = 2,
};

struct kvm_mtrr_range {
u64 base;
u64 mask;
struct list_head node;
};

struct kvm_mtrr {
struct kvm_mtrr_range var_ranges[KVM_NR_VAR_MTRR];
mtrr_type fixed_ranges[KVM_NR_FIXED_MTRR_REGION];
u64 var[KVM_NR_VAR_MTRR * 2];
u64 fixed_64k;
u64 fixed_16k[2];
u64 fixed_4k[8];
u64 deftype;

struct list_head head;
};

/* Hyper-V SynIC timer */
@@ -1208,7 +1201,7 @@ enum kvm_apicv_inhibit {
* APIC acceleration is disabled by a module parameter
* and/or not supported in hardware.
*/
APICV_INHIBIT_REASON_DISABLE,
APICV_INHIBIT_REASON_DISABLED,

/*
* APIC acceleration is inhibited because AutoEOI feature is
@@ -1278,8 +1271,27 @@ enum kvm_apicv_inhibit {
* mapping between logical ID and vCPU.
*/
APICV_INHIBIT_REASON_LOGICAL_ID_ALIASED,

NR_APICV_INHIBIT_REASONS,
};

#define __APICV_INHIBIT_REASON(reason) \
{ BIT(APICV_INHIBIT_REASON_##reason), #reason }

#define APICV_INHIBIT_REASONS \
__APICV_INHIBIT_REASON(DISABLED), \
__APICV_INHIBIT_REASON(HYPERV), \
__APICV_INHIBIT_REASON(ABSENT), \
__APICV_INHIBIT_REASON(BLOCKIRQ), \
__APICV_INHIBIT_REASON(PHYSICAL_ID_ALIASED), \
__APICV_INHIBIT_REASON(APIC_ID_MODIFIED), \
__APICV_INHIBIT_REASON(APIC_BASE_MODIFIED), \
__APICV_INHIBIT_REASON(NESTED), \
__APICV_INHIBIT_REASON(IRQWIN), \
__APICV_INHIBIT_REASON(PIT_REINJ), \
__APICV_INHIBIT_REASON(SEV), \
__APICV_INHIBIT_REASON(LOGICAL_ID_ALIASED)
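
The macro pair above expands to a list of `{ bit, "NAME" }` initializers, which is the shape a flag-printing helper needs (the tracepoint change that consumes it is not shown in this excerpt). The standalone sketch below reproduces the pattern with a reduced, hypothetical `DEMO_` enum so that it builds outside the kernel. `format_reasons()` is an illustrative helper, not a kernel function.

```c
#include <stdio.h>
#include <string.h>

#define BIT(n) (1UL << (n))

/* Hypothetical, shortened stand-in for enum kvm_apicv_inhibit. */
enum demo_inhibit {
	DEMO_REASON_DISABLED,
	DEMO_REASON_HYPERV,
	DEMO_REASON_ABSENT,
	NR_DEMO_REASONS,
};

/* Same pattern as the kernel macro: the token becomes both bit and name. */
#define DEMO_REASON_ENTRY(reason) { BIT(DEMO_REASON_##reason), #reason }

#define DEMO_REASONS			\
	DEMO_REASON_ENTRY(DISABLED),	\
	DEMO_REASON_ENTRY(HYPERV),	\
	DEMO_REASON_ENTRY(ABSENT)

struct flag_name {
	unsigned long mask;
	const char *name;
};

static const struct flag_name demo_reasons[] = { DEMO_REASONS };

/*
 * Writes the '|'-separated names of all bits set in inhibits into buf and
 * returns buf. Output is cut short if buf is too small.
 */
static char *format_reasons(unsigned long inhibits, char *buf, size_t len)
{
	size_t used = 0;

	if (len == 0)
		return buf;
	buf[0] = '\0';

	for (size_t i = 0; i < sizeof(demo_reasons) / sizeof(demo_reasons[0]); i++) {
		if (!(inhibits & demo_reasons[i].mask))
			continue;
		int n = snprintf(buf + used, len - used, "%s%s",
				 used ? "|" : "", demo_reasons[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break;
		used += (size_t)n;
	}
	return buf;
}
```

Keeping the enum and the name table generated from the same tokens means a renamed reason, such as DISABLE becoming DISABLED in this commit, cannot drift out of sync with its printed name.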

struct kvm_arch {
unsigned long n_used_mmu_pages;
unsigned long n_requested_mmu_pages;
@@ -1365,6 +1377,7 @@ struct kvm_arch {

u32 default_tsc_khz;
bool user_set_tsc;
u64 apic_bus_cycle_ns;

seqcount_raw_spinlock_t pvclock_sc;
bool use_master_clock;
@@ -1857,7 +1870,6 @@ struct kvm_arch_async_pf {
};

extern u32 __read_mostly kvm_nr_uret_msrs;
extern u64 __read_mostly host_efer;
extern bool __read_mostly allow_smaller_maxphyaddr;
extern bool __read_mostly enable_apicv;
extern struct kvm_x86_ops kvm_x86_ops;
3 changes: 2 additions & 1 deletion arch/x86/kvm/hyperv.c
@@ -1737,7 +1737,8 @@ static int kvm_hv_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata,
data = (u64)vcpu->arch.virtual_tsc_khz * 1000;
break;
case HV_X64_MSR_APIC_FREQUENCY:
data = APIC_BUS_FREQUENCY;
data = div64_u64(1000000000ULL,
vcpu->kvm->arch.apic_bus_cycle_ns);
break;
default:
kvm_pr_unimpl_rdmsr(vcpu, msr);
6 changes: 4 additions & 2 deletions arch/x86/kvm/lapic.c
@@ -1557,7 +1557,8 @@ static u32 apic_get_tmcct(struct kvm_lapic *apic)
remaining = 0;

ns = mod_64(ktime_to_ns(remaining), apic->lapic_timer.period);
return div64_u64(ns, (APIC_BUS_CYCLE_NS * apic->divide_count));
return div64_u64(ns, (apic->vcpu->kvm->arch.apic_bus_cycle_ns *
apic->divide_count));
}

static void __report_tpr_access(struct kvm_lapic *apic, bool write)
@@ -1973,7 +1974,8 @@ static void start_sw_tscdeadline(struct kvm_lapic *apic)

static inline u64 tmict_to_ns(struct kvm_lapic *apic, u32 tmict)
{
return (u64)tmict * APIC_BUS_CYCLE_NS * (u64)apic->divide_count;
return (u64)tmict * apic->vcpu->kvm->arch.apic_bus_cycle_ns *
(u64)apic->divide_count;
}

static void update_target_expiration(struct kvm_lapic *apic, uint32_t old_divisor)
3 changes: 1 addition & 2 deletions arch/x86/kvm/lapic.h
@@ -16,8 +16,7 @@
#define APIC_DEST_NOSHORT 0x0
#define APIC_DEST_MASK 0x800

#define APIC_BUS_CYCLE_NS 1
#define APIC_BUS_FREQUENCY (1000000000ULL / APIC_BUS_CYCLE_NS)
#define APIC_BUS_CYCLE_NS_DEFAULT 1

#define APIC_BROADCAST 0xFF
#define X2APIC_BROADCAST 0xFFFFFFFFul
34 changes: 2 additions & 32 deletions arch/x86/kvm/mmu.h
@@ -57,12 +57,6 @@ static __always_inline u64 rsvd_bits(int s, int e)
return ((2ULL << (e - s)) - 1) << s;
}

/*
* The number of non-reserved physical address bits irrespective of features
* that repurpose legal bits, e.g. MKTME.
*/
extern u8 __read_mostly shadow_phys_bits;

static inline gfn_t kvm_mmu_max_gfn(void)
{
/*
@@ -76,30 +70,11 @@ static inline gfn_t kvm_mmu_max_gfn(void)
* than hardware's real MAXPHYADDR. Using the host MAXPHYADDR
* disallows such SPTEs entirely and simplifies the TDP MMU.
*/
int max_gpa_bits = likely(tdp_enabled) ? shadow_phys_bits : 52;
int max_gpa_bits = likely(tdp_enabled) ? kvm_host.maxphyaddr : 52;

return (1ULL << (max_gpa_bits - PAGE_SHIFT)) - 1;
}
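
As a worked illustration of the return expression in kvm_mmu_max_gfn(), the sketch below computes the largest guest frame number for a given number of guest physical address bits. `max_gfn_for_bits()` and `DEMO_PAGE_SHIFT` are hypothetical names, and the page shift of 12 assumes 4 KiB base pages, as on x86.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12 /* assumption: 4 KiB base pages */

/* Largest guest frame number addressable with max_gpa_bits address bits. */
static uint64_t max_gfn_for_bits(unsigned int max_gpa_bits)
{
	/* Keep the shift amount within the range where it is well defined. */
	assert(max_gpa_bits > DEMO_PAGE_SHIFT && max_gpa_bits <= 63);
	return (1ULL << (max_gpa_bits - DEMO_PAGE_SHIFT)) - 1;
}
```

With 52 bits, the value used when TDP is disabled, this yields 2^40 - 1. A host reporting a MAXPHYADDR of 46 would yield 2^34 - 1 when TDP is enabled.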

static inline u8 kvm_get_shadow_phys_bits(void)
{
/*
* boot_cpu_data.x86_phys_bits is reduced when MKTME or SME are detected
* in CPU detection code, but the processor treats those reduced bits as
* 'keyID' thus they are not reserved bits. Therefore KVM needs to look at
* the physical address bits reported by CPUID.
*/
if (likely(boot_cpu_data.extended_cpuid_level >= 0x80000008))
return cpuid_eax(0x80000008) & 0xff;

/*
* Quite weird to have VMX or SVM but not MAXPHYADDR; probably a VM with
* custom CPUID. Proceed with whatever the kernel found since these features
* aren't virtualizable (SME/SEV also require CPUIDs higher than 0x80000008).
*/
return boot_cpu_data.x86_phys_bits;
}

u8 kvm_mmu_get_max_tdp_level(void);

void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 mmio_mask, u64 access_mask);
@@ -246,12 +221,7 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
return -(u32)fault & errcode;
}

bool __kvm_mmu_honors_guest_mtrrs(bool vm_has_noncoherent_dma);

static inline bool kvm_mmu_honors_guest_mtrrs(struct kvm *kvm)
{
return __kvm_mmu_honors_guest_mtrrs(kvm_arch_has_noncoherent_dma(kvm));
}
bool kvm_mmu_may_ignore_guest_pat(void);

int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu);
