neonvm: Add support for virtio-mem #970

Merged — sharnoff merged 2 commits into main from sharnoff/neonvm-virtio-mem on Jun 22, 2024

Conversation

@sharnoff (Member) commented Jun 18, 2024

Adapted from #705.

This implementation is focused on making virtio-mem a drop-in replacement for our existing DIMM hotplug-based memory scaling.

As such, the externally visible differences are:

  1. VirtualMachines have a new optional field: .spec.guest.memoryProvider
  2. VirtualMachines have a new optional field: .status.memoryProvider
  3. VirtualMachine .status.memorySize changes in smaller increments if using virtio-mem.

This change introduces the concept of a "memory provider", which is either "DIMMSlots" or "VirtioMem".

If a VM is created without setting the memory provider in the spec, the controller uses the default provider, specified by its --default-memory-provider flag.
That choice only persists for the lifetime of the runner pod; if the VM is restarted, it may pick up a new default.

Additionally, because our usage of virtio-mem has 8MiB block sizes, we actually don't default to virtio-mem if the VM was created with some odd memory slot size that isn't divisible by 8MiB.

The memory provider of the current runner pod is stored in the new status field. For old VMs created before this change, the status field is backfilled to DIMMSlots iff the pod name is set but the status field is not.

Also, we allow mutating .spec.guest.memoryProvider iff it is being set equal to the current .status.memoryProvider, in case we want old VMs to keep their DIMM slots after switching to virtio-mem by default.
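
For concreteness, here's a minimal sketch of that selection logic. The field paths (.spec.guest.memoryProvider, .status.memoryProvider) are the ones this PR adds; the function name, the vmv1 type/constant names, and the MemorySlotSize accessor are placeholders rather than the actual controller code:

// pickMemoryProvider sketches the defaulting rules described above.
// Type and constant names (vmv1.*) are illustrative placeholders.
func pickMemoryProvider(defaultProvider vmv1.MemoryProvider, vm *vmv1.VirtualMachine) vmv1.MemoryProvider {
	// 1. An explicit value in the spec always wins.
	if p := vm.Spec.Guest.MemoryProvider; p != nil {
		return *p
	}
	// 2. An existing runner pod keeps whatever provider it was started with.
	if p := vm.Status.MemoryProvider; p != nil {
		return *p
	}
	// 3. Otherwise use --default-memory-provider, except that virtio-mem
	//    requires the slot size to be a multiple of its 8MiB block size.
	if defaultProvider == vmv1.MemoryProviderVirtioMem &&
		vm.Spec.Guest.MemorySlotSize.Value()%(8<<20) != 0 {
		return vmv1.MemoryProviderDIMMSlots
	}
	return defaultProvider
}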


Changes to split out from this PR:

  1. neonvm: Make spec min/max/use resources required #971
  2. neonvm-controller: Set podName only while Pending #973
  3. neonvm-controller: Emit CpuInfo/MemoryInfo every reconcile #974

On top of those, there are a couple of places where code was re-indented; those changes should probably be split into their own commit so they can be properly added to .git-blame-ignore-revs :)


Remaining testing for this PR before merging:

  • Test that backwards compatibility works as expected
    • Old VMs are updated while running as described above
    • Old VMs switch to the new default on restart
  • Test rollback safety: that it's ok to go between the last release and this PR (with --default-memory-provider=DIMMSlots).
    • Note: If --default-memory-provider=VirtioMem when this is released, we cannot make it rollback-safe.
  • Test that updating .spec.guest.memoryProvider works as expected

Future/follow-up work:

  1. neonvm: Use memory_hotplug.online_policy=auto-movable for virtio-mem #981
  2. Switch to virtio-mem by default
  3. Remove support for DIMM hotplug
  4. Move from representing memory as "memory slots" towards the implementation
  5. Fix the raciness issue with migration target pod creation.

@sharnoff requested a review from Omrigan on June 18, 2024 05:58
sharnoff added a commit that referenced this pull request Jun 18, 2024
This already was de-facto the case, which made much of the logic around
handling "but what if .max is nil?" somewhat hard to reason about.

Extracted from #970 and fleshed out here.

All our staging & production clusters have VMs meeting this criteria,
with the notable exception of a single, already-broken VM in
dev-us-east-2-beta.

More here: https://neondb.slack.com/archives/C03TN5G758R/p1718742308251569

---

Checked each cluster with:

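# Prints the name of every VM where any of cpus.{min,max} or memorySlots.{min,max} is unset: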
kubectl get neonvm -o json \
  |  jq -r '.items[] | select(.spec.guest | [.cpus.max,.cpus.min,.memorySlots.max,.memorySlots.min] | map(select(. == null)) | (length != 0)) | .metadata.name'

@Omrigan (Contributor) left a comment

Just a couple thoughts

sharnoff added a commit that referenced this pull request Jun 19, 2024
Fixes #967, extracted from #970.

In short, the state machine becomes simpler if we only set the VM's pod
name when it's in Pending. This gives us a couple things:

1. When the VM is Failed or Succeeded, we can un-set the pod name and it
   will be stable. This is what fixes #967
2. It gives us a consistent place to set status fields relating to a
   particular pod *before* creating the pod. This is why it's helpful
   for #970
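
As a rough illustration of that flow (the phase constants, field names, and choosePodName helper below are assumptions for the sketch, not the actual neonvm-controller code):

// updatePodNameSketch illustrates the rule above: the pod name is assigned
// exactly once, while Pending, and cleared in terminal phases.
func updatePodNameSketch(vm *vmv1.VirtualMachine) {
	switch vm.Status.Phase {
	case vmv1.VmPending:
		// Set all per-pod status fields (pod name, memory provider, ...) here,
		// before the runner pod object is actually created.
		if vm.Status.PodName == "" {
			vm.Status.PodName = choosePodName(vm) // hypothetical helper
		}
	case vmv1.VmSucceeded, vmv1.VmFailed:
		// Terminal phases: un-set the pod name; it then stays stable (#967).
		vm.Status.PodName = ""
	}
}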
sharnoff added a commit that referenced this pull request Jun 19, 2024
Extracted from #970.

Currently, when leaving the Scaling phase, we emit events like

  MemoryInfo  VirtualMachine vm-foo uses 4GiB memory
  CpuInfo     VirtualMachine vm-foo uses 1 cpu cores

To provide more up-to-date information in the event log, we can emit
these events every time .status.{cpus,memorySize} changes.

This is relevant for #970 because with virtio-mem, resizing the memory
happens asynchronously, so the current state otherwise wouldn't be
available in the logs.
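
A sketch of what this could look like with client-go's event recorder (the prevStatus snapshot, field names, and types are assumptions; only the reason strings and message shapes mirror the events quoted above):

// emitInfoEventsSketch emits MemoryInfo/CpuInfo events whenever the observed
// values differ from the previously recorded status. Illustrative only.
func emitInfoEventsSketch(recorder record.EventRecorder, vm *vmv1.VirtualMachine, prevStatus vmv1.VirtualMachineStatus) {
	if vm.Status.MemorySize != nil &&
		(prevStatus.MemorySize == nil || vm.Status.MemorySize.Cmp(*prevStatus.MemorySize) != 0) {
		recorder.Eventf(vm, corev1.EventTypeNormal, "MemoryInfo",
			"VirtualMachine %s uses %s memory", vm.Name, vm.Status.MemorySize.String())
	}
	if vm.Status.CPUs != nil && (prevStatus.CPUs == nil || *vm.Status.CPUs != *prevStatus.CPUs) {
		recorder.Eventf(vm, corev1.EventTypeNormal, "CpuInfo",
			"VirtualMachine %s uses %v cpu cores", vm.Name, *vm.Status.CPUs)
	}
}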
@sharnoff force-pushed the sharnoff/neonvm-virtio-mem branch from b58c8b5 to 8877ad8 on June 19, 2024 21:11

@sharnoff (Member, Author) commented:

Cleaned up the commits in this PR -- added single commits for #973 and #974, and split some indenting into a separate commit. May move more indenting, need to see.

Anyways, hopefully should make the final commit easier to review 🤞

@sharnoff force-pushed the sharnoff/neonvm-virtio-mem branch from 8877ad8 to ec73215 on June 19, 2024 21:21
@sharnoff force-pushed the sharnoff/neonvm-virtio-mem branch from ec73215 to 072fe6a on June 20, 2024 03:24
@sharnoff force-pushed the sharnoff/neonvm-virtio-mem branch from 072fe6a to 8e28c09 on June 20, 2024 16:59

@sharnoff (Member, Author) commented:

Did backwards-compatibility testing locally (started a VM, upgraded the version, checked status was as expected, assuming that our existing e2e tests can take it from there). Found & fixed a bug.
Also checked rollback safety; yeah, seems fine :)
One interesting thing, posted it here: https://neondb.slack.com/archives/C039YKBRZB4/p1718950172991649

@Omrigan (Contributor) left a comment

Looks good overall

// Maybe we're already using the amount we want?
// Update the status to reflect the current size - and if it matches goalTotalSize, ram
// is done.
currentTotalSize, err := QmpGetMemorySize(QmpAddr(vm))

@Omrigan (Contributor):

I think this is a redundant check; we could get the current memory size from where it is set.

But no worries, I'll clean it up in #943

@sharnoff (Member, Author):

I think in testing I actually saw this value change in between QMP queries in the same reconcile cycle. Not totally sure though...

if p := vm.Spec.Guest.MemoryProvider; p != nil {
return *p
}
if p := vm.Status.MemoryProvider; p != nil {

@Omrigan (Contributor):

Won't this prevent us from changing from one provider to the other by deleting the pod, because the old value would be preserved through Status?

@sharnoff (Member, Author):

the tl;dr on that is that we reset .status.memoryProvider in the same place we reset .status.podName (there's some method on VirtualMachineStatus that does it for us).

I tested it locally and it works as expected :)
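
For illustration, such a helper might look like the sketch below (the method name and exact field set are hypothetical; the comment above only says "some method on VirtualMachineStatus"):

// cleanupForNewPod is a hypothetical name for the kind of helper described
// above: clear every per-pod status field together, so a new runner pod can
// pick up a (possibly different) default memory provider.
func (s *VirtualMachineStatus) cleanupForNewPod() {
	s.PodName = ""
	s.MemoryProvider = nil
}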

@sharnoff force-pushed the sharnoff/neonvm-virtio-mem branch 2 times, most recently from f766bc5 to 5b8aead on June 22, 2024 01:13
sharnoff and others added 2 commits June 21, 2024 18:41
Change in preparation to reduce the diff from adding virtio-mem.
Adapted from #705.

This implementation is focused on making virtio-mem a drop-in
replacement for our existing DIMM hotplug-based memory scaling.

As such, the externally visible differences are:

1. VirtualMachines have a new optional field: .spec.guest.memoryProvider
2. VirtualMachines have a new optional field: .status.memoryProvider
3. VirtualMachine .status.memorySize changes in smaller increments if
   using virtio-mem.

This change introduces the concept of a "memory provider", which is
either "DIMMSlots" or "VirtioMem".

If a VM is created without setting the memory provider in the spec, the
controller will use the default one, which is specified by its
--default-memory-provider flag.
This will only persist for the lifetime of the runner pod. If the VM is
restarted, then it may change to a new default.

Additionally, because our usage of virtio-mem has 8MiB block sizes, we
actually *don't* default to virtio-mem if the VM was created with some
odd memory slot size that isn't divisible by 8MiB.

The memory provider of the current runner pod is stored in the new
status field. For old VMs created before this change, the status field
is set to DIMMSlots iff the pod name is set without the status field.

Also, we allow mutating .spec.guest.memoryProvider to allow the
flexibility to solidify a particular provider from future changes or to
force it to change across restarts. There was more discussion on this in
the PR comments, if you're curious.

Co-authored-by: Andrey Taranik <andrey@cicd.team>
Co-authored-by: Oleg Vasilev <oleg@neon.tech>
@sharnoff force-pushed the sharnoff/neonvm-virtio-mem branch from d2b8d89 to e2aab65 on June 22, 2024 01:45
@sharnoff enabled auto-merge (rebase) on June 22, 2024 01:46
@sharnoff merged commit a1100e4 into main on Jun 22, 2024
15 checks passed
sharnoff added a commit that referenced this pull request Jun 22, 2024
Change in preparation to reduce the diff from adding virtio-mem.
@sharnoff deleted the sharnoff/neonvm-virtio-mem branch on June 22, 2024 03:24
sharnoff added a commit that referenced this pull request Aug 7, 2024
In the original PR (#970), I'd intended to make this field mutable and
added a comment about it, but forgot to actually remove it from the list
of immutable fields.

*Probably* doesn't matter much, but IMO it's worth cleaning up.