neonvm: Add support for virtio-mem #970

Merged — sharnoff merged 2 commits into main from sharnoff/neonvm-virtio-mem on Jun 22, 2024

Conversation

@sharnoff (Member) commented Jun 18, 2024

Adapted from #705.

This implementation is focused on making virtio-mem a drop-in replacement for our existing DIMM hotplug-based memory scaling.

As such, the externally visible differences are:

  1. VirtualMachines have a new optional field: .spec.guest.memoryProvider
  2. VirtualMachines have a new optional field: .status.memoryProvider
  3. VirtualMachine .status.memorySize changes in smaller increments if using virtio-mem.

This change introduces the concept of a "memory provider", which is either "DIMMSlots" or "VirtioMem".

If a VM is created without setting the memory provider in the spec, the controller uses the default provider, specified by its --default-memory-provider flag.
That choice only persists for the lifetime of the runner pod; if the VM is restarted, it may pick up a new default.

Additionally, because our usage of virtio-mem has 8MiB block sizes, we actually don't default to virtio-mem if the VM was created with some odd memory slot size that isn't divisible by 8MiB.

The memory provider of the current runner pod is stored in the new status field. For old VMs created before this change, the status field is backfilled to DIMMSlots iff the pod name is set but the status field is not.

Also, we allow mutating .spec.guest.memoryProvider iff it is being set equal to the current .status.memoryProvider, in case we want old VMs to keep their DIMM slots after switching to virtio-mem by default.
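
For concreteness, here's a minimal sketch of that selection logic. The field paths (.spec.guest.memoryProvider, .status.memoryProvider) are the ones this PR adds; the function name, the vmv1 type/constant names, and the MemorySlotSize accessor are placeholders rather than the actual controller code:

// pickMemoryProvider sketches the defaulting rules described above.
// Type and constant names (vmv1.*) are illustrative placeholders.
func pickMemoryProvider(defaultProvider vmv1.MemoryProvider, vm *vmv1.VirtualMachine) vmv1.MemoryProvider {
	// 1. An explicit value in the spec always wins.
	if p := vm.Spec.Guest.MemoryProvider; p != nil {
		return *p
	}
	// 2. An existing runner pod keeps whatever provider it was started with.
	if p := vm.Status.MemoryProvider; p != nil {
		return *p
	}
	// 3. Otherwise use --default-memory-provider, except that virtio-mem
	//    requires the slot size to be a multiple of its 8MiB block size.
	if defaultProvider == vmv1.MemoryProviderVirtioMem &&
		vm.Spec.Guest.MemorySlotSize.Value()%(8<<20) != 0 {
		return vmv1.MemoryProviderDIMMSlots
	}
	return defaultProvider
}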


Changes to split out from this PR:

  1. neonvm: Make spec min/max/use resources required #971
  2. neonvm-controller: Set podName only while Pending #973
  3. neonvm-controller: Emit CpuInfo/MemoryInfo every reconcile #974

On top of those, there are a couple of places where code was re-indented; those changes should probably be split into their own commit so they can be properly added to .git-blame-ignore-revs :)


Remaining testing for this PR before merging:

  • Test that backwards compatibility works as expected
    • Old VMs are updated while running as described above
    • Old VMs switch to the new default on restart
  • Test rollback safety: that it's ok to go between the last release and this PR (with --default-memory-provider=DIMMSlots).
    • Note: If --default-memory-provider=VirtioMem when this is released, we cannot make it rollback-safe.
  • Test that updating .spec.guest.memoryProvider works as expected

Future/follow-up work:

  1. neonvm: Use memory_hotplug.online_policy=auto-movable for virtio-mem #981
  2. Switch to virtio-mem by default
  3. Remove support for DIMM hotplug
  4. Move from representing memory as "memory slots" towards the implementation
  5. Fix the raciness issue with migration target pod creation.

@sharnoff requested a review from Omrigan on June 18, 2024 05:58
sharnoff added a commit that referenced this pull request Jun 18, 2024
This already was de-facto the case, which made much of the logic around
handling "but what if .max is nil?" somewhat hard to reason about.

Extracted from #970 and fleshed out here.

All our staging & production clusters have VMs meeting this criteria,
with the notable exception of a single, already-broken VM in
dev-us-east-2-beta.

More here: https://neondb.slack.com/archives/C03TN5G758R/p1718742308251569

---

Checked each cluster with:

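# Prints the name of every VM where any of cpus.{min,max} or memorySlots.{min,max} is unset: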
kubectl get neonvm -o json \
  |  jq -r '.items[] | select(.spec.guest | [.cpus.max,.cpus.min,.memorySlots.max,.memorySlots.min] | map(select(. == null)) | (length != 0)) | .metadata.name'

@Omrigan (Contributor) left a comment

Just a couple thoughts

sharnoff added a commit that referenced this pull request Jun 19, 2024
Fixes #967, extracted from #970.

In short, the state machine becomes simpler if we only set the VM's pod
name when it's in Pending. This gives us a couple things:

1. When the VM is Failed or Succeeded, we can un-set the pod name and it
   will be stable. This is what fixes #967
2. It gives us a consistent place to set status fields relating to a
   particular pod *before* creating the pod. This is why it's helpful
   for #970
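
As a rough illustration of that flow (the phase constants, field names, and choosePodName helper below are assumptions for the sketch, not the actual neonvm-controller code):

// updatePodNameSketch illustrates the rule above: the pod name is assigned
// exactly once, while Pending, and cleared in terminal phases.
func updatePodNameSketch(vm *vmv1.VirtualMachine) {
	switch vm.Status.Phase {
	case vmv1.VmPending:
		// Set all per-pod status fields (pod name, memory provider, ...) here,
		// before the runner pod object is actually created.
		if vm.Status.PodName == "" {
			vm.Status.PodName = choosePodName(vm) // hypothetical helper
		}
	case vmv1.VmSucceeded, vmv1.VmFailed:
		// Terminal phases: un-set the pod name; it then stays stable (#967).
		vm.Status.PodName = ""
	}
}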
sharnoff added a commit that referenced this pull request Jun 19, 2024
Extracted from #970.

Currently, when leaving the Scaling phase, we emit events like

  MemoryInfo  VirtualMachine vm-foo uses 4GiB memory
  CpuInfo     VirtualMachine vm-foo uses 1 cpu cores

To provide more up-to-date information in the event log, we can emit
these events every time .status.{cpus,memorySize} changes.

This is relevant for #970 because with virtio-mem, resizing the memory
happens asynchronously, so the current state otherwise wouldn't be
available in the logs.
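
A sketch of what this could look like with client-go's event recorder (the prevStatus snapshot, field names, and types are assumptions; only the reason strings and message shapes mirror the events quoted above):

// emitInfoEventsSketch emits MemoryInfo/CpuInfo events whenever the observed
// values differ from the previously recorded status. Illustrative only.
func emitInfoEventsSketch(recorder record.EventRecorder, vm *vmv1.VirtualMachine, prevStatus vmv1.VirtualMachineStatus) {
	if vm.Status.MemorySize != nil &&
		(prevStatus.MemorySize == nil || vm.Status.MemorySize.Cmp(*prevStatus.MemorySize) != 0) {
		recorder.Eventf(vm, corev1.EventTypeNormal, "MemoryInfo",
			"VirtualMachine %s uses %s memory", vm.Name, vm.Status.MemorySize.String())
	}
	if vm.Status.CPUs != nil && (prevStatus.CPUs == nil || *vm.Status.CPUs != *prevStatus.CPUs) {
		recorder.Eventf(vm, corev1.EventTypeNormal, "CpuInfo",
			"VirtualMachine %s uses %v cpu cores", vm.Name, *vm.Status.CPUs)
	}
}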
@sharnoff force-pushed the sharnoff/neonvm-virtio-mem branch from b58c8b5 to 8877ad8 on June 19, 2024 21:11

@sharnoff (Member, Author) commented:

Cleaned up the commits in this PR -- added single commits for #973 and #974, and split some indenting into a separate commit. May move more indenting, need to see.

Anyways, hopefully should make the final commit easier to review 🤞

@sharnoff force-pushed the sharnoff/neonvm-virtio-mem branch from 8877ad8 to ec73215 on June 19, 2024 21:21
@sharnoff force-pushed the sharnoff/neonvm-virtio-mem branch from ec73215 to 072fe6a on June 20, 2024 03:24
@sharnoff force-pushed the sharnoff/neonvm-virtio-mem branch from 072fe6a to 8e28c09 on June 20, 2024 16:59

@sharnoff (Member, Author) commented:

Did backwards-compatibility testing locally (started a VM, upgraded the version, checked status was as expected, assuming that our existing e2e tests can take it from there). Found & fixed a bug.
Also checked rollback safety; yeah, seems fine :)
One interesting thing, posted it here: https://neondb.slack.com/archives/C039YKBRZB4/p1718950172991649

@Omrigan (Contributor) left a comment

Looks good overall

// Maybe we're already using the amount we want?
// Update the status to reflect the current size - and if it matches goalTotalSize, ram
// is done.
currentTotalSize, err := QmpGetMemorySize(QmpAddr(vm))

@Omrigan (Contributor):

I think this is a redundant check; we could get the current memory size from where it is set.

But no worries, I'll clean it up in #943

@sharnoff (Member, Author):

I think in testing I actually saw this value change in between QMP queries in the same reconcile cycle. Not totally sure though...

if p := vm.Spec.Guest.MemoryProvider; p != nil {
return *p
}
if p := vm.Status.MemoryProvider; p != nil {

@Omrigan (Contributor):

Won't this prevent us from changing from one provider to the other by deleting the pod, because the old value would be preserved through Status?

@sharnoff (Member, Author):

the tl;dr on that is that we reset .status.memoryProvider in the same place we reset .status.podName (there's some method on VirtualMachineStatus that does it for us).

I tested it locally and it works as expected :)
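
For illustration, such a helper might look like the sketch below (the method name and exact field set are hypothetical; the comment above only says "some method on VirtualMachineStatus"):

// cleanupForNewPod is a hypothetical name for the kind of helper described
// above: clear every per-pod status field together, so a new runner pod can
// pick up a (possibly different) default memory provider.
func (s *VirtualMachineStatus) cleanupForNewPod() {
	s.PodName = ""
	s.MemoryProvider = nil
}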

@sharnoff force-pushed the sharnoff/neonvm-virtio-mem branch 2 times, most recently from f766bc5 to 5b8aead on June 22, 2024 01:13
sharnoff and others added 2 commits June 21, 2024 18:41
Change in preparation to reduce the diff from adding virtio-mem.
Adapted from #705.

This implementation is focused on making virtio-mem a drop-in
replacement for our existing DIMM hotplug-based memory scaling.

As such, the externally visible differences are:

1. VirtualMachines have a new optional field: .spec.guest.memoryProvider
2. VirtualMachines have a new optional field: .status.memoryProvider
3. VirtualMachine .status.memorySize changes in smaller increments if
   using virtio-mem.

This change introduces the concept of a "memory provider", which is
either "DIMMSlots" or "VirtioMem".

If a VM is created without setting the memory provider in the spec, the
controller will use the default one, which is specified by its
--default-memory-provider flag.
This will only persist for the lifetime of the runner pod. If the VM is
restarted, then it may change to a new default.

Additionally, because our usage of virtio-mem has 8MiB block sizes, we
actually *don't* default to virtio-mem if the VM was created with some
odd memory slot size that isn't divisible by 8MiB.

The memory provider of the current runner pod is stored in the new
status field. For old VMs created before this change, the status field
is set to DIMMSlots iff the pod name is set without the status field.

Also, we allow mutating .spec.guest.memoryProvider to allow the
flexibility to solidify a particular provider from future changes or to
force it to change across restarts. There was more discussion on this in
the PR comments, if you're curious.

Co-authored-by: Andrey Taranik <andrey@cicd.team>
Co-authored-by: Oleg Vasilev <oleg@neon.tech>
@sharnoff force-pushed the sharnoff/neonvm-virtio-mem branch from d2b8d89 to e2aab65 on June 22, 2024 01:45
@sharnoff enabled auto-merge (rebase) on June 22, 2024 01:46
@sharnoff merged commit a1100e4 into main on Jun 22, 2024
15 checks passed
sharnoff added a commit that referenced this pull request Jun 22, 2024
Change in preparation to reduce the diff from adding virtio-mem.
@sharnoff deleted the sharnoff/neonvm-virtio-mem branch on June 22, 2024 03:24
sharnoff added a commit that referenced this pull request Aug 7, 2024
In the original PR (#970), I'd intended to make this field mutable and
added a comment about it, but forgot to actually remove it from the list
of immutable fields.

*Probably* doesn't matter much, but IMO it's worth cleaning up.