Cluster-wide baseline CPU definition for virtual machines #486

Closed
stgraber opened this issue Feb 11, 2024 · 4 comments · Fixed by #981

Comments

@stgraber
Member

Currently our live-migration logic assumes that all servers are the same and that instances can be migrated to any other server within the cluster so long as they are of the same CPU architecture.

That's obviously not correct as variation in CPU features will cause live-migration to fail.

To resolve this, we should do two things:

  • Add a function that checks whether the source and destination servers share the same CPU. That check should then be added to our current migration and evacuation logic so that only target servers matching the source are considered.
  • As an alternative, add an option to generate a baseline CPU based on supported CPU features across the cluster for a given architecture. This baseline will then be used as the CPU definition for any instance that's set to be migratable (migration.stateful=true).
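
A rough sketch of how the two options above could look, assuming that "same CPU" is approximated by comparing sets of CPU flags within one architecture. The names flagSet, sameFlags and baselineFlags are hypothetical helpers for illustration and are not part of the Incus code base; a real check would likely need to consider more than flags alone.

```go
package main

import "sort"

// flagSet turns a list of CPU flags into a set, dropping duplicates.
func flagSet(flags []string) map[string]struct{} {
	set := make(map[string]struct{}, len(flags))
	for _, f := range flags {
		set[f] = struct{}{}
	}
	return set
}

// sameFlags reports whether two servers expose exactly the same CPU flags,
// ignoring order and duplicates. (Hypothetical stand-in for the
// "source and destination share the same CPU" check.)
func sameFlags(a, b []string) bool {
	sa, sb := flagSet(a), flagSet(b)
	if len(sa) != len(sb) {
		return false
	}
	for f := range sa {
		if _, ok := sb[f]; !ok {
			return false
		}
	}
	return true
}

// baselineFlags returns the sorted list of flags present on every server,
// i.e. the intersection of all flag sets. It returns nil for an empty input.
func baselineFlags(servers [][]string) []string {
	if len(servers) == 0 {
		return nil
	}
	counts := make(map[string]int)
	for _, flags := range servers {
		for f := range flagSet(flags) { // count each flag once per server
			counts[f]++
		}
	}
	baseline := []string{}
	for f, n := range counts {
		if n == len(servers) {
			baseline = append(baseline, f)
		}
	}
	sort.Strings(baseline)
	return baseline
}
```

With three servers reporting {fpu, sse2, avx2}, {avx2, fpu, sse2} and {fpu, sse2}, sameFlags is true for the first pair only, and baselineFlags yields fpu and sse2.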
@stgraber stgraber added the Feature New feature, not a bug label Feb 11, 2024
@stgraber stgraber added this to the soon milestone Mar 8, 2024
@christina-zh

I'm interested in working on this issue. Can I be assigned to it, please?

@stgraber
Member Author

This one we'll do in two stages, as I'm not entirely sure how I want to go about the second stage yet :)

For the first stage, what we need is to expose the CPU flags/extensions in our resources API, as that will be needed to actually compare all servers and see what flags/extensions they all have in common (within one CPU architecture).

So for stage one, you'll want to:

  • Add a new API extension (let's go with resources_cpu_flags) in internal/version/api.go and doc/api-extensions.md
  • Add a new Flags []string to ResourceCPUCore in shared/api/resource.go
  • Re-generate the API metadata (make update-api)
  • Extend the /proc/cpuinfo parsing logic in internal/server/resources/cpu.go to fill in the new Flags field

That should result in the following commits:

  • api: resources_cpu_flags
  • shared/api: Add Flags to ResourceCPUCore
  • doc/rest-api: Refresh swagger YAML
  • incusd/resources: Add CPU Flags to ResourceCPUCore

This one is pretty easy to test, at least: once you're running an updated incusd, you can run incus query /1.0/resources to look at the whole resource dump and check that your CPU flags match what you see in cat /proc/cpuinfo.
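
For illustration, the /proc/cpuinfo parsing step from stage one could look roughly like the sketch below. This is not the existing logic in internal/server/resources/cpu.go: parseCPUFlags is a hypothetical helper, and handling both the "flags" key (x86) and the "Features" key (arm64) is an assumption about what would be wanted here.

```go
package main

import (
	"bufio"
	"strings"
)

// parseCPUFlags extracts one flag list per logical processor from
// /proc/cpuinfo-style text. It matches the "flags" key used on x86 and
// the "Features" key used on arm64; all other keys are ignored.
// (Hypothetical helper, not the actual Incus implementation.)
func parseCPUFlags(cpuinfo string) [][]string {
	var perCPU [][]string
	scanner := bufio.NewScanner(strings.NewReader(cpuinfo))
	for scanner.Scan() {
		// Each entry has the form "key<tabs>: value".
		key, value, found := strings.Cut(scanner.Text(), ":")
		if !found {
			continue
		}
		switch strings.TrimSpace(key) {
		case "flags", "Features":
			// Flags are separated by single spaces; Fields also
			// trims the leading space after the colon.
			perCPU = append(perCPU, strings.Fields(value))
		}
	}
	return perCPU
}
```

Feeding it the contents of /proc/cpuinfo gives one slice per logical CPU, which can then be compared against the flags reported by incus query /1.0/resources.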

@milaiwi
Contributor

milaiwi commented May 3, 2024

Is this still being worked on? If not, I'd love to take it!

@christina-zh

Yes, we are still working on it 👍
