MachineSet does not consider current state of Cluster before creating new machines #8657

Closed
ykakarap opened this issue May 14, 2023 · 4 comments · Fixed by #8595
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@ykakarap
Contributor

Detailed Description:

Under certain circumstances, when a MachineSet creates a Machine, the operation either fails or produces a Machine that is deleted and recreated shortly afterwards. Neither outcome is ideal.

This issue proposes introducing preflight checks (details below) in the MachineSet controller so that Machine creation is blocked until conditions are known to be safe, which ultimately protects the stability of workloads in the cluster.

The following cases can be handled more safely by implementing preflight checks:

  • When the Kubernetes minor version of the new Machine is greater than the control plane minor version, Machine creation might fail (this violates the Kubernetes version skew policy).
  • When using the KubeadmBootstrapProvider and the Kubernetes minor version of the new Machine is not equal to the control plane minor version, Machine creation might fail (this violates the kubeadm version skew policy).
  • When the control plane is upgrading, the same problems can occur, and additional errors can appear if the load balancer is slow to update its configuration as control plane Machines are deleted.
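
To make the three cases above concrete, here is a minimal Go sketch of what such checks could look like. It is not the MachineSet controller's code: the `preflightInput` struct, the `runPreflightChecks` function, and the failure messages are hypothetical names made up for illustration, and the sketch assumes the caller already knows whether the control plane is upgrading and that both versions share the same major version.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// minorOf returns the minor component of a version such as "v1.27.3".
func minorOf(version string) (int, error) {
	parts := strings.Split(strings.TrimPrefix(version, "v"), ".")
	if len(parts) < 2 {
		return 0, fmt.Errorf("cannot parse version %q", version)
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return 0, fmt.Errorf("cannot parse minor version of %q: %w", version, err)
	}
	return minor, nil
}

// preflightInput is a hypothetical, simplified view of the state the checks need.
type preflightInput struct {
	MachineVersion        string // version requested for the new Machine
	ControlPlaneVersion   string // version currently reported by the control plane
	ControlPlaneUpgrading bool   // assumed to be determined by the caller
	UsesKubeadmBootstrap  bool   // true when the KubeadmBootstrapProvider is used
}

// runPreflightChecks returns one message per failed check.
// An empty result means Machine creation may proceed.
func runPreflightChecks(in preflightInput) ([]string, error) {
	machineMinor, err := minorOf(in.MachineVersion)
	if err != nil {
		return nil, err
	}
	cpMinor, err := minorOf(in.ControlPlaneVersion)
	if err != nil {
		return nil, err
	}

	var failures []string
	if in.ControlPlaneUpgrading {
		failures = append(failures, "control plane is upgrading")
	}
	// Kubernetes version skew: the Machine must not be newer than the control plane.
	if machineMinor > cpMinor {
		failures = append(failures, "machine minor version is greater than control plane minor version")
	}
	// kubeadm version skew (as described in this issue): minors must match.
	if in.UsesKubeadmBootstrap && machineMinor != cpMinor {
		failures = append(failures, "machine minor version differs from control plane minor version (kubeadm)")
	}
	return failures, nil
}

func main() {
	failures, err := runPreflightChecks(preflightInput{
		MachineVersion:       "v1.27.1",
		ControlPlaneVersion:  "v1.26.4",
		UsesKubeadmBootstrap: true,
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, f := range failures {
		fmt.Println("preflight check failed:", f)
	}
}
```

Returning a list of failures rather than the first error keeps every reason available, which would make it easier to surface them together later (for example in a condition).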

As with kubeadm preflight checks, we are also considering an option to skip these preflight checks (probably an annotation, TBD). The main use case is to let users run a combination of versions outside the kubeadm policy, since in some cases this works; it also fits the level of configurability we are asked for.
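
Since the skip mechanism is still TBD, the following Go sketch only illustrates one possible shape for it: a comma-separated annotation listing the checks to skip. The annotation key `example.com/skip-preflight-checks`, the check names, and the special value `All` are placeholders invented for this example, not a proposed or existing API.

```go
package main

import (
	"fmt"
	"strings"
)

// skipAnnotation is a hypothetical annotation key used only in this sketch.
const skipAnnotation = "example.com/skip-preflight-checks"

// skippedChecks parses the comma-separated annotation value into a set of check names.
func skippedChecks(annotations map[string]string) map[string]bool {
	skipped := map[string]bool{}
	for _, name := range strings.Split(annotations[skipAnnotation], ",") {
		name = strings.TrimSpace(name)
		if name != "" {
			skipped[name] = true
		}
	}
	return skipped
}

// shouldRun reports whether the named check should run for an object
// carrying the given annotations. "All" (hypothetical) skips every check.
func shouldRun(annotations map[string]string, check string) bool {
	skipped := skippedChecks(annotations)
	return !skipped["All"] && !skipped[check]
}

func main() {
	annotations := map[string]string{
		skipAnnotation: "KubeadmVersionSkew, ControlPlaneIsStable",
	}
	for _, check := range []string{"KubernetesVersionSkew", "KubeadmVersionSkew", "ControlPlaneIsStable"} {
		fmt.Printf("%s: run=%t\n", check, shouldRun(annotations, check))
	}
}
```

Naming individual checks in the annotation would let users opt out of only the kubeadm skew check while keeping the others active.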

Anything else you would like to add?
Follow-ups:

  • Surface preflight check failures in a condition that can be bubbled up to the MachineDeployment.

/kind bug

@k8s-ci-robot added the kind/bug and needs-triage labels on May 14, 2023
@ykakarap changed the title from "MachineSet should consider current state of Cluster before creating new machines" to "MachineSet does not consider current state of Cluster before creating new machines" on May 14, 2023
@fabriziopandini
Member

/triage accepted

@k8s-ci-robot added the triage/accepted label and removed the needs-triage label on May 15, 2023
@ykakarap
Contributor Author

/assign

@dlipovetsky
Contributor

> When using the KubeadmBootstrapProvider and the Kubernetes minor version of the new Machine is not equal to the control plane minor version, new machine creation might fail (violates the kubeadm version skew)

We talked about kubeadm version skew at the Cluster Lifecycle SIG meeting today.

/cc @pacoxu

@dlipovetsky
Contributor

(Just for reference, a couple of related issues: #6040, which was closed in favor of #7011)
