Changing vmSize/kubernetesVersion on a MachinePool only refresh one node in the VMSS leaving the pool in an inconsistent state #2972
Comments
Looking a bit more into this using Tilt and extra logging, I can see that after the first node is replaced with a surge the controller
Following the code
Is this what we expect? Should we actually look at more to determine whether the machine is on the latestmodel? (Hopefully we can look at what the Azure UI is reporting.) |
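One way to frame the question above is to compare two signals per instance: the flag the provider reports, and the size the instance is actually running versus the size the pool now wants. The sketch below is only an illustration using plain dictionaries. The field names `latest_model_applied` and `vm_size`, the helper names, and the sample data are all hypothetical; this is not the CAPZ types or the controller's actual logic.

```python
from typing import Dict, List


def is_up_to_date(instance: Dict[str, object], desired_vm_size: str) -> bool:
    """Treat an instance as current only if both signals agree.

    Hypothetical fields:
      latest_model_applied: the flag the provider reports for the instance
      vm_size: the size the instance is actually running
    """
    return bool(instance["latest_model_applied"]) and instance["vm_size"] == desired_vm_size


def out_of_date_instances(instances: List[Dict[str, object]], desired_vm_size: str) -> List[str]:
    """Return the IDs of instances that would still need to be replaced."""
    return [str(i["id"]) for i in instances if not is_up_to_date(i, desired_vm_size)]


# Illustrative data only: one instance was replaced, two were left on the old size.
pool = [
    {"id": "0", "vm_size": "Standard_D4s_v5", "latest_model_applied": True},
    {"id": "1", "vm_size": "Standard_D2s_v5", "latest_model_applied": True},
    {"id": "2", "vm_size": "Standard_D2s_v5", "latest_model_applied": False},
]

stale = out_of_date_instances(pool, "Standard_D4s_v5")
print(stale)
```

Checking the running size as well as the reported flag is a design choice made for this sketch; whether the flag alone is reliable is exactly what the question above asks.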
I closed the PR I opened because it does conflict with the
Pinging @mweibel for his work into the bootstrap token issue.
Context
I will summarize here what I understand the issue to be, then I will bring this up in Slack for feedback / discussion
This will cause the nodes in the VMSS to have inconsistent configurations between them, and the more changes are applied over time, the more mixed the set of VMs in the VMSS will be.
The Problem
The PR #2975 I proposed does sync the
Until the
Further Complication
With the proposed fix for the
Questions
|
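The drift described in the summary above (each change reaching only one node, so the pool becomes more mixed over time) can be illustrated with a small model. This is a sketch under stated assumptions: the `Config` tuple shape, the helper names, the choice of which node gets replaced, and the later version `v1.25.0` are hypothetical and only serve the illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple

Config = Tuple[str, str]  # (vm_size, kubernetes_version), hypothetical shape


def apply_change(nodes: List[Config], new_config: Config, replaced_index: int) -> List[Config]:
    """Model the reported behaviour: only one node picks up the new config."""
    updated = list(nodes)
    updated[replaced_index] = new_config
    return updated


def config_spread(nodes: List[Config]) -> Dict[Config, int]:
    """Count how many nodes run each distinct configuration."""
    return dict(Counter(nodes))


# Start with three identical nodes, then apply two changes in sequence.
nodes = [("Standard_D2s_v5", "v1.24.8")] * 3
nodes = apply_change(nodes, ("Standard_D4s_v5", "v1.24.8"), replaced_index=0)
nodes = apply_change(nodes, ("Standard_D4s_v5", "v1.25.0"), replaced_index=1)  # hypothetical later version

spread = config_spread(nodes)
print(len(spread), spread)
```

In this model, after two changes a three-node pool runs three different configurations, which is the kind of mixed set the summary warns about.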
Thanks for looking into this - I'm glad I'm not the only one to experience it 🙂 Your summary is interesting and clarified a few things for me. Thanks!
The change in #2975 does make sense from my point of view. In our case we often see out-of-date instances at the moment. Using the data from the API as the indicator makes sense to me because otherwise we might always see inconsistent state in the portal, for example.
To continue with #2803 I wanted to check/verify how e.g. CAPA handles this. If the issue with not updating bootstrap tokens is consistent across providers, we might want a solution (or at least documentation on how to handle it) in the main CAPI project. I didn't investigate this further due to time constraints (this hasn't changed yet, unfortunately). If anyone is willing to take this over, that'd be great!
Questions
|
I do not know, I will try to do a bit of research.
I might need to double-check this. I came to this conclusion because when I do a
the instances are
Looking at the data for
but looking at the actual output of
I see a weird difference which makes no sense to me
I was planning to give your branch a try; I saw you mentioned in your draft PR that it is not working. Mind adding a few more notes about what's not working there? I will also use it to confirm whether changing CustomData affects the latestmodel or not; either way, we still need that fix before implementing this one |
For the moment, we decided internally to move back to
I am planning to get back to it as soon as it becomes relevant for us again |
Unsure, I think that would need to be checked again. I plan on verifying the join issues are gone in the next couple of days. However our use case doesn't usually involve adjusting vmSize for existing MachinePools - we usually create new ones when we iterate on the specs so I'm not sure I can verify if this issue is fixed. |
Same here. This issue, at least its original version when I noticed it, would not be solved by the fix of the fix of the
The logic I looked at when I opened this issue is here: #2972 (comment)
The same problem would happen on change of
I am currently not looking at this issue since we have reverted to MachineDeployments, but what I found at the time was that
|
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten |
/remove-lifecycle rotten |
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten |
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /close not-planned |
@k8s-triage-robot: Closing this issue, marking it as "Not Planned". In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/kind bug
What steps did you take and what happened:
vmSize (from Standard_D2s_v5 to Standard_D4s_v5)
ScaleSetOufOfDate state and one node gets replaced
What did you expect to happen:
All nodes to be replaced using the same process that was used for the first node
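The expected behaviour (every out-of-date node going through the same surge-then-remove step as the first one) can be sketched as a simple loop. This is a minimal model under stated assumptions: nodes are represented only by their size, and `rolling_replace` and its `max_surge` parameter are hypothetical names for this illustration, not the controller's implementation.

```python
from typing import List


def rolling_replace(nodes: List[str], desired: str, max_surge: int = 1) -> List[List[str]]:
    """Replace every out-of-date node, surging at most max_surge nodes per step.

    Returns the pool contents after each completed step so the progression is visible.
    """
    if max_surge < 1:
        raise ValueError("max_surge must be at least 1")
    current = list(nodes)
    history = []
    while any(n != desired for n in current):
        stale = [i for i, n in enumerate(current) if n != desired][:max_surge]
        # Surge: add the new nodes first so capacity never drops below the original count.
        current.extend([desired] * len(stale))
        # Then remove the stale nodes that the new ones replace.
        for i in sorted(stale, reverse=True):
            del current[i]
        history.append(list(current))
    return history


steps = rolling_replace(["Standard_D2s_v5"] * 3, "Standard_D4s_v5")
for step in steps:
    print(step)
```

The point of the loop condition is that the process continues until no out-of-date node remains, rather than stopping after the first replacement as reported in this issue.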
Anything else you would like to add:
The docs suggest that when changing vmSize or kubernetesVersion, a full refresh of nodes in the node pool should be performed
Machine pool manifests (pre-change)
Before changing the instance type
After Changing the instance type
Everything is reported stable and ready, nothing is modelOutOfDate, and the logs only show the regular reconciling messages
Environment:
Kubernetes version (use kubectl version): 1.24.8
OS (e.g. from /etc/os-release):