docs: upgrade + cluster-autoscaler notes #381
@feiskyer Can you kindly read through my initial description of upgrade + VMSS + cluster-autoscaler and comment on whether it's an accurate summary of the current situation? Thanks!
docs/topics/upgrade.md
Outdated
### Cluster-autoscaler + VMSS

At present, the Azure cloudprovider cluster-autoscaler implementation for VMSS relies upon the original ARM template deployment specification to inform the Azure IaaS (VM, NIC, CustomScriptExtension, etc.) configuration used to scale out new nodes.
`aks-engine upgrade` would also update the model of the VMSS itself, right? If so, then newly scaled-out nodes should also get this new model.
In practice, this is not the case. The `kubectl get nodes` output I pasted below is from a VMSS upgrade followed by a cluster-autoscaler event. So apparently the answer is "no, `aks-engine upgrade` does not update the model of the VMSS itself".
OK, then scale-up for VMSS would still use the original old model. Do you have any ideas for making CA also work for upgraded clusters?
O.K., so I have evidence that what I said is not true. Perhaps I was testing a VMAS agent pool. I will upgrade my VMSS test cluster a few more times and validate that cluster-autoscaler continues to respect the versions as they move forward.
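One way to run that validation is to compare each node's kubelet version against the expected version after a scale-out event. The sketch below is a hypothetical helper, not part of aks-engine or cluster-autoscaler: `stale_nodes` is an invented name, and it assumes its input is the JSON text produced by `kubectl get nodes -o json`, reading `metadata.name` and `status.nodeInfo.kubeletVersion` from each item.

```python
import json
from typing import List


def stale_nodes(nodes_json: str, expected_version: str) -> List[str]:
    """Return names of nodes whose kubelet version differs from expected_version.

    `nodes_json` is assumed to be the output of `kubectl get nodes -o json`.
    This helper is illustrative only (hypothetical name and interface).
    """
    items = json.loads(nodes_json).get("items", [])
    return [
        item["metadata"]["name"]
        for item in items
        if item["status"]["nodeInfo"]["kubeletVersion"] != expected_version
    ]
```

Run against output captured after a cluster-autoscaler scale-out, a non-empty result would point to new nodes that came up on the pre-upgrade version; an empty result would suggest the scaled-out nodes follow the upgraded version.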
Thanks. I'm wondering whether that is only true if the model of the VMSS itself is also upgraded.
docs/topics/upgrade.md
Outdated
### Cluster-autoscaler + VMAS

A similar scenario exists for VMAS as well, but because the cluster-autoscaler spec includes a configurable ARM template deployment reference, you may manually maintain that reference over time so that it stays current with the ARM template deployment that `aks-engine upgrade` creates during an upgrade operation.
Add another paragraph on how to get the upgraded ARM templates and parameters?
The upgrade operation is long running and, for large clusters, more susceptible to single operational failures. This follows from the design principle that upgrade enumerates, one at a time, through each node in the cluster. A transient Azure resource allocation error could thus interrupt the successful progression of the overall transaction. At present, the upgrade operation is implemented to "fail fast", so if a well-formed upgrade operation fails before completing, it can be manually retried by invoking the exact same command line arguments as were sent originally. The upgrade operation will enumerate through the cluster nodes, skipping any nodes that have already been upgraded to the desired Kubernetes version. Those nodes that match the *original* Kubernetes version will then, one at a time, be cordoned and drained, and upgraded to the desired version. Put another way, an upgrade command is designed to be idempotent across retry scenarios.
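The retry behavior described in this paragraph can be sketched as a small simulation. This is a minimal illustration under stated assumptions, not the actual aks-engine implementation: the `Node` class, the `upgrade_cluster` function, and the `upgrade_node` callback are hypothetical names, and Kubernetes versions are modeled as plain strings.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Node:
    """Hypothetical stand-in for a cluster node; not an aks-engine type."""
    name: str
    version: str


def upgrade_cluster(nodes: List[Node], target: str,
                    upgrade_node: Callable[[Node], None]) -> List[str]:
    """Upgrade nodes one at a time, failing fast on the first error.

    Nodes already at `target` are skipped, so re-running after a failure
    only touches the nodes that still need work. Returns the names of the
    nodes upgraded during this invocation.
    """
    upgraded = []
    for node in nodes:
        if node.version == target:
            continue  # already upgraded in an earlier (possibly failed) run
        upgrade_node(node)  # may raise; the exception aborts the whole run
        node.version = target  # only reached if the upgrade step succeeded
        upgraded.append(node.name)
    return upgraded
```

Because the skip check depends only on each node's current version, calling `upgrade_cluster` again with the same arguments after a failure resumes where the previous run stopped, and calling it on a fully upgraded cluster does nothing.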
### Cluster-autoscaler + VMSS
Let's add a link/reference to this doc in the cluster-autoscaler doc, `examples/addons/cluster-autoscaler/README.md`.
yep, we can add one after this is merged.
why not add it in the same PR?
Sorry, I meant adding a link in the cluster-autoscaler repo back to here.
And yep, in examples, it could be added in the same PR.
@CecileRobertMichon this is ready for a re-review
Codecov Report
@@           Coverage Diff           @@
##           master     #381   +/-   ##
=======================================
  Coverage   53.42%   53.42%
=======================================
  Files          95       95
  Lines       14361    14361
=======================================
  Hits         7673     7673
  Misses       6025     6025
  Partials      663      663
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: CecileRobertMichon, jackfrancis
Reason for Change:
Document current issues and limitations of `aks-engine upgrade` and `cluster-autoscaler`.