
refactor: upgrade code refactor part 1 #287

Merged

Conversation

@CecileRobertMichon (Contributor) commented Jan 11, 2019:

Reason for Change:

This is an initial refactor of the upgrade code. More to follow.

Issue Fixed:

Requirements:

Notes:
TODO:

  • Add unit tests
  • Run upgrade back compat tests

@CecileRobertMichon CecileRobertMichon changed the title refactor: upgrade code refactor part 1 [WIP] refactor: upgrade code refactor part 1 Jan 11, 2019
-// getClusterNodeVersion returns a node's "orchestrator:version" via Kubernetes API or VM tag.
-func (uc *UpgradeCluster) getClusterNodeVersion(client armhelpers.KubernetesClient, name string, tags map[string]*string) string {
+// getNodeVersion returns a node's current Kubernetes version via Kubernetes API or VM tag.
+func (uc *UpgradeCluster) getNodeVersion(client armhelpers.KubernetesClient, name string, tags map[string]*string) string {
Member:
Can we reverse the precedence order here? Shouldn't we prefer to check the k8s API first, and then fall back to the tag?

@CecileRobertMichon (author):

This is the same behavior that was implemented in @mboersma's PR to stop requiring tags (Azure/acs-engine#4100), so I'll defer to him on choosing this approach over the other.

I think the idea was that we maintain existing behavior to limit risk and only use the alternative method if we can't find the tags.

If we do decide we want to reverse the order, I would argue that's a behavior change and it doesn't belong in this refactor PR (which is already dangerous enough as is).

Member:

I agree with Cécile's comment here. I think it's safer to leave it alone.

This way the existing behavior is still the default (use tags), and the k8s API is explicitly a fallback strategy. At the time, that was deemed to be less disruptive, but it has proved to be useful in failed upgrade cases: setting the tags is a safe way to "trick" the upgrade behavior to run again.

@jackfrancis (Member):

Is this an appropriate PR in which to consider the following changes:

  • implement "upgrade single node" as a composable unit
  • expose above as an option for a user

Use cases:

  • "I want to upgrade a single node and schedule some stuff onto it, do a/b testing with non-upgraded nodes and compare"
  • "My upgrade failed halfway through and I want to cleanup my cluster by manually upgrading the remaining nodes that are running an old version"

@CecileRobertMichon (author):

@jackfrancis short answer is no.

This is a refactor PR and the first of many. I tried to break down the changes so that they are easier to review/test. Adding new features is out of the scope of this PR.

We can definitely keep that in the backlog as a future addition though.

@CecileRobertMichon (author):

To clarify for reviewers: there are many more things I want to change and improve in the upgrade code area; this is only the beginning. I am breaking the changes down into smaller units to make them more manageable to review and to limit the risk of breakage, since upgrade is a complicated, sensitive part of the code. Trying to change that, one step at a time :)

-uc.agentPoolsToUpgrade = []string{}
-log.Infoln(fmt.Sprintf("Gathering agent pool names..."))
+uc.agentPoolsToUpgrade = make(map[string]bool)
+uc.agentPoolsToUpgrade[kubernetesupgrade.MasterPoolName] = true
Member:

Is this additive? I don't see us adding the master in the previous agentPoolsToUpgrade string slice pattern.

@CecileRobertMichon (author):

Maybe I should rename this to poolsToUpgrade; the agentPoolsToUpgrade name doesn't really make sense now that the master pool is included.

for _, poolName := range agentPoolsToUpgrade {
	uc.AgentPoolsToUpgrade[poolName] = true
}
uc.AgentPoolsToUpgrade[MasterPoolName] = true
Member:

I think I see now what you did above.

@CecileRobertMichon (author):

Yeah, I just copied existing behavior, but I don't really understand what AgentPoolsToUpgrade is used for yet.

@CecileRobertMichon CecileRobertMichon changed the title [WIP] refactor: upgrade code refactor part 1 refactor: upgrade code refactor part 1 Jan 14, 2019
@CecileRobertMichon (author) commented Jan 14, 2019:

Scenarios tested so far:

✅ Windows 1.8 cluster with VMAS
✅ Linux 1.10 cluster with VMAS
🔴 Linux 1.12 cluster with VMSS agents (VMSS master upgrade is non-functional in current master)
✅ Linux 1.7 cluster with VMAS

if client != nil {
	node, err := client.GetNode(name)
	if err == nil {
		return api.Kubernetes + ":" + strings.TrimPrefix(node.Status.NodeInfo.KubeletVersion, "v")
@CecileRobertMichon (author):

Here we return just the k8s version instead of kubernetes:v1.x.x; since this is in the kubernetesupgrade package, we only care about the version, not the orchestrator.

return uc.Translator.Errorf("Error while querying ARM for resources: %+v", err)
}

var upgrader UpgradeWorkFlow
upgradeVersion := uc.DataModel.Properties.OrchestratorProfile.OrchestratorVersion
uc.Logger.Infof("Upgrading to Kubernetes version %s\n", upgradeVersion)
switch {
case strings.HasPrefix(upgradeVersion, "1.6."):
@CecileRobertMichon (author):

If anyone knows a good reason why these separate upgraders are still needed, please speak up.

Member:

👍 as far as I can tell, these structs provide no utility over the Upgrader base.

Maybe someone was anticipating we would need different behavior for each version upgrade? (Don't do that in advance because YAGNI.)

@CecileRobertMichon (author):

VMSS upgrade is failing with "Error upgrading cluster: Drain did not complete within 1m0s".

#298 might help

@codecov (bot) commented Jan 18, 2019:

Codecov Report

Merging #287 into master will increase coverage by 0.02%.
The diff coverage is 43.75%.

@@            Coverage Diff            @@
##           master    #287      +/-   ##
=========================================
+ Coverage   53.17%   53.2%   +0.02%     
=========================================
  Files          95      95              
  Lines       14247   14216      -31     
=========================================
- Hits         7576    7563      -13     
+ Misses       6006    5988      -18     
  Partials      665     665

@@ -204,41 +164,37 @@ func (uc *UpgradeCluster) getClusterNodeStatus(subscriptionID uuid.UUID, az armh
	}

	for _, vm := range vmListPage.Values() {
		vmOrchestratorTypeAndVersion := uc.getClusterNodeVersion(kubeClient, *vm.Name, vm.Tags)
		if vmOrchestratorTypeAndVersion == "" {
			// Windows VMs contain a substring of the name suffix
Member:

What are the Windows-specific implications of the line(s) below?

@CecileRobertMichon (author):

strings.Contains(*vm.Name, uc.NameSuffix[:4]+"k8s") checks for the VM naming convention specific to Windows, instead of looking for the whole name suffix.

@@ -394,28 +345,3 @@ func (uc *UpgradeCluster) addVMToAgentPool(vm compute.VirtualMachine, isUpgradab

return nil
}

Member:

I actually kinda think we should keep this in, uncomment it, and standardize on always persisting the template that we deploy for upgrade.

But perhaps that's for a later exercise.

@CecileRobertMichon (author):

We already have logic for saving the modified template in upgrade.go (https://github.com/CecileRobertMichon/aks-engine/blob/0b7a9702300ed5d033874a6ced2d2a140867fe18/cmd/upgrade.go#L281), so this commented-out code is redundant.

I think we need to improve the logic for persisting templates: instead of writing them to a file, save them in the resource group or something like that. But, like you said, that's for a later exercise.

@jackfrancis (Member):

/lgtm

@acs-bot commented Jan 22, 2019:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: CecileRobertMichon, jackfrancis

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:
  • OWNERS [CecileRobertMichon,jackfrancis]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

refactor: some more upgradeCluster() refactoring

refactor: getClusterNodeStatus

fix: s/nodes/VMs

fix: s/subset/substring

fix: ClusterTopology is in kubernetesupgrade package

refactor: simplify upgradable() func

refactor: move upgradecluster to instance literal

Revert "refactor: move upgradecluster to instance literal"

This reverts commit ac6aacd.

refactor: fix upgradeCluster parameters in unit tests

fix import

remove import cycle

fix unit test error string
@acs-bot commented Jan 22, 2019:

New changes are detected. LGTM label has been removed.

@CecileRobertMichon CecileRobertMichon merged commit 065ccd3 into Azure:master Jan 23, 2019
juhacket pushed a commit to juhacket/aks-engine that referenced this pull request Mar 14, 2019
@CecileRobertMichon CecileRobertMichon deleted the upgrade-refactor branch April 18, 2019 22:42