Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for all-or-nothing scale-up strategy #6821

Merged

Conversation

aleksandra-malinowska
Copy link
Contributor

What type of PR is this?

/kind feature

What this PR does / why we need it:

Implements all-or-nothing scaling strategy. This is needed for supporting atomic-scale-up.kubernetes.io ProvisioningRequests: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/proposals/provisioning-request.md

Issue: #6815

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


/cc @yaroslava-serdiuk @towca

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. area/cluster-autoscaler cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 13, 2024
Copy link
Contributor

@yaroslava-serdiuk yaroslava-serdiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some test cases?

return stopAllOrNothingScaleUp(podEquivalenceGroups, skippedNodeGroups, nodeGroups)
}
}

// Execute scale up.
klog.V(1).Infof("Final scale-up plan: %v", scaleUpInfos)
aErr, failedNodeGroups := o.scaleUpExecutor.ExecuteScaleUps(scaleUpInfos, nodeInfos, now)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also add allOrNothing variable here, and use AtomicIncreaseSize() method if allOrNothing=true.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -79,9 +80,9 @@ func (o *WrapperOrchestrator) ScaleUp(
}

if o.scaleUpRegularPods {
return o.podsOrchestrator.ScaleUp(regularPods, nodes, daemonSets, nodeInfos)
return o.podsOrchestrator.ScaleUp(regularPods, nodes, daemonSets, nodeInfos, allOrNothing)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for podsOrchestrator allOrNothing=false

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is false and set in static_autoscaler.go, I'm just passing the variable here.

I wanted this to be set to whatever is the default for the autoscaler, and the custom orchestrators to override it only as needed (like atomic scale-up for ProvisioningRequests will).

@yaroslava-serdiuk
Copy link
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 14, 2024
Comment on lines 227 to 230
if allOrNothing && bestOption.NodeGroup.MaxSize() < newNodes {
klog.V(1).Infof("Can only create a new node group with max %d nodes, need %d nodes", bestOption.NodeGroup.MaxSize(), newNodes)
return stopAllOrNothingScaleUp(podEquivalenceGroups, skippedNodeGroups, nodeGroups)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this needed? Will it not be checked in 155-156 and ComputeExpansionOption?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. In 155-156 we only exclude cases when there's at least one pod that can't be scheduled on nodes belonging to this group at all. If all pods are compatible with that node group, we'll just get a higher node count (possibly higher than group max size - but we can still balance across similar groups later).

  2. In computeExpansionOption we only cap the node count for special case (groups that scale from 0 to max).

We don't check group max size for all cases in either of those places because of balancing across similar groups, which we only do just before executing scale-up.

So without this check, for a scale-up not affected by the above two cases we could create a node group, then decide not to scale it up.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think MaxSize() is checked before balancing node groups below, because the amount over max can be distributed to the similar node groups.

But why check newNodes against the max size of just one group, if we're balancing between groups below (and verifying that the total capacity is big enough) anyway?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But why check newNodes against the max size of just one group, if we're balancing between groups below (and verifying that the total capacity is big enough) anyway?

This is checked only for autoprovisioned group just before it's created. I think we currently don't have information how many similar groups there'll be. I suspect estimating this may require changes on cloud provider side (for example, if creating a group in one zone implicitly creates groups in another zone with identical taint/labels/resources).

We agreed with @kisieland offline that it's acceptable to limit it for autoprovisioning to max size of one group initially and leave an open issue for handling this better for cases where an autoprovisioned groups will have unobvious similar groups.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I understand correctly that the final total capacity check would still catch this (because the balancing processor also caps all the requests to MaxSize()), but this lets us avoid an unnecessary node group creation in that case?

In any case, this seems okay. As a nit, maybe leave a TODO for the issue when you have it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I understand correctly that the final total capacity check would still catch this (because the balancing processor also caps all the requests to MaxSize()), but this lets us avoid an unnecessary node group creation in that case?

Yes, that's exactly the case.

As a nit, maybe leave a TODO for the issue when you have it.

Will do.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add some test cases

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of my new test cases are in the other PR: #6824

I can add some here as well to have more granularity.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added test case which fails if allOrNothing is false. More extensive test cases remain in #6824

Comment on lines 155 to 163
var err error
if allOrNothing {
err = info.Group.AtomicIncreaseSize(increase)
} else {
err = info.Group.IncreaseSize(increase)
}
if err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, this should fallback to IncreaseSize(increase) if AtomicIncreaseSize(increase) is not implemented.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 14, 2024
cluster-autoscaler/core/scaleup/orchestrator/executor.go Outdated Show resolved Hide resolved
cluster-autoscaler/core/scaleup/orchestrator/executor.go Outdated Show resolved Hide resolved
Comment on lines 227 to 230
if allOrNothing && bestOption.NodeGroup.MaxSize() < newNodes {
klog.V(1).Infof("Can only create a new node group with max %d nodes, need %d nodes", bestOption.NodeGroup.MaxSize(), newNodes)
return stopAllOrNothingScaleUp(podEquivalenceGroups, skippedNodeGroups, nodeGroups)
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think MaxSize() is checked before balancing node groups below, because the amount over max can be distributed to the similar node groups.

But why check newNodes against the max size of just one group, if we're balancing between groups below (and verifying that the total capacity is big enough) anyway?

@@ -89,6 +89,7 @@ func (o *ScaleUpOrchestrator) ScaleUp(
nodes []*apiv1.Node,
daemonSets []*appsv1.DaemonSet,
nodeInfos map[string]*schedulerframework.NodeInfo,
allOrNothing bool, // Either request enough capacity for all unschedulablePods, or don't request it at all.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Could you also copy the comment to the Orchestrator interface (e.g. in the follow-up PR)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do in follow-up PR

// Can't execute a scale-up that will accommodate all pods, so nothing is considered schedulable.
klog.V(1).Info("Not attempting scale-up due to all-or-nothing strategy: not all pods would be accommodated")
markedEquivalenceGroups := markAllGroupsAsUnschedulable(podEquivalenceGroups, AllOrNothingReason)
return buildNoOptionsAvailableStatus(markedEquivalenceGroups, skippedNodeGroups, nodeGroups), nil
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use the helper function for the 2 other ScaleUpNoOptionsAvailable returns from this method?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be fixed in the follow-up PR as well if you prefer

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do!

@towca
Copy link
Collaborator

towca commented May 15, 2024

Left 3 nits, otherwise LGTM! Feel free to unhold if you don't agree with the nits or if you prefer to address them in the follow-up PR.

/lgtm
/approve
/label tide/merge-method-squash
/hold

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. labels May 15, 2024
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 15, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aleksandra-malinowska, towca

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 15, 2024
@aleksandra-malinowska
Copy link
Contributor Author

Stuff left to address in follow-up PR:

  • comment explaining allOrNothing sematics
  • use the helper function for ScaleUpNoOptionAvailable status everywhere
  • add TODO with link to the issue for improving condition controlling if the group should be created

/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 16, 2024
@k8s-ci-robot k8s-ci-robot merged commit e795ac9 into kubernetes:master May 16, 2024
6 checks passed
kwiesmueller pushed a commit to kwiesmueller/k8s-autoscaler that referenced this pull request May 16, 2024
* Add support for all-or-nothing scale-up strategy

* Review fixes
@yaroslava-serdiuk
Copy link
Contributor

/cherry-pick cluster-autoscaler-release-1.30

@k8s-infra-cherrypick-robot

@yaroslava-serdiuk: #6821 failed to apply on top of branch "cluster-autoscaler-release-1.30":

Applying: Add support for all-or-nothing scale-up strategy
Using index info to reconstruct a base tree...
M	cluster-autoscaler/core/scaleup/orchestrator/orchestrator.go
M	cluster-autoscaler/core/scaleup/orchestrator/orchestrator_test.go
M	cluster-autoscaler/provisioningrequest/orchestrator/orchestrator_test.go
Falling back to patching base and 3-way merge...
Auto-merging cluster-autoscaler/provisioningrequest/orchestrator/orchestrator_test.go
Auto-merging cluster-autoscaler/core/scaleup/orchestrator/orchestrator_test.go
Auto-merging cluster-autoscaler/core/scaleup/orchestrator/orchestrator.go
CONFLICT (content): Merge conflict in cluster-autoscaler/core/scaleup/orchestrator/orchestrator.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 Add support for all-or-nothing scale-up strategy
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

In response to this:

/cherry-pick cluster-autoscaler-release-1.30

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@yaroslava-serdiuk
Copy link
Contributor

/cherry-pick cluster-autoscaler-release-1.30

@k8s-infra-cherrypick-robot

@yaroslava-serdiuk: new pull request created: #7015

In response to this:

/cherry-pick cluster-autoscaler-release-1.30

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/cluster-autoscaler cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants