
Extract criteria for removing unneeded nodes to a separate package #5147

Merged: 2 commits merged into kubernetes:master on Oct 17, 2022

Conversation

@x13n (Member) commented on Aug 30, 2022

Which component does this PR apply to?

cluster-autoscaler

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

One more refactor of existing scale down code in order to make parallel scale down (#5079) possible.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

This is a follow-up to #5133. Only the last 2 commits should be reviewed as part of this PR. It will be on hold until the previous one is merged.

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot added the kind/cleanup, cncf-cla: yes, and size/XXL labels on Aug 30, 2022
@x13n (Member Author) commented on Aug 30, 2022

/hold
/assign @MaciekPytel
/assign @yaroslava-serdiuk

@k8s-ci-robot added the do-not-merge/hold label on Aug 30, 2022
@x13n changed the title from "Scaledown4" to "Extract criteria for removing unneeded nodes to a separate package" on Aug 30, 2022
@x13n force-pushed the scaledown4 branch 2 times, most recently from 95de888 to d269e39 on September 1, 2022
@x13n force-pushed the scaledown4 branch 2 times, most recently from 080e48c to b4925ea on September 16, 2022
@k8s-ci-robot added the size/XL label and removed the size/XXL label on Sep 16, 2022
	}
	toRemove := simulator.RemoveNodeFromTracker(sd.usageTracker, node, unneeded)
	for _, n := range toRemove {
		sd.unneededNodes.Drop(n)
Contributor:
nit: I'm guessing the strange split of responsibilities (identifying nodes to filter out in one place and filtering them out elsewhere) is to avoid an import cycle between unneeded and simulator? A local interface in tracker.go that exposes AsList() and Drop() and renaming the function to simulator.FilterOutUsedNodes() or similar would IMO be a more elegant solution, but I'm not sure if it's worth the effort.
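For illustration only, here is a minimal Go sketch of the alternative this comment describes: a narrow interface declared locally in the simulator package so it can drop used nodes itself without importing the unneeded package. The names (unneededNodes, FilterOutUsedNodes) follow the suggestion above, and the used map stands in for whatever the usage tracker reports; none of this is the actual cluster-autoscaler API.

```go
package simulator

// unneededNodes is a hypothetical local interface exposing only the two
// methods the simulator would need, which avoids an import cycle with the
// unneeded package.
type unneededNodes interface {
	AsList() []string
	Drop(nodeName string)
}

// FilterOutUsedNodes drops still-used nodes from the unneeded set in place,
// instead of returning a list for the caller to drop one by one. The used
// map is a stand-in for the usage tracker's view of the cluster.
func FilterOutUsedNodes(used map[string]bool, nodes unneededNodes) {
	for _, name := range nodes.AsList() {
		if used[name] {
			nodes.Drop(name)
		}
	}
}
```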

x13n (Member Author):
I mainly wanted to get rid of the weird semantics RemoveNodeFromTracker had: silently modifying the map passed as an argument. Now this is done explicitly here, which I think makes reasoning about the code easier. I wouldn't invest too much into making this part pretty though, since the tracker itself should go away together with the legacy scale down implementation.

cluster-autoscaler/core/scaledown/unneeded/nodes.go (outdated review comments, resolved)
// RemovableAt returns all nodes that can be removed at a given time, divided
// into empty and non-empty node lists, as well as a list of nodes that were
// unneeded, but are not removable, annotated by reason.
func (n *Nodes) RemovableAt(context *context.AutoscalingContext, ts time.Time, resourcesLeft resource.Limits, resourcesWithLimits []string, as scaledown.ActuationStatus) (empty, needDrain []*apiv1.Node, unremovable []*simulator.UnremovableNode) {
Contributor:
I'm not really sure this function should be responsible for dividing nodes into empty/non-empty:

  • The checks performed by Nodes() are all related to the node itself and not the pods running on it (which after your refactor is the domain of eligibility, I think?). The fact that this package is grouping nodes by pods here seems inconsistent with what the rest of this package is doing and, even more so, outside of what I'd expect to be the responsibility of this function.
  • The grouping is done based on the cached result passed to Update(). In the current implementation that's a fresh result, but that's a pretty big assumption. Obviously we want to avoid stale data leading to a node being incorrectly considered empty.
  • The current implementation doesn't use that grouping anyway (it puts both groups into one list) and relies on re-checking this in getEmptyNodesToRemove(). Do we expect the new scale-down implementation to act differently?

If we drop this group-by, we should also update this to cache nodes and not ntbr, since as far as I can tell that's the only place where we use ntbr.
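As an editorial sketch (not code from this PR) of the narrowed responsibility suggested above: RemovableAt could return a single removable list and leave the empty/need-drain split to the caller, which has fresh simulation results anyway. The parameter types match the signature quoted earlier; the body is elided.

```go
// Hypothetical narrowed signature: answer only "which unneeded nodes may be
// removed now", without grouping them into empty vs. need-drain.
func (n *Nodes) RemovableAt(context *context.AutoscalingContext, ts time.Time, resourcesLeft resource.Limits, resourcesWithLimits []string, as scaledown.ActuationStatus) (removable []*apiv1.Node, unremovable []*simulator.UnremovableNode) {
	// ... same per-node checks as today, minus the empty/needDrain group-by.
	return removable, unremovable
}
```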

x13n (Member Author):
I actually planned to avoid running the simulation twice in the new scale down implementation, so the split done here would be final. We can get it wrong, and sometimes already do, which actually proves that re-checking doesn't solve the problem, just reduces the chance of a race condition. The proper solution, I think, is to first apply the taints and only then look at the state of the cluster to determine which nodes are empty. This was discussed and should be done in #3886. If you disagree, we can discuss it there.

@x13n (Member Author) commented on Sep 30, 2022

/hold cancel

@k8s-ci-robot removed the do-not-merge/hold label on Sep 30, 2022
}

type node struct {
	ntbr simulator.NodeToBeRemoved
@MaciekPytel (Contributor) commented on Oct 14, 2022:

After thinking about this some more, I have to say I don't really like the idea of caching this. We end up caching ntbr across CA loops, but I think the data it contains should not be considered valid outside of the lifetime of the ClusterSnapshot used in the simulation (i.e. it's local to a CA loop). The most obvious example is PodsToReschedule, which can definitely change between loops. Even more importantly, ClusterSnapshot is subject to various possible in-memory modifications (e.g. FilterOutSchedulable or any custom shenanigans in a PodListProcessor) that can be reflected in PodsToReschedule, and the lifetime of those in-memory changes definitely should not be assumed to extend past a single loop.

I realize we expect to always call Update() and RemovableAt() in the same loop and the only thing we really cache between loops is 'since', so in practice it shouldn't break anything. But the fact that we rely on an unwritten assumption that we ignore cached ntbr outside the loop it was cached in is not great.
TBH I don't think ntbr should be cached at all. It's never valid outside of a single simulation context, so to me it makes much more sense to just pass it around as a variable with a clearly limited lifetime.

I think this ties back to the problem I had before with RemovableAt grouping nodes into empty/non-empty. After discussing it with you, I agree it will work, but I just think it's completely unrelated to everything else this package is doing, and I think limiting the package scope would be a better design choice. The fact that we're mixing responsibilities leads to secondary issues like the inconsistent cache lifetime I'm now complaining about.
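A minimal editorial sketch of the alternative described in this comment, assuming the per-node cache keeps only loop-independent data (node name and when it became unneeded) while the simulation result is passed in fresh each loop. The field layout is hypothetical, not the merged code.

```go
package unneeded

import "time"

// Hypothetical cache entry keeping only data that stays valid across CA loops.
type node struct {
	name  string    // enough to look the node up in the current ClusterSnapshot
	since time.Time // when the node was first reported unneeded
}

// Per-loop data such as simulator.NodeToBeRemoved (pods to reschedule, the
// node object itself) would then be passed into RemovableAt for the current
// loop instead of being cached here, making its single-loop lifetime explicit.
```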

	nodeGroupSize := utils.GetNodeGroupSizeMap(context.CloudProvider)
	for nodeName, v := range n.byName {
		klog.V(2).Infof("%s was unneeded for %s", nodeName, ts.Sub(v.since).String())
		node := v.ntbr.Node
Contributor:
This is really the same comment as the one about the cache above. I think this will work in this particular implementation, but I also think caching nodeName and getting a fresh Node from ClusterSnapshot is inherently a safer option than caching Nodes.

limitations under the License.
*/

package unneeded
Contributor:
I think a unit test for RemovableAt / unremovableReason would be a nice bonus. The logic in unremovableReason seems much more testable now. I realize this PR is just moving existing code around, so I'm not going to block on this, but if you decide to make more changes in this PR, maybe that could be an extra target of opportunity.
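A rough, hypothetical skeleton of the kind of table-driven test suggested here. The case struct, example reason strings, and expectations are placeholders, since unremovableReason's actual signature and inputs aren't shown in this thread.

```go
package unneeded

import "testing"

// TestUnremovableReason is a placeholder table-driven skeleton; a real test
// would build the inputs unremovableReason needs (autoscaling context, node
// groups, actuation status) and assert on the returned reason.
func TestUnremovableReason(t *testing.T) {
	testCases := []struct {
		name string
		// node under test, fake node group sizes, resource limits, etc.
		wantReason string
	}{
		{name: "node recently unready", wantReason: "NotUnreadyLongEnough"},
		{name: "node group at min size", wantReason: "NodeGroupMinSizeReached"},
	}
	for _, tc := range testCases {
		t.Run(tc.name, func(t *testing.T) {
			// build inputs, call unremovableReason, compare against tc.wantReason
		})
	}
}
```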

@MaciekPytel (Contributor) commented:
I think this will work and I'm not going to block it anymore, but I don't really like this refactor for the reasons I listed in the comments above. I'm leaving a hold so you have a chance to read my comments and decide whether you want to address them or keep the current version. Please feel free to remove the hold.

/lgtm
/approve
/hold

@k8s-ci-robot added the do-not-merge/hold and lgtm labels on Oct 14, 2022
@k8s-ci-robot commented:
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: MaciekPytel, x13n

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the approved label on Oct 14, 2022
@x13n (Member Author) commented on Oct 17, 2022

I understand where the concern is coming from, but I disagree that caching information is wrong in this case. Update / RemovableAt semantics are well defined: Update updates the Nodes object with the last known state of the world, while RemovableAt returns the list of nodes that are removable at a certain timestamp, based on past Update calls. This will hold in any implementation that understands these semantics. I don't think the Nodes object should be aware of autoscaler loops, pods or nodes being filtered out, or anything else that happens beyond this object. It's the other way around: whoever uses Nodes anywhere in the code needs to make sure they are using it properly. Omitting an Update call is not a problem of the object, it is a problem of the caller: if they didn't provide fresh information, the list of removable nodes may be inaccurate.

Note that it carries the risk of being inaccurate regardless of when the call happened: CA is making in-memory decisions based on a snapshotted state of a distributed system. For its decisions to be accurate, we would have to freeze all changes in the system during decision making, which is infeasible for obvious reasons.

That being said, I agree the code could be improved (it always can), but I'd like to merge it as is: it is a strict improvement over the previous state, it already enables further changes to implement parallel drain, and it has been hanging for over 1.5 months now.

/hold cancel
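For illustration, a sketch of the per-loop calling contract x13n describes above. The wrapper function and the Update signature are invented for illustration and the snippet assumes the cluster-autoscaler package imports; only the RemovableAt call matches the signature quoted earlier in this PR.

```go
// Editorial sketch of how a caller is expected to use unneeded.Nodes within
// a single loop: refresh with current simulation results, then query.
func runScaleDownIteration(n *unneeded.Nodes, ctx *context.AutoscalingContext, now time.Time,
	freshResults []simulator.NodeToBeRemoved, limits resource.Limits, resourceNames []string,
	as scaledown.ActuationStatus) {
	// 1. Feed the latest state of the world into the object. Skipping or
	//    delaying this call is a caller bug, not a property of Nodes itself.
	n.Update(freshResults, now)

	// 2. Ask which of those nodes are removable at this timestamp, based on
	//    the data provided by past Update calls.
	empty, needDrain, unremovable := n.RemovableAt(ctx, now, limits, resourceNames, as)

	_, _, _ = empty, needDrain, unremovable // handed off to the actuator in a real loop
}
```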

@k8s-ci-robot removed the do-not-merge/hold label on Oct 17, 2022
@k8s-ci-robot merged commit f445a6a into kubernetes:master on Oct 17, 2022
navinjoy pushed a commit referencing this pull request to navinjoy/autoscaler on Oct 26, 2022: "Extract criteria for removing unneeded nodes to a separate package"
Labels
approved: Indicates a PR has been approved by an approver from all required OWNERS files.
area/cluster-autoscaler
cncf-cla: yes: Indicates the PR's author has signed the CNCF CLA.
kind/cleanup: Categorizes issue or PR as related to cleaning up code, process, or technical debt.
lgtm: "Looks good to me", indicates that a PR is ready to be merged.
size/XL: Denotes a PR that changes 500-999 lines, ignoring generated files.