Extract criteria for removing unneeded nodes to a separate package #5147
Conversation
/hold
Force-pushed from 95de888 to d269e39
Force-pushed from 080e48c to b4925ea
}
toRemove := simulator.RemoveNodeFromTracker(sd.usageTracker, node, unneeded)
for _, n := range toRemove {
	sd.unneededNodes.Drop(n)
nit: I'm guessing the strange split of responsibilities (identifying nodes to filter out in one place and filtering them out elsewhere) is there to avoid an import cycle between unneeded and simulator? A local interface in tracker.go that exposes AsList() and Drop(), combined with renaming the function to simulator.FilterOutUsedNodes() or similar, would IMO be a more elegant solution, but I'm not sure it's worth the effort.
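For illustration only, a rough sketch of the shape this suggestion describes. The AsList()/Drop() method names and FilterOutUsedNodes come from the comment above; everything else, the placeholder UsageTracker in particular, is an assumption standing in for the real types rather than the actual code:

```go
package simulator

import apiv1 "k8s.io/api/core/v1"

// UsageTracker is a stand-in for the real usage tracker: here it simply maps a
// removed node's name to the names of nodes whose pods still depend on it.
type UsageTracker map[string][]string

// nodeRegistry is the minimal surface simulator would need from unneeded.Nodes,
// which avoids an import cycle between the two packages.
type nodeRegistry interface {
	AsList() []*apiv1.Node
	Drop(nodeName string)
}

// FilterOutUsedNodes both identifies and drops, in one place, every registered
// node that the tracker says is still needed once `removed` goes away.
func FilterOutUsedNodes(tracker UsageTracker, removed *apiv1.Node, registry nodeRegistry) {
	stillUsed := map[string]bool{}
	for _, name := range tracker[removed.Name] {
		stillUsed[name] = true
	}
	for _, n := range registry.AsList() {
		if stillUsed[n.Name] {
			registry.Drop(n.Name)
		}
	}
}
```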
I mainly wanted to get rid of the weird semantics RemoveNodeFromTracker had: silently modifying the map passed as an argument. Now this is done explicitly here, which I think makes reasoning about the code easier. I wouldn't invest too much in making this part pretty though, since the tracker itself should go away together with the legacy scale down implementation.
// RemovableAt returns all nodes that can be removed at a given time, divided
// into empty and non-empty node lists, as well as a list of nodes that were
// unneeded, but are not removable, annotated by reason.
func (n *Nodes) RemovableAt(context *context.AutoscalingContext, ts time.Time, resourcesLeft resource.Limits, resourcesWithLimits []string, as scaledown.ActuationStatus) (empty, needDrain []*apiv1.Node, unremovable []*simulator.UnremovableNode) {
I'm not really sure this function should be responsible for dividing nodes into empty/non-empty:
- The checks performed by Nodes() all relate to the node itself, not to the pods running on it (which, after your refactor, is the domain of eligibility, I think?). Grouping nodes by their pods here seems inconsistent with what the rest of this package is doing and, even more so, outside of what I'd expect to be the responsibility of this function.
- The grouping is based on the cached result passed to Update(). In the current implementation that's a fresh result, but that's a pretty big assumption. Obviously we want to avoid stale data leading to a node being incorrectly considered empty.
- The current implementation doesn't use that grouping anyway (it puts both groups into one list) and relies on re-checking this in getEmptyNodesToRemove(). Do we expect the new scale-down implementation to act differently?

If we drop this grouping, we should also update this to cache nodes rather than ntbr, since as far as I can tell that's the only place where we use ntbr.
I actually planned to avoid running the simulation twice in the new scale down implementation, so the split done here would be final. We can get it wrong, and sometimes already do, which actually proves that re-checking doesn't solve the problem, it just reduces the chance of a race condition. The proper solution, I think, is to first apply the taints and only then look at the state of the cluster to determine which nodes are empty. This was discussed and should be done in #3886. If you disagree, we can discuss it there.
/hold cancel
}

type node struct {
	ntbr simulator.NodeToBeRemoved
After thinking about this some more, I have to say I don't really like the idea of caching this. We end up caching ntbr across CA loops, but I think the data it contains should not be considered valid outside the lifetime of the ClusterSnapshot used in the simulation (i.e. it's local to a CA loop). The most obvious example is PodsToReschedule, which can definitely change in the time between loops. Even more importantly, ClusterSnapshot is subject to various possible in-memory modifications (e.g. FilterOutSchedulable or any custom shenanigans in a PodListProcessor) that can be reflected in PodsToReschedule, and the lifetime of those in-memory changes definitely should not be assumed to extend past a single loop.
I realize we expect to always call Update() and RemovableAt() in the same loop, and the only thing we really cache between loops is 'since', so in practice it shouldn't break anything. But the fact that we rely on an unwritten assumption that cached ntbr is ignored outside the loop it was cached in is not great.
TBH I don't think ntbr should be cached at all. It's never valid outside of a single simulation context, so to me it makes much more sense to just pass it around as a variable with a clearly limited lifetime.
I think this ties back to the problem I had before with RemovableAt grouping nodes into empty / non-empty. After discussing it with you, I agree it will work, but I just think it's completely unrelated to everything else this package is doing, and limiting the package scope would be a better design choice. The fact that we're mixing responsibilities leads to secondary issues like the inconsistent cache lifetime I'm now complaining about.
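For illustration, a minimal sketch of the alternative being argued for here: only the node name and the 'since' timestamp are cached across loops, and the loop-local simulation result is passed in explicitly. All names except simulator.NodeToBeRemoved are assumptions, and the single time-based check stands in for the full set of eligibility checks:

```go
package unneeded

import (
	"time"

	"k8s.io/autoscaler/cluster-autoscaler/simulator"
)

// byName caches only how long each node has been unneeded; nothing derived
// from a ClusterSnapshot (pods to reschedule, node objects) outlives a loop.
type byName map[string]time.Time

// removableAt filters the fresh, loop-local simulation results against the
// cached timestamps, so stale ntbr data is never reused.
func (b byName) removableAt(fresh []simulator.NodeToBeRemoved, ts time.Time, unneededFor time.Duration) []simulator.NodeToBeRemoved {
	var out []simulator.NodeToBeRemoved
	for _, ntbr := range fresh {
		since, ok := b[ntbr.Node.Name]
		if ok && ts.Sub(since) >= unneededFor {
			out = append(out, ntbr)
		}
	}
	return out
}
```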
nodeGroupSize := utils.GetNodeGroupSizeMap(context.CloudProvider)
for nodeName, v := range n.byName {
	klog.V(2).Infof("%s was unneeded for %s", nodeName, ts.Sub(v.since).String())
	node := v.ntbr.Node
This is really the same comment as the one about the cache above. I think this will work in this particular implementation, but I also think caching nodeName and getting a fresh Node from the ClusterSnapshot is inherently safer than caching Node objects.
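A rough sketch of that safer pattern, under the assumption that ClusterSnapshot exposes the scheduler framework's NodeInfos().Get() lister; the helper name is made up for illustration:

```go
package unneeded

import (
	apiv1 "k8s.io/api/core/v1"
	"k8s.io/autoscaler/cluster-autoscaler/simulator"
	klog "k8s.io/klog/v2"
)

// freshNode resolves a cached node name against the current loop's snapshot,
// so the returned Node can never be a stale copy from a previous loop.
func freshNode(snapshot simulator.ClusterSnapshot, nodeName string) (*apiv1.Node, bool) {
	nodeInfo, err := snapshot.NodeInfos().Get(nodeName)
	if err != nil {
		// The node may have been deleted since it was marked unneeded.
		klog.Warningf("Node %s no longer in snapshot: %v", nodeName, err)
		return nil, false
	}
	return nodeInfo.Node(), true
}
```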
limitations under the License.
*/

package unneeded
I think a unit test for RemovableAt / unremovableReason would be a nice bonus. The logic in unremovableReason seems much more testable now. I realize this PR is just moving existing code around, so I'm not going to block on this, but if you decide to make more changes in this PR, maybe that could be an extra target of opportunity.
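Purely as an illustration of the shape such a test could take, here is a table-driven skeleton. The fixture and setup calls are left as comments because the package's real constructors aren't shown in this diff; everything here is an assumption, not the package's actual test API:

```go
package unneeded

import (
	"testing"
	"time"
)

// TestRemovableAt sketches a table-driven test: each case would seed the cache
// with nodes that became unneeded at known times, then check which of them
// RemovableAt reports as removable vs. unremovable (and with which reason).
func TestRemovableAt(t *testing.T) {
	testCases := []struct {
		name            string
		unneededFor     time.Duration
		wantRemovable   []string
		wantUnremovable []string
	}{
		{name: "unneeded long enough", unneededFor: 15 * time.Minute, wantRemovable: []string{"n1"}},
		{name: "unneeded too recently", unneededFor: time.Minute, wantUnremovable: []string{"n1"}},
	}
	for _, tc := range testCases {
		t.Run(tc.name, func(t *testing.T) {
			// With the package's real fixtures this would look roughly like:
			//   nodes := NewNodes(...)            // hypothetical setup
			//   nodes.Update(...)                 // seed "n1" as unneeded for tc.unneededFor
			//   empty, drain, unremovable := nodes.RemovableAt(...)
			// followed by comparing the returned node names against
			// tc.wantRemovable and tc.wantUnremovable.
		})
	}
}
```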
I think this will work and I'm not going to block it anymore, but I don't really like this refactor for the reasons I listed in the comments above. I'm leaving the hold so you have a chance to read my comments and decide whether you want to address them or keep the current version. Please feel free to remove the hold. /lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: MaciekPytel, x13n
The full list of commands accepted by this bot can be found here. The pull request process is described here.
I understand where the concern is coming from, but I disagree that caching this information is wrong in this case. /hold cancel
Extract criteria for removing unneeded nodes to a separate package
Which component does this PR apply to?
cluster-autoscaler
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
One more refactor of the existing scale down code to make parallel scale down (#5079) possible.
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
This is a follow-up to #5133. Only the last 2 commits should be reviewed as part of this PR. It will be on hold until the previous one is merged.
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: