
Add byobject filter on nodes #2888

Merged: merged 7 commits into from Oct 23, 2024
Conversation

@GnatorX (Contributor) commented Apr 18, 2024

What type of PR is this?

improvement

Which issue does this PR fix?:

#2887

What does this PR do / Why do we need it?:
This PR adds a filter so that the VPC CNI watches only the node object for the node it is managing, rather than every node in the cluster.

Testing done on this change:

Tested by @dl-stripe on our cluster of 3-4k nodes.

Will this PR introduce any new dependencies?:

No
Will this break upgrades or downgrades? Has updating a running cluster been tested?:
No.

Does this change require updates to the CNI daemonset config files to work?:

No

Does this PR introduce any user-facing change?:

No

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@GnatorX closed this Apr 29, 2024
@GnatorX reopened this Apr 29, 2024
@orsenthil (Member)

Closing per the discussion here - #2887

@dl-stripe

A couple more data points in favor of this change. We have a few large-node-count clusters with a higher rate of churn for those nodes. The number of Node events pushed to the daemonset was causing excessive bandwidth usage. Adding the filter drastically reduced the number of events distributed to the system. Similarly, the number of bytes pushed fell drastically (sharing a screenshot without absolute numbers, happy to share in private, but you can see the relative drop in bytes processed).

sum by (cluster, kind) (rate(apiserver_watch_events_total{kind="Node"}[5m]))

[screenshots: relative drop in Node watch events and in bytes processed]

@jayanthvn reopened this Oct 11, 2024
@jayanthvn requested a review from orsenthil October 11, 2024 01:50
@jayanthvn (Contributor)

@GnatorX - The PR is still in draft mode. Please feel free to move it to review. Also, can you run make format?

@GnatorX marked this pull request as ready for review October 11, 2024 03:01
@GnatorX requested a review from a team as a code owner October 11, 2024 03:01
@GnatorX (Contributor, Author) commented Oct 11, 2024

Ya, I can do that.

@GnatorX (Contributor, Author) commented Oct 11, 2024

Ran make format.

@jayanthvn (Contributor)

Updated the branch and started the test workflow.

@GnatorX (Contributor, Author) commented Oct 21, 2024

Thanks

```diff
@@ -37,7 +37,11 @@ func getIPAMDCacheFilters() map[client.Object]cache.ByObject {
 	return map[client.Object]cache.ByObject{
 		&corev1.Pod{}: {
 			Field: fields.Set{"spec.nodeName": nodeName}.AsSelector(),
-		}}
+		},
```
Member:
The previous two changes related to the cache improvements were these.

I am trying to understand why only Pod{} was added to the filter here in the first place.

Contributor Author:

I suspect it wasn't as obvious because it takes a large cluster to show the increase in memory and networking bandwidth. We only saw this because our cluster is around 4k nodes.

@orsenthil (Member)

@dl-stripe - What is the LHS value in the first graph here - #2888 (comment)?

@dl-stripe

> @dl-stripe - What is the LHS value in the first graph here - #2888 (comment)?

It's the number of Node events pushed through informers (based on apiserver_watch_events_total) over a 5-minute period for 3 sample clusters we run the AWS VPC CNI in. They're all large clusters, but with slightly different node counts.

@orsenthil (Member)

When we were using node labels to signal the security groups for pods feature - https://docs.aws.amazon.com/eks/latest/userguide/security-groups-pods-deployment.html -

this change, which lets the Kubernetes client cache the node calls, could have had an impact, since it would depend on the client invalidating the cache when the labels change. Right now, since we don't depend on the CNI node labels, this change looks good to me.

@dl-stripe / @GnatorX - Do you have Security Groups for Pods in your cluster under test, and did you see any impact from this change?

@GnatorX (Contributor, Author) commented Oct 21, 2024

> this change, which lets the Kubernetes client cache the node calls, could have had an impact, since it would depend on the client invalidating the cache when the labels change. Right now, since we don't depend on the CNI node labels, this change looks good to me.

This filters down the call for nodes when the informer performs the list + watch. It shouldn't affect any label changes. It narrows the informer cache to watch only the node the VPC CNI is running on, ignoring all other nodes' updates and events.
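
For illustration, with such a field selector the filtering happens server-side: the informer's underlying request to the API server looks roughly like this (node name hypothetical):

GET /api/v1/nodes?fieldSelector=metadata.name%3Dip-10-0-1-23.ec2.internal&watch=true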

@GnatorX (Contributor, Author) commented Oct 21, 2024

We do run with Security Groups for Pods and this change has no effect on it.

@orsenthil (Member) commented Oct 21, 2024

> This filters down the call for nodes when the informer performs the list + watch. It shouldn't affect any label changes. It narrows the informer cache to watch only the node the VPC CNI is running on, ignoring all other nodes' updates and events.

Got it. This makes sense, and the change looks correct and helpful too.

@orsenthil (Member) left a comment

This change LGTM.

The filterMap used here -

ByObject: filterMap,

adds a filter to list/watch events for node operations, restricting them to the node that the CNI runs on. This is correct.

I understand that adding an additional cache field can increase the memory usage of the VPC CNI, but per #2887 it did not.

We will verify the VPC CNI's memory usage after this change, before it is released.
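
For reference, here is a minimal, self-contained sketch of the pattern described above: passing a ByObject filter map into a controller-runtime manager so its shared informer cache only lists/watches matching objects. The env var name and surrounding setup are assumptions for illustration, not the PR's exact code.

```go
package main

import (
	"log"
	"os"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/fields"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/cache"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

func main() {
	// Assumption: the node name is injected into the pod, e.g. via the
	// downward API; the env var name here is hypothetical.
	nodeName := os.Getenv("MY_NODE_NAME")

	// Restrict cached/watched Node objects to this node only. The selector
	// is applied on the informer's LIST and WATCH requests.
	filterMap := map[client.Object]cache.ByObject{
		&corev1.Node{}: {
			Field: fields.OneTermEqualSelector("metadata.name", nodeName),
		},
	}

	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		Cache: cache.Options{ByObject: filterMap},
	})
	if err != nil {
		log.Fatal(err)
	}

	// Controllers would be registered here before calling mgr.Start.
	_ = mgr
}
```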

@GnatorX (Contributor, Author) commented Oct 22, 2024

The reason I initially wasn't able to see the memory improvement from this change is that our internal setup vendored the code, so when I made the change it didn't actually take effect.
At a high level, we are reducing the amount of data that the VPC CNI's informer processes and stores.
https://github.com/kubernetes/sample-controller/raw/master/docs/images/client-go-controller-interaction.jpeg
The field selector changes the reflector's list & watch to just the node it is running on. This isn't an additional cache field; it is a filter based on the field selector.
This changes 2 things (see the sketch after this list):

  1. We no longer get watch events from other nodes, which reduces network calls and CPU processing.
  2. We no longer cache every node in the cluster within the store, which reduces memory consumption.
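
A hedged sketch of what the resulting filter map plausibly looks like after this PR; the added Node entry is truncated in the rendered hunk above, so the metadata.name selector and the nodeName parameter are assumptions consistent with the PR description, not the verbatim code.

```go
package main

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/fields"
	"sigs.k8s.io/controller-runtime/pkg/cache"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// Sketch only. With these field selectors, the reflector's LIST+WATCH
// returns just the pods scheduled on this node and this node's own Node
// object, shrinking both event traffic and the local informer store.
func getIPAMDCacheFilters(nodeName string) map[client.Object]cache.ByObject {
	return map[client.Object]cache.ByObject{
		&corev1.Pod{}: {
			Field: fields.Set{"spec.nodeName": nodeName}.AsSelector(),
		},
		&corev1.Node{}: {
			Field: fields.Set{"metadata.name": nodeName}.AsSelector(),
		},
	}
}
```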

@GnatorX (Contributor, Author) commented Oct 22, 2024

Some docs I found that explain what these ByObject filters do:

https://github.com/kubernetes-sigs/controller-runtime/blob/main/designs/use-selectors-at-cache.md
https://github.com/kubernetes-sigs/controller-runtime/blob/main/designs/cache_options.md

@orsenthil (Member)

Thank you for the explanation and the links on the behavior. This change looks good and helpful.

@orsenthil merged commit 0703d03 into aws:master Oct 23, 2024
6 checks passed
orsenthil pushed a commit that referenced this pull request Oct 23, 2024
* Add byobject filter on nodes

---------

Co-authored-by: Jayanth Varavani <1111446+jayanthvn@users.noreply.github.com>
Co-authored-by: Garvin Pang <garvinp@stripe.com>
orsenthil added a commit that referenced this pull request Oct 23, 2024
* Add byobject filter on nodes

---------

Co-authored-by: Garvin Pang <garvinpang@protonmail.com>
Co-authored-by: Jayanth Varavani <1111446+jayanthvn@users.noreply.github.com>
Co-authored-by: Garvin Pang <garvinp@stripe.com>
@GnatorX deleted the patch-1 branch October 29, 2024 20:26
@sidewinder12s

This is a great change; any other calls to the API server or AWS APIs that could use filtering should use it. It cut the memory usage of a 400-500 node cluster in half, from 40GB+ to ~20GB across all CNI pods.
