Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Centralize leaked ENI cleanup #374

Merged
merged 4 commits into from
Mar 13, 2024
Merged

Conversation

sushrk
Copy link
Contributor

@sushrk sushrk commented Feb 23, 2024

Issue #, if available:
N/A
Description of changes:
Centralize leaked ENI cleanup in the controller to delete leaked ENIs provisioned by the controller and VPC-CNI:

  • Added finalizer in CNINode: when kubernetes deletes an object that has finalizers specified for it, the Kubernetes API marks the object for deletion by populating .metadata.deletionTimestamp, and returns a 202 status code (HTTP "Accepted"). The target object remains in a terminating state while the control plane, or other components, take the actions defined by the finalizers. After these actions are complete, the controller removes the relevant finalizers from the target object. When the metadata.finalizers field is empty, Kubernetes considers the deletion complete and deletes the object.
  • Add a new field in Spec to add Tag: This is currently used to specify tags that need to added to network interfaces provisioned by the controller and VPC-CNI. VPC-CNI will read the CNINode to get the tags.

The CNINode reconciler will add the finalizer and tag if missing on existing CNINode objects and handles the finalizer routine to clean up leaked resources.

  • Re-create CNINode if deleted by user: The controller will re-create CNINode if deleted by user when node object is still present.
    If failed to create, node event will be published:
  Warning  CNINodeDeleted                  2m7s  vpc-resource-controller  CNINode was deleted and could not be re-created by controller
  • Removed deleting branch ENIs on node termination
  • Added metrics to count number of available ENIs found and number of leaked ENIs that failed to be deleted
  • Added metrics to count the number of times controller needed to recreate CNINode
    Tests:
SGPP test suite

Ran 19 of 23 Specs in 923.345 seconds
SUCCESS! -- 19 Passed | 0 Failed | 0 Pending | 4 Skipped
PASS

Ginkgo ran 1 suite in 15m25.59382642s
Test Suite Passed
  • Verified CNINode finalizers and tags are added(if missing) to existing CNINode objects
  • Verified finalizer routine is called when CNINode object is deleted at node termination
  • Verified CNINode is re-created when deleted if the node object exists
  • Verified new metrics are registered on the controller

Note, PR for the test suite is pending until the changes are merged in.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

main.go Show resolved Hide resolved
pkg/node/manager/manager.go Outdated Show resolved Hide resolved
pkg/config/type.go Outdated Show resolved Hide resolved
main.go Outdated Show resolved Hide resolved
@sushrk sushrk marked this pull request as ready for review March 12, 2024 07:59
@sushrk sushrk requested a review from a team as a code owner March 12, 2024 07:59
controllers/crds/cninode_controller.go Show resolved Hide resolved
controllers/crds/cninode_controller.go Outdated Show resolved Hide resolved
controllers/crds/cninode_controller.go Outdated Show resolved Hide resolved
@sushrk
Copy link
Contributor Author

sushrk commented Mar 13, 2024

Had a sync offline, and addressed all the comments in subsequent commits. PTAL, thanks.

Copy link
Contributor

@haouc haouc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@sushrk sushrk merged commit e3c6c05 into aws:eni-cleanup Mar 13, 2024
2 of 3 checks passed
sushrk added a commit to sushrk/amazon-vpc-resource-controller-k8s that referenced this pull request Apr 29, 2024
* feat: centralized eni cleanup
sushrk added a commit to sushrk/amazon-vpc-resource-controller-k8s that referenced this pull request Apr 30, 2024
* feat: centralized eni cleanup
sushrk added a commit that referenced this pull request May 1, 2024
* Call DisassociateTrunkInterface before deleting branch ENI (#372)

* Call DisassociateTrunkInterface before deleting branch ENI

* feat: Centralize leaked ENI cleanup (#374)

* feat: centralized eni cleanup

* Merge master into eni-cleanup (#385)

* fix: paginate DescribeNetworkInterfaces with deep filters (#375)

* fix: paginate DescribeNetworkInterfaces with deep filters

* update metrics and address review comments

* minor updates to address comments

* Bump github.com/aws/aws-sdk-go from 1.49.13 to 1.50.29 (#380)

Bumps [github.com/aws/aws-sdk-go](https://github.com/aws/aws-sdk-go) from 1.49.13 to 1.50.29.
- [Release notes](https://github.com/aws/aws-sdk-go/releases)
- [Commits](aws/aws-sdk-go@v1.49.13...v1.50.29)

---
updated-dependencies:
- dependency-name: github.com/aws/aws-sdk-go
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump k8s.io/client-go from 0.29.1 to 0.29.2 (#377)

Bumps [k8s.io/client-go](https://github.com/kubernetes/client-go) from 0.29.1 to 0.29.2.
- [Changelog](https://github.com/kubernetes/client-go/blob/master/CHANGELOG.md)
- [Commits](kubernetes/client-go@v0.29.1...v0.29.2)

---
updated-dependencies:
- dependency-name: k8s.io/client-go
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump github.com/prometheus/common from 0.46.0 to 0.49.0 (#378)

Bumps [github.com/prometheus/common](https://github.com/prometheus/common) from 0.46.0 to 0.49.0.
- [Release notes](https://github.com/prometheus/common/releases)
- [Commits](prometheus/common@v0.46.0...v0.49.0)

---
updated-dependencies:
- dependency-name: github.com/prometheus/common
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Repo controlled build go version (#381)

* update golang version (#383)

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Jason Du <jasonxdu@amazon.com>

* fix:update cluster tag name in CNINode (#386)

* fix:add node OS label in CNINode, retry get CNINode with backoff

* update protobuf to 1.33.0 (#387)

* add CNINode integration tests (#391)

* use DescribeNetworkInterfaces with deep filters

* add integration test to validate ec2 permissions

* remove DisassociateAllBranchENIs as it is not useful (#400)

* remove DisassociateAllBranchENIs as it is not useful

* skip deletion success log for NotFound ENI

* fix govulncheck

* Merge master branch into eni-cleanup (#416)

* fix: paginate DescribeNetworkInterfaces with deep filters (#375)

* fix: paginate DescribeNetworkInterfaces with deep filters

* update metrics and address review comments

* minor updates to address comments

* Bump github.com/aws/aws-sdk-go from 1.49.13 to 1.50.29 (#380)

Bumps [github.com/aws/aws-sdk-go](https://github.com/aws/aws-sdk-go) from 1.49.13 to 1.50.29.
- [Release notes](https://github.com/aws/aws-sdk-go/releases)
- [Commits](aws/aws-sdk-go@v1.49.13...v1.50.29)

---
updated-dependencies:
- dependency-name: github.com/aws/aws-sdk-go
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump k8s.io/client-go from 0.29.1 to 0.29.2 (#377)

Bumps [k8s.io/client-go](https://github.com/kubernetes/client-go) from 0.29.1 to 0.29.2.
- [Changelog](https://github.com/kubernetes/client-go/blob/master/CHANGELOG.md)
- [Commits](kubernetes/client-go@v0.29.1...v0.29.2)

---
updated-dependencies:
- dependency-name: k8s.io/client-go
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump github.com/prometheus/common from 0.46.0 to 0.49.0 (#378)

Bumps [github.com/prometheus/common](https://github.com/prometheus/common) from 0.46.0 to 0.49.0.
- [Release notes](https://github.com/prometheus/common/releases)
- [Commits](prometheus/common@v0.46.0...v0.49.0)

---
updated-dependencies:
- dependency-name: github.com/prometheus/common
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Repo controlled build go version (#381)

* update golang version (#383)

* update protobuf to 1.33.0 (#387)

* pin envtest version due to an upstream bug (#390)

* Bump k8s.io/client-go from 0.29.2 to 0.29.3 (#392)

Bumps [k8s.io/client-go](https://github.com/kubernetes/client-go) from 0.29.2 to 0.29.3.
- [Changelog](https://github.com/kubernetes/client-go/blob/master/CHANGELOG.md)
- [Commits](kubernetes/client-go@v0.29.2...v0.29.3)

---
updated-dependencies:
- dependency-name: k8s.io/client-go
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump github.com/aws/amazon-vpc-cni-k8s from 1.16.0 to 1.17.1 (#393)

Bumps [github.com/aws/amazon-vpc-cni-k8s](https://github.com/aws/amazon-vpc-cni-k8s) from 1.16.0 to 1.17.1.
- [Release notes](https://github.com/aws/amazon-vpc-cni-k8s/releases)
- [Changelog](https://github.com/aws/amazon-vpc-cni-k8s/blob/master/CHANGELOG.md)
- [Commits](aws/amazon-vpc-cni-k8s@v1.16.0...v1.17.1)

---
updated-dependencies:
- dependency-name: github.com/aws/amazon-vpc-cni-k8s
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump github.com/prometheus/common from 0.49.0 to 0.51.1 (#395)

Bumps [github.com/prometheus/common](https://github.com/prometheus/common) from 0.49.0 to 0.51.1.
- [Release notes](https://github.com/prometheus/common/releases)
- [Commits](prometheus/common@v0.49.0...v0.51.1)

---
updated-dependencies:
- dependency-name: github.com/prometheus/common
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump github.com/aws/aws-sdk-go from 1.50.29 to 1.51.12 (#397)

Bumps [github.com/aws/aws-sdk-go](https://github.com/aws/aws-sdk-go) from 1.50.29 to 1.51.12.
- [Release notes](https://github.com/aws/aws-sdk-go/releases)
- [Commits](aws/aws-sdk-go@v1.50.29...v1.51.12)

---
updated-dependencies:
- dependency-name: github.com/aws/aws-sdk-go
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* add github action to run gosec static analysis (#398)

* add github action to run gosec static analysis

* install gosec

* update golang and dependency to fix CVE (#401)

* revert pagination and call DescribeNetworkInterfaces with vpcID or subnetID filter

* Revert "fix: paginate DescribeNetworkInterfaces with deep filters (#375)"

This reverts commit b5699de.

* call DescribeNetworkInterfaces with vpcID or subnetID filter

* update EC2 supported instance types (#402)

* remove global exclusion for G108,G114 and add nosec in code (#404)

* Update controller_auth_proxy_patch.yaml (#405)

Update the reference from gcr.io to registry.k8s.io

>  kube-rbac-proxy is moving to registry.k8s.io/kubebuilder/kube-rbac-proxy (from gcr.io/kubebuilder/kube-rbac-proxy) because GCR is being sunset. We need to update these references.

* Fix log which causes panic (#407)

* Fix log which causes panic

* Consistent key name

* consistent naming

* run go mod tidy

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Jason Du <jasonxdu@amazon.com>
Co-authored-by: Hao Zhou <haouc@users.noreply.github.com>
Co-authored-by: Senthil Kumaran <senthilx@amazon.com>
Co-authored-by: Garvin Pang <garvinpang@protonmail.com>

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Jason Du <jasonxdu@amazon.com>
Co-authored-by: Hao Zhou <haouc@users.noreply.github.com>
Co-authored-by: Senthil Kumaran <senthilx@amazon.com>
Co-authored-by: Garvin Pang <garvinpang@protonmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants