chore(networking): bump aws-vpc-cni version to 1.15.5 #16191

Closed
moshevayner wants to merge 2 commits

moshevayner
Member

What this PR does / why we need it:
Bump the AWS VPC CNI to version 1.15.5.
Once the 1.29 release branch is created, I'll bump master to 1.16.0, which was released a couple of days ago. That leaves some buffer between the two versions, if that makes sense.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):

Special notes for your reviewer:
https://github.com/aws/amazon-vpc-cni-k8s/releases/tag/v1.15.5

@k8s-ci-robot added the cncf-cla: yes and size/XXL labels Dec 23, 2023
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign johngmyers for approval. For more information see the Kubernetes Code Review Process.

@moshevayner
Member Author

/retest

@moshevayner
Member Author

/retest

@moshevayner marked this pull request as draft December 23, 2023 16:24
@k8s-ci-robot added the do-not-merge/work-in-progress label Dec 23, 2023
@moshevayner marked this pull request as ready for review December 23, 2023 16:26
@k8s-ci-robot removed the do-not-merge/work-in-progress label Dec 23, 2023
@moshevayner
Member Author

/retest

@moshevayner marked this pull request as draft December 23, 2023 17:05
@k8s-ci-robot added the do-not-merge/work-in-progress label Dec 23, 2023
@hakman
Member

hakman commented Dec 23, 2023

Please ignore pull-kops-e2e-aws-upgrade-k127-ko127-to-klatest-kolatest-many-addons.
/test pull-kops-e2e-cni-amazonvpc
/lgtm

@k8s-ci-robot added the lgtm label Dec 23, 2023
@moshevayner marked this pull request as ready for review December 23, 2023 18:37
@k8s-ci-robot removed the do-not-merge/work-in-progress label Dec 23, 2023
@moshevayner
Member Author

@hakman I can't figure out from the logs why the amazonvpc job is failing.
I'm trying to spin up a cluster on my personal account using a build from this branch to see if it comes up.
Converting to draft for the time being.

@moshevayner marked this pull request as draft December 23, 2023 19:43
@k8s-ci-robot added the do-not-merge/work-in-progress label Dec 23, 2023
@moshevayner
Member Author

So, I have a feeling something with the newly introduced network policy support is causing a cascading failure that begins with (probably) coredns failing:

 k -n kube-system logs coredns-867c995cd5-d9gxs
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[WARNING] plugin/kubernetes: starting server with unsynced Kubernetes API
.:53
[INFO] plugin/reload: Running configuration SHA512 = 9617ae84604c33431b2bf5a8d8e93450b34eb11d1103af1b1962d4d016b8eb111bde503da621c2bf37a233cfe70c25b727b74f78133b9bcf4b6191e897f96fa8
CoreDNS-1.10.1
linux/amd64, go1.20, 055b2c3
[ERROR] plugin/errors: 2 7001883994134103369.583922312319829419. HINFO: read udp 172.20.66.221:44830->172.20.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 7001883994134103369.583922312319829419. HINFO: read udp 172.20.66.221:33291->172.20.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 7001883994134103369.583922312319829419. HINFO: read udp 172.20.66.221:43286->172.20.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 7001883994134103369.583922312319829419. HINFO: read udp 172.20.66.221:41628->172.20.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 7001883994134103369.583922312319829419. HINFO: read udp 172.20.66.221:55739->172.20.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 7001883994134103369.583922312319829419. HINFO: read udp 172.20.66.221:48107->172.20.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 7001883994134103369.583922312319829419. HINFO: read udp 172.20.66.221:55021->172.20.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 7001883994134103369.583922312319829419. HINFO: read udp 172.20.66.221:37836->172.20.0.2:53: i/o timeout
[WARNING] plugin/kubernetes: Kubernetes API connection failure: Get "https://100.64.0.1:443/version": dial tcp 100.64.0.1:443: i/o timeout
[ERROR] plugin/errors: 2 7001883994134103369.583922312319829419. HINFO: read udp 172.20.66.221:52611->172.20.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 7001883994134103369.583922312319829419. HINFO: read udp 172.20.66.221:59047->172.20.0.2:53: i/o timeout
[WARNING] plugin/kubernetes: Kubernetes API connection failure: Get "https://100.64.0.1:443/version": dial tcp 100.64.0.1:443: i/o timeout
[WARNING] plugin/kubernetes: Kubernetes API connection failure: Get "https://100.64.0.1:443/version": dial tcp 100.64.0.1:443: i/o timeout
[INFO] SIGTERM: Shutting down servers then terminating
[INFO] plugin/health: Going into lameduck mode for 5s

Seems like it can't access the k8s api, and from there it all goes south.
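
To see at a glance which destinations are timing out, the log above can be tallied with a small script. This is only a rough sketch: the function name `tally_timeouts` and the regular expression are illustrative assumptions, written against the two line shapes that appear in this log (`read udp SRC->DST` and `dial tcp DST`), not against any documented CoreDNS log format.

```python
import re
from collections import Counter

# Matches the destination of a timed-out connection in lines such as:
#   read udp 172.20.66.221:44830->172.20.0.2:53: i/o timeout
#   dial tcp 100.64.0.1:443: i/o timeout
# The pattern is an assumption based on the log pasted above.
TIMEOUT_RE = re.compile(
    r"(?:read|dial) (?P<proto>udp|tcp) (?:\S+->)?"
    r"(?P<dest>\d+\.\d+\.\d+\.\d+:\d+): i/o timeout"
)

# A few lines copied from the log above, used as sample input.
SAMPLE_LOG = """\
[ERROR] plugin/errors: 2 7001883994134103369.583922312319829419. HINFO: read udp 172.20.66.221:44830->172.20.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 7001883994134103369.583922312319829419. HINFO: read udp 172.20.66.221:33291->172.20.0.2:53: i/o timeout
[WARNING] plugin/kubernetes: Kubernetes API connection failure: Get "https://100.64.0.1:443/version": dial tcp 100.64.0.1:443: i/o timeout
[INFO] plugin/health: Going into lameduck mode for 5s
"""


def tally_timeouts(log_text):
    """Count i/o timeouts per (protocol, destination) pair."""
    counts = Counter()
    for line in log_text.splitlines():
        match = TIMEOUT_RE.search(line)
        if match:
            counts[(match.group("proto"), match.group("dest"))] += 1
    return counts


if __name__ == "__main__":
    for (proto, dest), count in tally_timeouts(SAMPLE_LOG).most_common():
        print(f"{proto} {dest}: {count} timeout(s)")
```

In this log the timeouts hit both 172.20.0.2:53 (presumably the upstream DNS resolver) and 100.64.0.1:443 (the API address CoreDNS is trying to reach), so the pod may have no working egress at all rather than an API-specific problem.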

It's weird, because according to the docs, network policy enforcement is disabled by default, but there might be something else that is eluding me at this point.

I'll have to keep digging to understand what the root cause is.
Keeping this here for extra transparency.

@moshevayner
Member Author

@hakman any chance this is related? aws/amazon-vpc-cni-k8s#2103
I just found it there. It seems to be over a year old, but unless I'm mistaken, we are now using Ubuntu 22.04 by default, and coredns uses the kubernetes ClusterIP service to interact with the API.
WDYT?
cc @olemarkus

@hakman
Member

hakman commented Dec 24, 2023

/test pull-kops-e2e-cni-amazonvpc

@hakman
Member

hakman commented Dec 24, 2023

> @hakman any chance this is related? aws/amazon-vpc-cni-k8s#2103 I just found it there. It seems to be over a year old, but unless I'm mistaken, we are now using Ubuntu 22.04 by default, and coredns uses the kubernetes ClusterIP service to interact with the API. WDYT?

Due to that issue, we run AWS VPC CNI tests with Ubuntu 20.04. From what I see, failures look like flakes in the tests, but I may be wrong.

@hakman
Member

hakman commented Dec 24, 2023

/test pull-kops-e2e-cni-amazonvpc

@hakman
Member

hakman commented Dec 24, 2023

/test pull-kops-e2e-cni-amazonvpc

@k8s-ci-robot
Contributor

@moshevayner: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-kops-e2e-aws-upgrade-k127-ko127-to-klatest-kolatest-many-addons | eda667f | link | false | /test pull-kops-e2e-aws-upgrade-k127-ko127-to-klatest-kolatest-many-addons |
| pull-kops-e2e-cni-amazonvpc | eda667f | link | false | /test pull-kops-e2e-cni-amazonvpc |

@moshevayner
Member Author

> > @hakman any chance this is related? aws/amazon-vpc-cni-k8s#2103 I just found it there. It seems to be over a year old, but unless I'm mistaken, we are now using Ubuntu 22.04 by default, and coredns uses the kubernetes ClusterIP service to interact with the API. WDYT?
>
> Due to that issue, we run AWS VPC CNI tests with Ubuntu 20.04. From what I see, failures look like flakes in the tests, but I may be wrong.

I think these are legit errors, based on this comment (just making sure you saw it, since it might've been lost in the pile of comments here).

I created a cluster locally using a binary built from this branch, and got the mentioned results.

@hakman
Member

hakman commented Dec 24, 2023

> > > @hakman any chance this is related? aws/amazon-vpc-cni-k8s#2103 I just found it there. It seems to be over a year old, but unless I'm mistaken, we are now using Ubuntu 22.04 by default, and coredns uses the kubernetes ClusterIP service to interact with the API. WDYT?
> >
> > Due to that issue, we run AWS VPC CNI tests with Ubuntu 20.04. From what I see, failures look like flakes in the tests, but I may be wrong.
>
> I think these are legit errors, based on this comment (just making sure you saw it, since it might've been lost in the pile of comments here).
>
> I created a cluster locally using a binary built from this branch, and got the mentioned results.

I think you are right 😄...

@Deshke

Deshke commented Jan 16, 2024

via https://github.com/awslabs/amazon-eks-ami/blob/master/scripts/install-worker.sh#L104

  # Temporary fix for https://github.com/aws/amazon-vpc-cni-k8s/pull/2118
  sudo sed -i "s/^MACAddressPolicy=.*/MACAddressPolicy=none/" /usr/lib/systemd/network/99-default.link || true

how could this be added?
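
For readers unfamiliar with the workaround, the `sed` expression above rewrites any line that starts with `MACAddressPolicy=` so that the policy becomes `none`. A minimal Python sketch of the same substitution on an in-memory string follows. The function name `set_mac_address_policy` and the sample file content are illustrative assumptions; the real `99-default.link` on a given image may look different, and this does not show how the change would be wired into kOps.

```python
import re

# Hypothetical sample content; the real 99-default.link may differ.
SAMPLE_LINK = (
    "[Link]\n"
    "NamePolicy=keep kernel database onboard slot path\n"
    "MACAddressPolicy=persistent\n"
)


def set_mac_address_policy(link_file_text, policy="none"):
    """Mirror of: sed "s/^MACAddressPolicy=.*/MACAddressPolicy=none/".

    Replaces every line that starts with "MACAddressPolicy=" and leaves
    all other lines untouched.
    """
    return re.sub(
        r"^MACAddressPolicy=.*",
        lambda _match: f"MACAddressPolicy={policy}",
        link_file_text,
        flags=re.MULTILINE,
    )


if __name__ == "__main__":
    print(set_mac_address_policy(SAMPLE_LINK), end="")
```

Like the `sed` one-liner, the substitution is a no-op when the file has no `MACAddressPolicy=` line, so it only changes an existing setting and never adds one.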

@hakman
Member

hakman commented Jan 16, 2024

> via https://github.com/awslabs/amazon-eks-ami/blob/master/scripts/install-worker.sh#L104
>
>   # Temporary fix for https://github.com/aws/amazon-vpc-cni-k8s/pull/2118
>   sudo sed -i "s/^MACAddressPolicy=.*/MACAddressPolicy=none/" /usr/lib/systemd/network/99-default.link || true
>
> how could this be added?

Please create a separate issue for this.

@hakman
Member

hakman commented Jan 28, 2024

@moshevayner could we try updating to 1.16.2?

@moshevayner
Member Author

> @moshevayner could we try updating to 1.16.2?

Sure, I'll give that a try asap and update!

@moshevayner
Member Author

This PR is superseded by #16297

/close

@k8s-ci-robot
Contributor

@moshevayner: Closed this PR.

Labels: area/addons, cncf-cla: yes, do-not-merge/work-in-progress, lgtm, size/XXL