Custom Networking ENIs fail to attach in v1.13.0 when no security groups are specified #2426

Closed · jdn5126 opened this issue Jun 15, 2023 · 4 comments · Fixed by #2475

jdn5126 commented Jun 15, 2023

What happened:
In v1.13.0, #2354 was introduced, which improved custom networking startup time by attaching custom ENIs during node initialization. This introduced an issue, however, when no security group is associated with the ENIConfig object: in that case, we should fall back to the security group assigned to the node's primary ENI.

The security group assigned to the node's primary ENI comes from the EC2 metadata cache, though, and this value is not synced until after node initialization: https://github.com/aws/amazon-vpc-cni-k8s/blob/v1.13.0/pkg/ipamd/ipamd.go#L450

Our retry strategy in v1.13.0 is to let the aws-node pod crash so that the node does not become ready for pods to be scheduled. When the aws-node pod restarts, we hit the same issue again, since the cache does not sync the primary ENI's security group ID until after node initialization, so the pod crash-loops and the node never becomes ready.

Attach logs

{"level":"info","ts":"2023-06-15T21:55:31.095Z","caller":"ipamd/ipamd.go:903","msg":"Found ENI Config Name: us-west-2b"}
{"level":"info","ts":"2023-06-15T21:55:31.196Z","caller":"ipamd/ipamd.go:873","msg":"ipamd: using custom network config: [], subnet-07f48a842d5d33cee"}
{"level":"info","ts":"2023-06-15T21:55:31.196Z","caller":"awsutils/awsutils.go:763","msg":"Using a custom network config for the new ENI"}
{"level":"warn","ts":"2023-06-15T21:55:31.196Z","caller":"awsutils/awsutils.go:763","msg":"No custom networking security group found, will use the node's primary ENI's SG: []"}
{"level":"info","ts":"2023-06-15T21:55:31.196Z","caller":"awsutils/awsutils.go:763","msg":"Creating ENI with security groups: [] in subnet: subnet-07f48a842d5d33cee"}
{"level":"error","ts":"2023-06-15T21:55:31.299Z","caller":"awsutils/awsutils.go:763","msg":"Failed to CreateNetworkInterface InvalidParameterValue: user [REDACTED] does not own a resource\n\tstatus code: 400, request id: e12e797e-19f5-47fa-8d8e-4a5de5a677b7"}

What you expected to happen:
Custom ENIs should attach successfully during node initialization even when the ENIConfig specifies no security groups, and the retry strategy should be improved so that the node is not left in a permanent crash loop.

How to reproduce it (as minimally and precisely as possible):

  1. Deploy VPC CNI v1.13.0 with custom networking enabled
  2. Create an ENIConfig that specifies no security groups (see the example manifest after this list)
  3. Observe the CreateNetworkInterface error in the ipamd logs above
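
For reference, here is a minimal sketch of the kind of ENIConfig that triggers this path; the name and subnet ID are taken from the log output above, so substitute values from your own VPC:

```yaml
# ENIConfig with no securityGroups: in v1.13.0, ipamd falls back to the node's
# primary ENI security group, which has not yet been synced from the EC2
# metadata cache, so CreateNetworkInterface is called with an empty SG list.
apiVersion: crd.k8s.amazonaws.com/v1alpha1
kind: ENIConfig
metadata:
  name: us-west-2b
spec:
  subnet: subnet-07f48a842d5d33cee
  # securityGroups intentionally omitted
```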

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version): does not matter
  • CNI Version: v1.13.0
  • OS (e.g: cat /etc/os-release): Amazon Linux 2
  • Kernel (e.g. uname -a): does not matter
jdn5126 added the bug label Jun 15, 2023

jdn5126 commented Jun 15, 2023

The workaround for this issue in v1.13.0 is to specify any security group ID when defining an ENIConfig. It is recommended that you choose a no-op security group, such as the cluster security group or the VPC default security group.
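
A sketch of the workaround (the security group ID below is a placeholder; substitute your cluster security group or VPC default security group ID):

```yaml
# Workaround for v1.13.0: list a security group explicitly so ipamd never needs
# the primary ENI security group from the not-yet-synced metadata cache.
apiVersion: crd.k8s.amazonaws.com/v1alpha1
kind: ENIConfig
metadata:
  name: us-west-2b
spec:
  subnet: subnet-07f48a842d5d33cee
  securityGroups:
    - sg-0123456789abcdef0   # placeholder: cluster SG or VPC default SG
```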


cparik commented Jun 16, 2023

+1
Observed the same issue. The workaround lets the nodes join the cluster.


jdn5126 commented Jun 19, 2023

Closing as this is fixed in the v1.13.2 release.

jdn5126 closed this as completed Jun 19, 2023
github-actions (bot) commented

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue, feel free to do so.
