What happened:
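In v1.13.0, #2354 improved custom networking startup time by attaching custom ENIs during node initialization. However, it introduced a bug when no security group is associated with the ENIConfig object. In that case, we should fall back to the security group assigned to the node's primary ENI. That value comes from the EC2 metadata cache, though, and the cache is not synced until after node initialization: https://github.com/aws/amazon-vpc-cni-k8s/blob/v1.13.0/pkg/ipamd/ipamd.go#L450. As the logs below show, ENI creation is then attempted with an empty security group list, and the CreateNetworkInterface call fails.
Our retry strategy in v1.13.0 is to let the aws-node pod crash so that the node does not become ready and no pods are scheduled onto it. When the aws-node pod restarts, it hits the same failure, because the cache still does not sync the primary ENI's security group ID until after node initialization.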
Attach logs
{"level":"info","ts":"2023-06-15T21:55:31.095Z","caller":"ipamd/ipamd.go:903","msg":"Found ENI Config Name: us-west-2b"}
{"level":"info","ts":"2023-06-15T21:55:31.196Z","caller":"ipamd/ipamd.go:873","msg":"ipamd: using custom network config: [], subnet-07f48a842d5d33cee"}
{"level":"info","ts":"2023-06-15T21:55:31.196Z","caller":"awsutils/awsutils.go:763","msg":"Using a custom network config for the new ENI"}
{"level":"warn","ts":"2023-06-15T21:55:31.196Z","caller":"awsutils/awsutils.go:763","msg":"No custom networking security group found, will use the node's primary ENI's SG: []"}
{"level":"info","ts":"2023-06-15T21:55:31.196Z","caller":"awsutils/awsutils.go:763","msg":"Creating ENI with security groups: [] in subnet: subnet-07f48a842d5d33cee"}
{"level":"error","ts":"2023-06-15T21:55:31.299Z","caller":"awsutils/awsutils.go:763","msg":"Failed to CreateNetworkInterface InvalidParameterValue: user [REDACTED] does not own a resource\n\tstatus code: 400, request id: e12e797e-19f5-47fa-8d8e-4a5de5a677b7"}
What you expected to happen:
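Custom ENIs should be attached during node initialization, falling back to the primary ENI's security groups when the ENIConfig specifies none, and failures should be handled with a better retry strategy than crashing the aws-node pod.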
How to reproduce it (as minimally and precisely as possible):
Deploy VPC CNI v1.13.0
Create an ENIConfig with no security groups
Observe the CreateNetworkInterface error in the ipamd logs, as shown above
Anything else we need to know?:
Environment:
Kubernetes version (use kubectl version): does not matter
CNI Version: v1.13.0
OS (e.g., cat /etc/os-release): Amazon Linux 2
Kernel (e.g., uname -a): does not matter
The workaround for this issue in v1.13.0 is to specify any security group ID when defining an ENIConfig. We recommend choosing a no-op security group, such as the cluster security group or the VPC's default security group.