{"level":"debug","ts":"2020-11-21T12:18:10.565Z","caller":"awsutils/awsutils.go:388","msg":"Update ENI eni-xxxxxxxxxx"}
{"level":"error","ts":"2020-11-21T12:18:10.773Z","caller":"aws-k8s-agent/main.go:28","msg":"Initialization failure: ipamd: can not initialize with AWS SDK interface: refreshSGIDs: unable to update the ENI's SG: InvalidNetworkInterfaceID.NotFound: The networkInterface ID 'eni-xxxxxxxxxxxx' does not exist\n\tstatus code: 400, request id: aaaaaa-bbbbbb-cccccc-ccccc-sssssss"}
@falgofrancis - #1341 will prevent aws-node from crashing on startup when the metadata contains stale data, and it adds a counter that tracks these occurrences. This is only part of the fix. We are working with the EC2 team on the race condition that causes IMDS to never sync. The current workaround is to drain the node.
IPAMD keeps restarting with the log messages shown above.
From awsutils (line 458), we understand that the ModifyNetworkInterfaceAttribute action is triggering the above error message.
From the CloudTrail API calls, I was able to confirm that the ENI was created and deleted by the VPC CNI; however, it was never cleared from the EC2 instance metadata. Because the metadata call made through the metadataMACPath function is what gathers the ENI ID, the stale entry is picked up and we see the above issue.
I have confirmed, by running a curl call against the instance metadata, that the deleted ENI is still present there.
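A check of this kind can also be sketched in Go. This is a minimal sketch, not the VPC CNI's own code: it assumes the documented IMDS paths `network/interfaces/macs/` and `network/interfaces/macs/<mac>/interface-id`, uses an IMDSv2 session token, and must run on the affected instance. The helper names (`parseMACs`, `imdsToken`, `imdsGet`, `listENIs`) are illustrative only.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
	"time"
)

// imdsBase is the link-local instance metadata endpoint.
const imdsBase = "http://169.254.169.254/latest"

// parseMACs splits the body of network/interfaces/macs/ into MAC addresses.
// IMDS returns one entry per line, each with a trailing slash.
func parseMACs(body string) []string {
	var macs []string
	for _, line := range strings.Split(body, "\n") {
		mac := strings.TrimSuffix(strings.TrimSpace(line), "/")
		if mac != "" {
			macs = append(macs, mac)
		}
	}
	return macs
}

// do sends the request and returns the response body for a 200 status.
func do(client *http.Client, req *http.Request) (string, error) {
	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("%s %s: %s", req.Method, req.URL.Path, resp.Status)
	}
	b, err := io.ReadAll(resp.Body)
	return string(b), err
}

// imdsToken requests a short-lived IMDSv2 session token.
func imdsToken(client *http.Client) (string, error) {
	req, err := http.NewRequest(http.MethodPut, imdsBase+"/api/token", nil)
	if err != nil {
		return "", err
	}
	req.Header.Set("X-aws-ec2-metadata-token-ttl-seconds", "60")
	return do(client, req)
}

// imdsGet fetches one metadata path using the session token.
func imdsGet(client *http.Client, token, path string) (string, error) {
	req, err := http.NewRequest(http.MethodGet, imdsBase+"/meta-data/"+path, nil)
	if err != nil {
		return "", err
	}
	req.Header.Set("X-aws-ec2-metadata-token", token)
	return do(client, req)
}

// listENIs returns a MAC -> ENI ID map as currently reported by IMDS.
func listENIs() (map[string]string, error) {
	client := &http.Client{Timeout: 2 * time.Second}
	token, err := imdsToken(client)
	if err != nil {
		return nil, err
	}
	body, err := imdsGet(client, token, "network/interfaces/macs/")
	if err != nil {
		return nil, err
	}
	enis := make(map[string]string)
	for _, mac := range parseMACs(body) {
		id, err := imdsGet(client, token, "network/interfaces/macs/"+mac+"/interface-id")
		if err != nil {
			return nil, err
		}
		enis[mac] = strings.TrimSpace(id)
	}
	return enis, nil
}
```

Calling `listENIs()` from a `main` function on the node and comparing the returned IDs with what EC2 reports for the instance would show any ENI that IMDS still lists after deletion.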
Summary
IPAMD should be able to handle an out-of-sync IMDS in cases like the one above. Otherwise, aws-node ends up in a CrashLoopBackOff state, and pods scheduled onto the node do not receive IP addresses, which leaves them stuck in the ContainerCreating state.
Similar issue from the past: #1177
Why is this needed:
The CNI should handle IMDS-related issues like this gracefully instead of leaving the node and all of its pods stuck.
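One way to picture the requested behavior is a refresh loop that skips ENIs the EC2 API no longer knows about instead of failing initialization. The sketch below is only an illustration under stated assumptions and is not the change made in #1341: `apiError`, `isStaleENI`, and `refreshSGs` are hypothetical names, and the `Code()` method mirrors the one exposed by `awserr.Error` in the AWS SDK for Go v1.

```go
package main

import (
	"errors"
	"fmt"
)

// notFoundCode is the error code seen in the log above.
const notFoundCode = "InvalidNetworkInterfaceID.NotFound"

// codedError matches errors that expose an AWS-style error code.
type codedError interface {
	error
	Code() string
}

// apiError is a hypothetical stand-in for an SDK error, used for illustration.
type apiError struct{ code, msg string }

func (e apiError) Error() string { return e.code + ": " + e.msg }
func (e apiError) Code() string  { return e.code }

// isStaleENI reports whether err says the ENI does not exist in EC2.
func isStaleENI(err error) bool {
	var ce codedError
	return errors.As(err, &ce) && ce.Code() == notFoundCode
}

// refreshSGs calls update for every ENI ID taken from metadata.
// ENIs that EC2 reports as missing are collected and skipped;
// any other error aborts the refresh.
func refreshSGs(eniIDs []string, update func(eniID string) error) ([]string, error) {
	var stale []string
	for _, id := range eniIDs {
		if err := update(id); err != nil {
			if isStaleENI(err) {
				stale = append(stale, id) // a counter could be incremented here
				continue
			}
			return stale, fmt.Errorf("update ENI %s: %w", id, err)
		}
	}
	return stale, nil
}
```

In this sketch the caller would log the returned stale IDs and continue starting up, so a single out-of-sync metadata entry no longer takes down aws-node.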