-
Notifications
You must be signed in to change notification settings - Fork 748
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle stale IMDS metadata for secondary IPs #1177
Conversation
7acdd38
to
8923096
Compare
if err != nil { | ||
// Failed to set up the ENI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How are you planning to test this? One way to test this would be add a sleep before netLink.LinkSetUp
and delete the ENI and make sure eniConfig doesn't have this ENI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First of all we should add a unit test for it.
For manual testing, deleting the ENI and check the logs and introspection endpoint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But you need to delete ENI in such a way that setupENI calls fails.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added debug code:
if rand.Intn(2) == 0 {
log.Debug("XXXX Failing randomly!")
return errors.New("random linkByMac failure injected")
}
Output:
{"level":"debug","ts":"2020-08-31T21:29:50.258Z","caller":"ipamd/ipamd.go:476","msg":"Reconcile existing ENI eni-01d689e8a83c6feba IP pool"}
{"level":"debug","ts":"2020-08-31T21:29:50.258Z","caller":"ipamd/ipamd.go:476","msg":"XXXXX Deleted the last IP from attachedENI: %+v"}
{"level":"warn","ts":"2020-08-31T21:29:50.258Z","caller":"ipamd/ipamd.go:968","msg":"Instance metadata does not match data store! ipPool: [192.168.159.31 192.168.134.205 192.168.152.224 192.168.134.35 192.168.129.132 192.168.134.155 192.168.149.75 192.168.128.237 192.168.150.15 192.168.147.91 192.168.156.197 192.168.131.24 192.168.133.50 192.168.152.119], metadata: [{\n Primary: true,\n PrivateIpAddress: \"192.168.131.160\"\n} {\n Primary: false,\n PrivateIpAddress: \"192.168.149.75\"\n} {\n Primary: false,\n PrivateIpAddress: \"192.168.134.205\"\n} {\n Primary: false,\n PrivateIpAddress: \"192.168.128.237\"\n} {\n Primary: false,\n PrivateIpAddress: \"192.168.150.15\"\n} {\n Primary: false,\n PrivateIpAddress: \"192.168.152.224\"\n} {\n Primary: false,\n PrivateIpAddress: \"192.168.134.35\"\n} {\n Primary: false,\n PrivateIpAddress: \"192.168.129.132\"\n} {\n Primary: false,\n PrivateIpAddress: \"192.168.156.197\"\n} {\n Primary: false,\n PrivateIpAddress: \"192.168.131.24\"\n} {\n Primary: false,\n PrivateIpAddress: \"192.168.134.155\"\n} {\n Primary: false,\n PrivateIpAddress: \"192.168.147.91\"\n} {\n Primary: false,\n PrivateIpAddress: \"192.168.159.31\"\n} {\n Primary: false,\n PrivateIpAddress: \"192.168.133.50\"\n}]"}
{"level":"debug","ts":"2020-08-31T21:29:50.258Z","caller":"ipamd/ipamd.go:968","msg":"We need to check the ENI status by calling the EC2 control plane."}
{"level":"debug","ts":"2020-08-31T21:29:50.352Z","caller":"ipamd/ipamd.go:1026","msg":"Reconcile and skip primary IP 192.168.131.160 on ENI eni-01d689e8a83c6feba"}
{"level":"debug","ts":"2020-08-31T21:29:50.352Z","caller":"ipamd/ipamd.go:476","msg":"Reconcile existing ENI eni-074c28aba8340f164 IP pool"}
{"level":"debug","ts":"2020-08-31T21:29:50.352Z","caller":"ipamd/ipamd.go:476","msg":"XXXXX Deleted the last IP from attachedENI: %+v"}
{"level":"warn","ts":"2020-08-31T21:29:50.352Z","caller":"ipamd/ipamd.go:968","msg":"Instance metadata does not match data store! ipPool: [192.168.145.78 192.168.145.183 192.168.133.18 192.168.155.72 192.168.143.232 192.168.132.174 192.168.145.239 192.168.133.98 192.168.157.218 192.168.143.209 192.168.153.245 192.168.139.151 192.168.150.195 192.168.150.150], metadata: [{\n Primary: true,\n PrivateIpAddress: \"192.168.153.162\"\n} {\n Primary: false,\n PrivateIpAddress: \"192.168.155.72\"\n} {\n Primary: false,\n PrivateIpAddress: \"192.168.143.232\"\n} {\n Primary: false,\n PrivateIpAddress: \"192.168.132.174\"\n} {\n Primary: false,\n PrivateIpAddress: \"192.168.145.78\"\n} {\n Primary: false,\n PrivateIpAddress: \"192.168.145.239\"\n} {\n Primary: false,\n PrivateIpAddress: \"192.168.133.98\"\n} {\n Primary: false,\n PrivateIpAddress: \"192.168.150.195\"\n} {\n Primary: false,\n PrivateIpAddress: \"192.168.157.218\"\n} {\n Primary: false,\n PrivateIpAddress: \"192.168.143.209\"\n} {\n Primary: false,\n PrivateIpAddress: \"192.168.133.18\"\n} {\n Primary: false,\n PrivateIpAddress: \"192.168.153.245\"\n} {\n Primary: false,\n PrivateIpAddress: \"192.168.150.150\"\n} {\n Primary: false,\n PrivateIpAddress: \"192.168.145.183\"\n}]"}
{"level":"debug","ts":"2020-08-31T21:29:50.352Z","caller":"ipamd/ipamd.go:968","msg":"We need to check the ENI status by calling the EC2 control plane."}
{"level":"debug","ts":"2020-08-31T21:29:50.453Z","caller":"ipamd/ipamd.go:1026","msg":"Reconcile and skip primary IP 192.168.153.162 on ENI eni-074c28aba8340f164"}
{"level":"debug","ts":"2020-08-31T21:29:50.453Z","caller":"ipamd/ipamd.go:476","msg":"Reconcile and add a new ENI {eni-0c57b7eecd642377d 06:ec:7b:a5:81:5f %!s(int=3) 192.168.128.0/19 [{\n Primary: true,\n PrivateIpAddress: \"192.168.159.161\"\n} {\n Primary: false,\n PrivateIpAddress: \"192.168.144.13\"\n} {\n Primary: false,\n PrivateIpAddress: \"192.168.154.110\"\n} {\n Primary: false,\n PrivateIpAddress: \"192.168.147.175\"\n} {\n Primary: false,\n PrivateIpAddress: \"192.168.138.96\"\n} {\n Primary: false,\n PrivateIpAddress: \"192.168.145.34\"\n} {\n Primary: false,\n PrivateIpAddress: \"192.168.146.227\"\n} {\n Primary: false,\n PrivateIpAddress: \"192.168.155.154\"\n} {\n Primary: false,\n PrivateIpAddress: \"192.168.137.157\"\n} {\n Primary: false,\n PrivateIpAddress: \"192.168.149.94\"\n} {\n Primary: false,\n PrivateIpAddress: \"192.168.142.62\"\n} {\n Primary: false,\n PrivateIpAddress: \"192.168.129.31\"\n} {\n Primary: false,\n PrivateIpAddress: \"192.168.149.50\"\n} {\n Primary: false,\n PrivateIpAddress: \"192.168.134.178\"\n} {\n Primary: false,\n PrivateIpAddress: \"192.168.140.51\"\n}]}"}
{"level":"debug","ts":"2020-08-31T21:29:50.453Z","caller":"ipamd/ipamd.go:742","msg":"DataStore Add an ENI eni-0c57b7eecd642377d"}
{"level":"info","ts":"2020-08-31T21:29:50.453Z","caller":"networkutils/network.go:704","msg":"Setting up network for an ENI with IP address 192.168.159.161, MAC address 06:ec:7b:a5:81:5f, CIDR 192.168.128.0/19 and route table 3"}
{"level":"debug","ts":"2020-08-31T21:29:50.453Z","caller":"networkutils/network.go:704","msg":"XXXX Failing randomly!"}
{"level":"info","ts":"2020-08-31T21:29:50.453Z","caller":"ipamd/ipamd.go:754","msg":"RemoveENIFromDataStore eni-0c57b7eecd642377d: IP address pool stats: free 0 addresses, total: 28, assigned: 17"}
{"level":"error","ts":"2020-08-31T21:29:50.453Z","caller":"ipamd/ipamd.go:476","msg":"IP pool reconcile: Failed to set up ENI eni-0c57b7eecd642377d network: failed to set up ENI eni-0c57b7eecd642377d network: random linkByMac failure injected"}
{"level":"debug","ts":"2020-08-31T21:29:50.453Z","caller":"ipamd/ipamd.go:476","msg":"Successfully Reconciled ENI/IP pool"}
{"level":"debug","ts":"2020-08-31T21:29:50.453Z","caller":"ipamd/ipamd.go:476","msg":"IP Address Pool stats: total: 28, assigned: 17"}
log.Errorf("Failed to fetch ENI IP addresses! Aborting reconcile of ENI %s", eni) | ||
return | ||
} | ||
attachedENIIPs = ec2Addresses |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if the IP is just got allocated? It will be assigned to attachedENIIPs and might get deleted right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the secondary IP just got added outside of IPAMD, so it's not yet seen in the ipPool
, but shows up in IMDS metadata. Then we will also call EC2 to verify, then this reconcile function will add it when looping over attachedENIIPs
:
c.dataStore.AddIPv4AddressToStore(eni, strPrivateIPv4)
log.Warnf("Instance metadata does not match data store! ipPool: %v, metadata: %v", ipPool, attachedENIIPs) | ||
log.Debugf("We need to check the ENI status by calling the EC2 control plane.") | ||
// Call EC2 to verify IPs on this ENI | ||
ec2Addresses, err := c.awsClient.GetIPv4sFromEC2(eni) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How are we planning to testing this code path?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a unit test step:
{"level":"debug","ts":"2020-08-29T19:52:32.526-0700","caller":"ipamd/ipamd_test.go:669","msg":"IP Address Pool stats: total: 2, assigned: 0"}
{"level":"debug","ts":"2020-08-29T19:52:32.526-0700","caller":"ipamd/ipamd_test.go:689","msg":"Reconciling ENI/IP pool info because time since last 81.451µs <= 0s"}
{"level":"debug","ts":"2020-08-29T19:52:32.526-0700","caller":"ipamd/ipamd_test.go:689","msg":"Reconcile existing ENI eni-00000000 IP pool"}
{"level":"warn","ts":"2020-08-29T19:52:32.526-0700","caller":"ipamd/ipamd.go:963","msg":"Instance metadata does not match data store! ipPool: [10.10.10.12 10.10.10.13], metadata: [{\n Primary: true,\n PrivateIpAddress: \"10.10.10.11\"\n}]"}
{"level":"debug","ts":"2020-08-29T19:52:32.526-0700","caller":"ipamd/ipamd.go:963","msg":"We need to check the ENI status by calling the EC2 control plane."}
{"level":"error","ts":"2020-08-29T19:52:32.526-0700","caller":"ipamd/ipamd.go:963","msg":"Failed to fetch ENI IP addresses! Aborting reconcile of ENI eni-00000000"}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was asking if we can manually test the change once ? (Could be done by adding some sleep and triggering the code path)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think adding a temporary change where we just delete the last IP returned by IMDS should work to verify this code path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Debug build with the following change:
log.Debugf("Reconcile existing ENI %s IP pool", attachedENI.ENIID)
badIMDS := attachedENI
addressCount := len(badIMDS.IPv4Addresses)
if addressCount > 1 {
badIMDS.IPv4Addresses = badIMDS.IPv4Addresses[:addressCount-1]
}
log.Debugf("XXXXX Deleted the last IP from attachedENI: %+v")
// Reconcile IP pool
c.eniIPPoolReconcile(eniIPPool, badIMDS)
{"level":"debug","ts":"2020-08-31T21:25:39.922Z","caller":"ipamd/ipamd.go:476","msg":"Reconciling ENI/IP pool info because time since last 1m0.127726217s <= 1m0s"}
{"level":"debug","ts":"2020-08-31T21:25:39.923Z","caller":"ipamd/ipamd.go:908","msg":"Total number of interfaces found: 2 "}
{"level":"debug","ts":"2020-08-31T21:25:39.923Z","caller":"awsutils/awsutils.go:568","msg":"Found ENI MAC address: 06:66:05:b5:c1:71"}
{"level":"debug","ts":"2020-08-31T21:25:39.924Z","caller":"awsutils/awsutils.go:582","msg":"Using device number 0 for primary eni: eni-01d689e8a83c6feba"}
{"level":"debug","ts":"2020-08-31T21:25:39.924Z","caller":"awsutils/awsutils.go:568","msg":"Found ENI: eni-01d689e8a83c6feba, MAC 06:66:05:b5:c1:71, device 0"}
{"level":"debug","ts":"2020-08-31T21:25:39.924Z","caller":"awsutils/awsutils.go:588","msg":"Found CIDR 192.168.128.0/19 for ENI 06:66:05:b5:c1:71"}
{"level":"debug","ts":"2020-08-31T21:25:39.924Z","caller":"awsutils/awsutils.go:588","msg":"Found IP addresses [192.168.131.160 192.168.149.75 192.168.134.205 192.168.128.237 192.168.150.15 192.168.152.224 192.168.134.35 192.168.129.132 192.168.156.197 192.168.131.24 192.168.134.155 192.168.147.91 192.168.159.31 192.168.133.50 192.168.152.119] on ENI 06:66:05:b5:c1:71"}
{"level":"debug","ts":"2020-08-31T21:25:39.924Z","caller":"awsutils/awsutils.go:568","msg":"Found ENI MAC address: 06:8f:fe:97:a2:9b"}
{"level":"debug","ts":"2020-08-31T21:25:39.925Z","caller":"awsutils/awsutils.go:568","msg":"Found ENI: eni-074c28aba8340f164, MAC 06:8f:fe:97:a2:9b, device 2"}
{"level":"debug","ts":"2020-08-31T21:25:39.926Z","caller":"awsutils/awsutils.go:588","msg":"Found CIDR 192.168.128.0/19 for ENI 06:8f:fe:97:a2:9b"}
{"level":"debug","ts":"2020-08-31T21:25:39.926Z","caller":"awsutils/awsutils.go:588","msg":"Found IP addresses [192.168.153.162 192.168.155.72 192.168.143.232 192.168.132.174 192.168.145.78 192.168.145.239 192.168.133.98 192.168.150.195 192.168.157.218 192.168.143.209 192.168.133.18 192.168.153.245 192.168.150.150 192.168.145.183 192.168.139.151] on ENI 06:8f:fe:97:a2:9b"}
{"level":"debug","ts":"2020-08-31T21:25:39.926Z","caller":"ipamd/ipamd.go:476","msg":"Reconcile existing ENI eni-01d689e8a83c6feba IP pool"}
{"level":"debug","ts":"2020-08-31T21:25:39.926Z","caller":"ipamd/ipamd.go:476","msg":"XXXXX Deleted the last IP from attachedENI: %+v"}
{"level":"warn","ts":"2020-08-31T21:25:39.926Z","caller":"ipamd/ipamd.go:968","msg":"Instance metadata does not match data store! ipPool: [192.168.134.35 192.168.129.132 192.168.134.155 192.168.159.31 192.168.134.205 192.168.152.224 192.168.150.15 192.168.147.91 192.168.149.75 192.168.128.237 192.168.133.50 192.168.156.197 192.168.131.24 192.168.152.119], metadata: [{\n Primary: true,\n PrivateIpAddress: \"192.168.131.160\"\n} {\n Primary: false,\n PrivateIpAddress: \"192.168.149.75\"\n} {\n Primary: false,\n PrivateIpAddress: \"192.168.134.205\"\n} {\n Primary: false,\n PrivateIpAddress: \"192.168.128.237\"\n} {\n Primary: false,\n PrivateIpAddress: \"192.168.150.15\"\n} {\n Primary: false,\n PrivateIpAddress: \"192.168.152.224\"\n} {\n Primary: false,\n PrivateIpAddress: \"192.168.134.35\"\n} {\n Primary: false,\n PrivateIpAddress: \"192.168.129.132\"\n} {\n Primary: false,\n PrivateIpAddress: \"192.168.156.197\"\n} {\n Primary: false,\n PrivateIpAddress: \"192.168.131.24\"\n} {\n Primary: false,\n PrivateIpAddress: \"192.168.134.155\"\n} {\n Primary: false,\n PrivateIpAddress: \"192.168.147.91\"\n} {\n Primary: false,\n PrivateIpAddress: \"192.168.159.31\"\n} {\n Primary: false,\n PrivateIpAddress: \"192.168.133.50\"\n}]"}
{"level":"debug","ts":"2020-08-31T21:25:39.926Z","caller":"ipamd/ipamd.go:968","msg":"We need to check the ENI status by calling the EC2 control plane."}
{"level":"debug","ts":"2020-08-31T21:25:39.993Z","caller":"ipamd/ipamd.go:1026","msg":"Reconcile and skip primary IP 192.168.131.160 on ENI eni-01d689e8a83c6feba"}
{"level":"debug","ts":"2020-08-31T21:25:39.993Z","caller":"ipamd/ipamd.go:476","msg":"Reconcile existing ENI eni-074c28aba8340f164 IP pool"}
4d632b0
to
ad210ef
Compare
ad210ef
to
7ddba73
Compare
7ddba73
to
5432838
Compare
@@ -635,3 +637,182 @@ func TestPodENIConfigFlag(t *testing.T) { | |||
disabled = enablePodENI() | |||
assert.False(t, disabled) | |||
} | |||
|
|||
func TestNodeIPPoolReconcileBadIMDSData(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM :)
5432838
to
0c18888
Compare
0c18888
to
42e552e
Compare
} | ||
|
||
// Add all known attached IPs to the datastore | ||
seenIPs := c.verifyAndAddIPsToDatastore(eni, attachedENIIPs, needEC2Reconcile) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might have given different name for this - ipFromEC2, ownedByEC2
// from the EC2 instance outside of the control of ipamd. If this happens, there's nothing | ||
// we can do other than force all pods to be unassigned from the IPs on this eni. | ||
// This scenario can occur if the reconciliation process discovered this ENI was detached | ||
// from the EC2 instance outside of the control of ipamd. If this happens, there's nothing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
French spacing? That's quite a controversial typographical decision ... :/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related issues: #1070, #1094
Description of changes:
If the ENI metadata returned from IMDS does not match with what we have in the datastore, we need to verify with EC2 API.
SetupENINetwork()
, always remove the ENI we just tried to add from the datastoreSuccessful E2E tests: https://app.circleci.com/pipelines/github/mogren/amazon-vpc-cni-k8s/812/workflows/673a9bc3-0122-4972-b276-dec44ec8f546/jobs/1639
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.