CNI does not wait for ENI and IP to be available before reporting ready when setting MINIMUM_IP_TARGET and WARM_IP_TARGET #1724
Comments
@dtnyn - What are your WARM_IP_TARGET and MINIMUM_IP_TARGET values? Do you have custom networking enabled? With respect to WARM_IP_TARGET/MINIMUM_IP_TARGET versus WARM_ENI_TARGET, there is no difference in how IPAMD starts up. We do wait for ipamd to start - https://github.com/aws/amazon-vpc-cni-k8s/blob/release-1.9/scripts/entrypoint.sh#L139 - before copying the CNI config file - https://github.com/aws/amazon-vpc-cni-k8s/blob/release-1.9/scripts/entrypoint.sh#L145 - and during ipamd init we do try to assign IPs to the primary ENI - https://github.com/aws/amazon-vpc-cni-k8s/blob/v1.9.0/pkg/ipamd/ipamd.go#L493. So before the node goes ready, we should typically have the primary ENI populated with IPs in non-custom-networking mode. Once you share the configs, we can try to verify this scenario locally.
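As a rough illustration of that ordering (the function names below are placeholders, not the real entrypoint.sh or ipamd code), the CNI config is only put in place once ipamd is up, and in the default non-custom-networking path ipamd's own init has already added the primary ENI's IPs to the datastore by that point:

```go
// Placeholder sketch of the startup ordering described above; none of these
// identifiers exist in the real amazon-vpc-cni-k8s code base.
package main

import "fmt"

// ipamdInit models ipamd initialization: in the default (non-custom-networking)
// path it returns with the primary ENI's secondary IPs already in the datastore.
func ipamdInit() (ipsInDatastore int) {
	return 5 // placeholder count of secondary IPs on the primary ENI
}

// copyCNIConfig models the entrypoint copying the CNI config file after its
// wait loop confirms ipamd is up; this is the step that lets kubelet mark the
// node Ready.
func copyCNIConfig() { fmt.Println("CNI config in place, node can go Ready") }

func main() {
	ips := ipamdInit() // the entrypoint waits for this to succeed...
	copyCNIConfig()    // ...before installing the CNI config
	fmt.Println("IPs already available at readiness:", ips)
}
```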
These are the configs enabled for the CNI.
This is used on both.
Thanks for sharing the details. Since you mentioned this is a new node, the instance will have only the primary ENI attached, and during aws-node initialization IPs won't be assigned to the primary ENI because custom networking is enabled. The secondary ENI will be allocated by the reconciler based on the WARM target requirements, since available IPs will be 0. Before the reconciler thread starts, the node can be marked as ready while the datastore has no IPs. But we will try both configs locally and get back to you.
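To make the "available IPs will be 0" point concrete, here is a simplified sketch of how a WARM_IP_TARGET/MINIMUM_IP_TARGET shortfall could be computed; it is an approximation of the behaviour described above, not the exact ipamd implementation (which also handles MAX_ENI, prefix mode, and other cases):

```go
// Simplified warm-pool shortfall calculation (illustrative only).
package main

import "fmt"

// ipsShort returns how many more IPs the warm pool needs. On a fresh
// custom-networking node total and assigned are both 0, so the result is
// positive and a secondary ENI has to be attached before pods can get IPs.
func ipsShort(total, assigned, warmIPTarget, minimumIPTarget int) int {
	short := warmIPTarget - (total - assigned) // keep warmIPTarget IPs free
	if minimumIPTarget-total > short {         // but never shrink below the minimum pool
		short = minimumIPTarget - total
	}
	if short < 0 {
		return 0
	}
	return short
}

func main() {
	// WARM_IP_TARGET=1, MINIMUM_IP_TARGET=3 on a brand new node:
	fmt.Println(ipsShort(0, 0, 1, 3)) // 3 -> the reconciler must attach an ENI
	// Same settings once the pool exists and one pod is running:
	fmt.Println(ipsShort(3, 1, 1, 3)) // 0 -> nothing to do
}
```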
Maybe a better description of the problem would be the difference between the init time and the time when the datastore is first populated with IPs. We're seeing an average wait of up to 3 minutes, and during those 3 minutes pods are stuck Pending for an extended period since there is no IP to allocate to them. I do sometimes see pods waiting on the default configuration as well. Examples:
@dtnyn Have you given the new IP prefixes a try, and do they have a similar delay? I expect I might be experiencing similar delays, as we have the same type of config in our clusters.
I did a test on my local cluster to compare the time difference between the init time and when the datastore has IPs, and didn't find any difference with respect to the configs.

WARM_ENI_TARGET = 1
Init start ->
Reconciler increasing the pool and adding a secondary ENI ->

WARM_IP_TARGET = 1 and MINIMUM_IP_TARGET = 3
Init start ->
Reconciler increasing the pool and adding a secondary ENI ->

In both cases the reconciler tried to increase the pool within 4-5 seconds; there is no difference in the code path between them. Once the reconciler starts, the pool is increased based on the WARM targets. But I feel there is one optimization that can be done here: since you are using custom networking and the primary ENI cannot have IPs/prefixes attached, we should go ahead and attach the secondary ENI based on the WARM targets. That should help with pod launch time; otherwise we have to wait for the reconciler to increase the pool after the node has already been marked as ready.
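A rough sketch of that proposed optimization (purely illustrative, not the actual PR; the types and helpers below are hypothetical): when custom networking is enabled, node init could attach a secondary ENI sized by the WARM targets instead of leaving that to the first reconciler pass.

```go
// Illustrative sketch only; attachSecondaryENI and the ipamContext fields are
// hypothetical stand-ins for the real ipamd code.
package main

import "fmt"

type ipamContext struct {
	customNetworking bool
	warmIPTarget     int
	minimumIPTarget  int
	datastoreIPs     int
}

// attachSecondaryENI pretends to attach an ENI pre-populated with enough
// secondary IPs to satisfy the WARM targets.
func (c *ipamContext) attachSecondaryENI() {
	want := c.minimumIPTarget
	if c.warmIPTarget > want {
		want = c.warmIPTarget
	}
	c.datastoreIPs += want
}

// nodeInit sketches the proposed change: with custom networking the primary
// ENI can never hold pod IPs, so attach a secondary ENI up front rather than
// waiting for the reconciler after the node is already Ready.
func (c *ipamContext) nodeInit() {
	if c.customNetworking && (c.warmIPTarget > 0 || c.minimumIPTarget > 0) {
		c.attachSecondaryENI()
	}
}

func main() {
	c := &ipamContext{customNetworking: true, warmIPTarget: 1, minimumIPTarget: 3}
	c.nodeInit()
	fmt.Println("IPs available at node readiness:", c.datastoreIPs) // 3
}
```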
@sidewinder12s We have tried prefix mode but did not do extended testing due to the minimum allocation size.
@jayanthvn I believe that your proposed optimization, attaching a new ENI with pre-allocated IPs, will work best for our current problem, since we can observe that the ENI attachment and the time to start the reconciler are both very fast. Only the initial reconciler pass that adds new IPs to the ENI is very slow; any subsequent reconciler passes on the same ENI are fast.
@dtnyn - Yes, that's right. I have the PR and am testing it out locally.
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days
@jayanthvn we're experiencing this issue as well. How has your local testing been going?
I am waiting for the code review.
+1
Any updates on the PR's progress?
We had discussed a few changes for the PR but couldn't make progress because of other commitments. I will take it up next week for changes/review and track it for the 1.11.5 or 1.12.0 release. Will provide the release timeline once the PR is merged.
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days
Not stale. We still regularly see this issue.
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days
Not stale.
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days
Not stale.
I faced the same issue while creating an EKS cluster with custom networking and prefix delegation, and I got the errors below:

{"level":"error","ts":"2023-03-21T16:32:26.571Z","caller":"ipamd/ipamd.go:845","msg":"Failed to increase pool size due to not able to allocate ENI AllocENI: error attaching ENI: attachENI: failed to attach ENI: InvalidParameterCombination: You may not attach a network interface to an instance if they are not in the same availability zone\n\tstatus code: 400, request id: 8160c309-fa43-4b0f-95f2-45b3b5100ed7"}
{"level":"debug","ts":"2023-03-21T17:26:17.714Z","caller":"datastore/data_store.go:646","msg":"AssignIPv4Address: IP address pool stats: total: 0, assigned 0"}
{"level":"debug","ts":"2023-03-21T17:26:17.714Z","caller":"datastore/data_store.go:646","msg":"AssignPodIPv4Address: ENI eni-0b881d8fd9bae3c92 does not have available addresses"}

I followed the docs here and here, I have available IPs in the subnets, and the parameters WARM_IP_TARGET and AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG are set properly. When I have /16 subnets it seems to work fine, but when I have primary node subnets with /24 and secondary CIDR subnets with /18, I get this issue.
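That InvalidParameterCombination error typically means the subnet referenced by the ENIConfig is in a different availability zone than the node; an ENI can only be attached to an instance in its own AZ. A hypothetical pre-flight check (illustrative names only, not part of the real CNI code) would look like:

```go
// Hypothetical validation illustrating the constraint behind the error above.
package main

import "fmt"

func validateENIConfigSubnet(subnetAZ, instanceAZ string) error {
	if subnetAZ != instanceAZ {
		return fmt.Errorf("ENIConfig subnet is in %s but the instance is in %s; "+
			"the ENI cannot be attached", subnetAZ, instanceAZ)
	}
	return nil
}

func main() {
	// Example: node in us-east-1a, ENIConfig pointing at a us-east-1b subnet.
	if err := validateENIConfigSubnet("us-east-1b", "us-east-1a"); err != nil {
		fmt.Println("config error:", err)
	}
}
```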
Closing as fixed by #2354. This will ship in the next VPC CNI release.
|
What happened:
After setting MINIMUM_IP_TARGET and WARM_IP_TARGET, pods scheduled on a new node are stuck on "failed to assign an IP address to container" for a few minutes before they can be created.

From my observation, after setting these overrides aws-node does not wait for its datastore to be populated before reporting ready, causing pods to be scheduled onto these nodes and go into a crashing state because no IP can be assigned.
Pod log
I have been monitoring the ready state for the pod and can see that it is declared ready before ipamd is able to successfully add the ENI and its IPs to the datastore.
The pod status change
Readiness
The node status change
Inside ipamd, around the time when the pod and node are declared ready, I can see pods trying to be scheduled and assigned an IP, but there is nothing in the datastore at that point.
It is not until a while later that ipamd successfully populates the datastore.
During this period between aws-node reporting as ready and the ipamd datastore actually being populated with IPs, pods are being scheduled onto the node and failing, which is undesirable.

What you expected to happen:
I expected behaviour similar to that observed with the default WARM_ENI_TARGET=1 setting, where the pod is not considered ready until the datastore is populated with the attached ENI's IPs.

The following logs are from using the default option with no MINIMUM_IP_TARGET and WARM_IP_TARGET override. In this case aws-cni works as expected.

Pod status
Ready status
I can see in the ipamd log that the datastore successfully populates itself with the ENI's IPs around this time.
Environment:
- Kubernetes version (kubectl version): v1.20.11
- CNI version: v1.9.0
- OS (cat /etc/os-release): Ubuntu 20.04.3 LTS (Focal Fossa)
- Kernel (uname -a): Linux xxxxxxxx 5.11.0-1019-aws #20~20.04.1-Ubuntu SMP Tue Sep 21 10:40:39 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux