EKS + VPC CNI + Containerd works super slow #1509

Closed
ash-sdt opened this issue Jun 15, 2021 · 6 comments

Comments


ash-sdt commented Jun 15, 2021

What happened:
I am running an EKS cluster with self-managed Debian worker nodes, using the containerd CRI instead of dockershim. I point the containerd CRI socket at "/var/run/cri.sock" in the aws-node pods.
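For context, this is roughly what that remapping looks like in the aws-node DaemonSet (a minimal sketch; the volume name and host socket path are assumptions for a typical containerd setup, since the stock manifest mounts the dockershim socket at this path):

```yaml
# Sketch only: point the existing hostPath volume at the containerd socket
# instead of dockershim, keeping the in-container path that ipamd expects.
volumes:
  - name: dockershim
    hostPath:
      path: /run/containerd/containerd.sock  # containerd socket on the host (assumed path)
containers:
  - name: aws-node
    volumeMounts:
      - name: dockershim
        mountPath: /var/run/cri.sock          # path ipamd reads pod sandbox state from
```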
When I spin up a new node, IPAMD acts extremely slow and makes the node unusable for a long time. First, it takes a long time to get past the "Testing communication with server" stage: it fails the first time and succeeds the second, which causes the aws-node pods to get restarted. It also does not attach an ENI to my worker nodes. If I restart the nodes, an ENI may be attached after some time.
I have also tested Ubuntu Focal; same problem. It does not happen with the official AL2 EKS AMI, however.
Unfortunately, there is nothing useful in the logs.
These are logs from a test cluster:

{"level":"info","ts":"2021-06-15T09:09:26.330Z","caller":"aws-k8s-agent/main.go:28","msg":"Starting L-IPAMD v1.8.0  ..."}
{"level":"info","ts":"2021-06-15T09:09:26.332Z","caller":"aws-k8s-agent/main.go:42","msg":"Testing communication with server"}
{"level":"info","ts":"2021-06-15T09:11:23.117Z","caller":"aws-k8s-agent/main.go:28","msg":"Starting L-IPAMD v1.8.0  ..."}
{"level":"info","ts":"2021-06-15T09:11:23.119Z","caller":"aws-k8s-agent/main.go:42","msg":"Testing communication with server"}
{"level":"info","ts":"2021-06-15T09:11:23.128Z","caller":"aws-k8s-agent/main.go:42","msg":"Successful communication with the Cluster! Cluster Version is: v1.19+. git version: v1.19.8-eks-96780e. git tree state: clean. commit: 96780e1b30acbf0a52c38b6030d7853e575bcdf3. platform: linux/amd64"}
{"level":"warn","ts":"2021-06-15T09:11:24.036Z","caller":"awssession/session.go:64","msg":"HTTP_TIMEOUT env is not set or set to less than 10 seconds, defaulting to httpTimeout to 10sec"}
{"level":"debug","ts":"2021-06-15T09:11:24.037Z","caller":"ipamd/ipamd.go:298","msg":"Discovered region: eu-central-1"}
{"level":"info","ts":"2021-06-15T09:11:24.037Z","caller":"ipamd/ipamd.go:298","msg":"Custom networking false"}
{"level":"debug","ts":"2021-06-15T09:11:24.038Z","caller":"awsutils/awsutils.go:357","msg":"Found availability zone: eu-central-1a "}
{"level":"debug","ts":"2021-06-15T09:11:24.038Z","caller":"awsutils/awsutils.go:357","msg":"Discovered the instance primary ip address: 10.212.11.2"}
{"level":"debug","ts":"2021-06-15T09:11:24.039Z","caller":"awsutils/awsutils.go:357","msg":"Found instance-id: i-00418bffc31948791 "}
{"level":"debug","ts":"2021-06-15T09:11:24.039Z","caller":"awsutils/awsutils.go:357","msg":"Found instance-type: t3.medium "}
{"level":"debug","ts":"2021-06-15T09:11:24.040Z","caller":"awsutils/awsutils.go:357","msg":"Found primary interface's MAC address: 02:f3:8a:f8:b3:90"}
{"level":"debug","ts":"2021-06-15T09:11:24.040Z","caller":"awsutils/awsutils.go:357","msg":"eni-059af5ba5bff51d8d is the primary ENI of this instance"}
{"level":"debug","ts":"2021-06-15T09:11:24.042Z","caller":"awsutils/awsutils.go:357","msg":"Found subnet-id: subnet-0b12b24702db8b0fe "}
{"level":"debug","ts":"2021-06-15T09:11:24.042Z","caller":"ipamd/ipamd.go:306","msg":"Using WARM_ENI_TARGET 1"}
{"level":"debug","ts":"2021-06-15T09:11:24.042Z","caller":"ipamd/ipamd.go:316","msg":"Start node init"}
{"level":"info","ts":"2021-06-15T09:11:24.042Z","caller":"awsutils/awsutils.go:1337","msg":"Will attempt to clean up AWS CNI leaked ENIs after waiting 4m41s."}
{"level":"info","ts":"2021-06-15T09:11:24.043Z","caller":"ipamd/ipamd.go:358","msg":"Setting up host network... "}
{"level":"debug","ts":"2021-06-15T09:11:24.043Z","caller":"networkutils/network.go:234","msg":"Trying to find primary interface that has mac : 02:f3:8a:f8:b3:90"}
{"level":"debug","ts":"2021-06-15T09:11:24.043Z","caller":"networkutils/network.go:234","msg":"Discovered interface: lo, mac: "}
{"level":"debug","ts":"2021-06-15T09:11:24.043Z","caller":"networkutils/network.go:234","msg":"Discovered interface: ens5, mac: 02:f3:8a:f8:b3:90"}
{"level":"info","ts":"2021-06-15T09:11:24.043Z","caller":"networkutils/network.go:234","msg":"Discovered primary interface: ens5"}
{"level":"info","ts":"2021-06-15T09:11:24.043Z","caller":"ipamd/ipamd.go:358","msg":"Skip updating RPF for primary interface: net/ipv4/conf/ens5/rp_filter"}
{"level":"debug","ts":"2021-06-15T09:11:24.044Z","caller":"networkutils/network.go:261","msg":"Found the Link that uses mac address 02:f3:8a:f8:b3:90 and its index is 2 (attempt 1/5)"}
{"level":"debug","ts":"2021-06-15T09:11:24.044Z","caller":"networkutils/network.go:322","msg":"Trying to find primary interface that has mac : 02:f3:8a:f8:b3:90"}
{"level":"debug","ts":"2021-06-15T09:11:24.044Z","caller":"networkutils/network.go:322","msg":"Discovered interface: lo, mac: "}
{"level":"debug","ts":"2021-06-15T09:11:24.044Z","caller":"networkutils/network.go:322","msg":"Discovered interface: ens5, mac: 02:f3:8a:f8:b3:90"}
{"level":"info","ts":"2021-06-15T09:11:24.044Z","caller":"networkutils/network.go:322","msg":"Discovered primary interface: ens5"}
{"level":"debug","ts":"2021-06-15T09:11:24.045Z","caller":"networkutils/network.go:330","msg":"Adding 10.212.0.0/16 CIDR to NAT chain"}
{"level":"debug","ts":"2021-06-15T09:11:24.045Z","caller":"networkutils/network.go:330","msg":"Total CIDRs to program - 1"}
{"level":"debug","ts":"2021-06-15T09:11:24.045Z","caller":"networkutils/network.go:330","msg":"Setup Host Network: iptables -N AWS-SNAT-CHAIN-0 -t nat"}
{"level":"debug","ts":"2021-06-15T09:11:24.050Z","caller":"networkutils/network.go:330","msg":"Setup Host Network: iptables -N AWS-SNAT-CHAIN-1 -t nat"}
{"level":"debug","ts":"2021-06-15T09:11:24.052Z","caller":"networkutils/network.go:330","msg":"Setup Host Network: iptables -A POSTROUTING -m comment --comment \"AWS SNAT CHAIN\" -j AWS-SNAT-CHAIN-0"}
{"level":"debug","ts":"2021-06-15T09:11:24.052Z","caller":"networkutils/network.go:330","msg":"Setup Host Network: iptables -A AWS-SNAT-CHAIN-0 ! -d {10.212.0.0/16 %!s(bool=false)} -t nat -j AWS-SNAT-CHAIN-1"}
{"level":"debug","ts":"2021-06-15T09:11:24.052Z","caller":"networkutils/network.go:640","msg":"Setup Host Network: loading existing iptables nat rules with chain prefix AWS-SNAT-CHAIN"}
{"level":"debug","ts":"2021-06-15T09:11:24.054Z","caller":"networkutils/network.go:640","msg":"host network setup: found potentially stale SNAT rule for chain AWS-SNAT-CHAIN-0: [-N AWS-SNAT-CHAIN-0]"}
{"level":"debug","ts":"2021-06-15T09:11:24.055Z","caller":"networkutils/network.go:640","msg":"host network setup: found potentially stale SNAT rule for chain AWS-SNAT-CHAIN-1: [-N AWS-SNAT-CHAIN-1]"}
{"level":"debug","ts":"2021-06-15T09:11:24.055Z","caller":"networkutils/network.go:435","msg":"Setup Host Network: computing stale iptables rules for %s table with chain prefix %s"}
{"level":"debug","ts":"2021-06-15T09:11:24.055Z","caller":"networkutils/network.go:435","msg":"Setup Host Network: active chain found: AWS-SNAT-CHAIN-0"}
{"level":"debug","ts":"2021-06-15T09:11:24.055Z","caller":"networkutils/network.go:435","msg":"Setup Host Network: active chain found: AWS-SNAT-CHAIN-1"}
{"level":"debug","ts":"2021-06-15T09:11:24.055Z","caller":"networkutils/network.go:330","msg":"iptableRules: [nat/POSTROUTING rule first SNAT rules for non-VPC outbound traffic shouldExist false rule [-m comment --comment AWS SNAT CHAIN -j AWS-SNAT-CHAIN-0] nat/AWS-SNAT-CHAIN-0 rule [0] AWS-SNAT-CHAIN shouldExist false rule [! -d 10.212.0.0/16 -m comment --comment AWS SNAT CHAIN -j AWS-SNAT-CHAIN-1] nat/AWS-SNAT-CHAIN-1 rule last SNAT rule for non-VPC outbound traffic shouldExist false rule [! -o vlan+ -m comment --comment AWS, SNAT -m addrtype ! --dst-type LOCAL -j SNAT --to-source 10.212.11.2 --random-fully] mangle/PREROUTING rule connmark for primary ENI shouldExist true rule [-m comment --comment AWS, primary ENI -i ens5 -m addrtype --dst-type LOCAL --limit-iface-in -j CONNMARK --set-mark 0x80/0x80] mangle/PREROUTING rule connmark restore for primary ENI shouldExist true rule [-m comment --comment AWS, primary ENI -i eni+ -j CONNMARK --restore-mark --mask 0x80] mangle/PREROUTING rule connmark restore for primary ENI from vlan shouldExist true rule [-m comment --comment AWS, primary ENI -i vlan+ -j CONNMARK --restore-mark --mask 0x80]]"}
{"level":"debug","ts":"2021-06-15T09:11:24.055Z","caller":"networkutils/network.go:334","msg":"execute iptable rule : first SNAT rules for non-VPC outbound traffic"}
{"level":"debug","ts":"2021-06-15T09:11:24.056Z","caller":"networkutils/network.go:334","msg":"rule nat/POSTROUTING rule first SNAT rules for non-VPC outbound traffic shouldExist false rule [-m comment --comment AWS SNAT CHAIN -j AWS-SNAT-CHAIN-0] exists false, err <nil>"}
{"level":"debug","ts":"2021-06-15T09:11:24.056Z","caller":"networkutils/network.go:334","msg":"execute iptable rule : [0] AWS-SNAT-CHAIN"}
{"level":"debug","ts":"2021-06-15T09:11:24.057Z","caller":"networkutils/network.go:334","msg":"rule nat/AWS-SNAT-CHAIN-0 rule [0] AWS-SNAT-CHAIN shouldExist false rule [! -d 10.212.0.0/16 -m comment --comment AWS SNAT CHAIN -j AWS-SNAT-CHAIN-1] exists false, err <nil>"}
{"level":"debug","ts":"2021-06-15T09:11:24.057Z","caller":"networkutils/network.go:334","msg":"execute iptable rule : last SNAT rule for non-VPC outbound traffic"}
{"level":"debug","ts":"2021-06-15T09:11:24.058Z","caller":"networkutils/network.go:334","msg":"rule nat/AWS-SNAT-CHAIN-1 rule last SNAT rule for non-VPC outbound traffic shouldExist false rule [! -o vlan+ -m comment --comment AWS, SNAT -m addrtype ! --dst-type LOCAL -j SNAT --to-source 10.212.11.2 --random-fully] exists false, err <nil>"}
{"level":"debug","ts":"2021-06-15T09:11:24.058Z","caller":"networkutils/network.go:334","msg":"execute iptable rule : connmark for primary ENI"}
{"level":"debug","ts":"2021-06-15T09:11:24.065Z","caller":"networkutils/network.go:334","msg":"rule mangle/PREROUTING rule connmark for primary ENI shouldExist true rule [-m comment --comment AWS, primary ENI -i ens5 -m addrtype --dst-type LOCAL --limit-iface-in -j CONNMARK --set-mark 0x80/0x80] exists false, err <nil>"}
{"level":"debug","ts":"2021-06-15T09:11:24.066Z","caller":"networkutils/network.go:334","msg":"execute iptable rule : connmark restore for primary ENI"}
{"level":"debug","ts":"2021-06-15T09:11:24.067Z","caller":"networkutils/network.go:334","msg":"rule mangle/PREROUTING rule connmark restore for primary ENI shouldExist true rule [-m comment --comment AWS, primary ENI -i eni+ -j CONNMARK --restore-mark --mask 0x80] exists false, err <nil>"}
{"level":"debug","ts":"2021-06-15T09:11:24.068Z","caller":"networkutils/network.go:334","msg":"execute iptable rule : connmark restore for primary ENI from vlan"}
{"level":"debug","ts":"2021-06-15T09:11:24.069Z","caller":"networkutils/network.go:334","msg":"rule mangle/PREROUTING rule connmark restore for primary ENI from vlan shouldExist true rule [-m comment --comment AWS, primary ENI -i vlan+ -j CONNMARK --restore-mark --mask 0x80] exists false, err <nil>"}
{"level":"debug","ts":"2021-06-15T09:11:24.070Z","caller":"networkutils/network.go:338","msg":"Total CIDRs to exempt from connmark rules - 1"}
{"level":"debug","ts":"2021-06-15T09:11:24.070Z","caller":"networkutils/network.go:338","msg":"Setup Host Network: iptables -N AWS-CONNMARK-CHAIN-0 -t nat"}
{"level":"debug","ts":"2021-06-15T09:11:24.071Z","caller":"networkutils/network.go:338","msg":"Setup Host Network: iptables -N AWS-CONNMARK-CHAIN-1 -t nat"}
{"level":"debug","ts":"2021-06-15T09:11:24.072Z","caller":"networkutils/network.go:338","msg":"Setup Host Network: iptables -t nat -A PREROUTING -i eni+ -m comment --comment \"AWS, outbound connections\" -m state --state NEW -j AWS-CONNMARK-CHAIN-0"}
{"level":"debug","ts":"2021-06-15T09:11:24.072Z","caller":"networkutils/network.go:338","msg":"Setup Host Network: iptables -A AWS-CONNMARK-CHAIN-0 ! -d 10.212.0.0/16 -t nat -j AWS-CONNMARK-CHAIN-1"}
{"level":"debug","ts":"2021-06-15T09:11:24.072Z","caller":"networkutils/network.go:640","msg":"Setup Host Network: loading existing iptables nat rules with chain prefix AWS-CONNMARK-CHAIN"}
{"level":"debug","ts":"2021-06-15T09:11:24.074Z","caller":"networkutils/network.go:640","msg":"host network setup: found potentially stale SNAT rule for chain AWS-CONNMARK-CHAIN-0: [-N AWS-CONNMARK-CHAIN-0]"}
{"level":"debug","ts":"2021-06-15T09:11:24.075Z","caller":"networkutils/network.go:640","msg":"host network setup: found potentially stale SNAT rule for chain AWS-CONNMARK-CHAIN-1: [-N AWS-CONNMARK-CHAIN-1]"}
{"level":"debug","ts":"2021-06-15T09:11:24.075Z","caller":"networkutils/network.go:565","msg":"Setup Host Network: computing stale iptables rules for %s table with chain prefix %s"}
{"level":"debug","ts":"2021-06-15T09:11:24.075Z","caller":"networkutils/network.go:565","msg":"Setup Host Network: active chain found: AWS-CONNMARK-CHAIN-0"}
{"level":"debug","ts":"2021-06-15T09:11:24.075Z","caller":"networkutils/network.go:565","msg":"Setup Host Network: active chain found: AWS-CONNMARK-CHAIN-1"}
{"level":"debug","ts":"2021-06-15T09:11:24.075Z","caller":"networkutils/network.go:338","msg":"iptableRules: [nat/PREROUTING rule connmark rule for non-VPC outbound traffic shouldExist false rule [-i eni+ -m comment --comment AWS, outbound connections -m state --state NEW -j AWS-CONNMARK-CHAIN-0] nat/AWS-CONNMARK-CHAIN-0 rule [0] AWS-SNAT-CHAIN shouldExist false rule [! -d 10.212.0.0/16 -m comment --comment AWS CONNMARK CHAIN, VPC CIDR -j AWS-CONNMARK-CHAIN-1] nat/AWS-CONNMARK-CHAIN-1 rule connmark rule for external  outbound traffic shouldExist false rule [-m comment --comment AWS, CONNMARK -j CONNMARK --set-xmark 0x80/0x80] nat/PREROUTING rule connmark to fwmark copy shouldExist false rule [-m comment --comment AWS, CONNMARK -j CONNMARK --restore-mark --mask 0x80] nat/PREROUTING rule connmark to fwmark copy shouldExist false rule [-m comment --comment AWS, CONNMARK -j CONNMARK --restore-mark --mask 0x80]]"}
{"level":"debug","ts":"2021-06-15T09:11:24.075Z","caller":"networkutils/network.go:342","msg":"execute iptable rule : connmark rule for non-VPC outbound traffic"}
{"level":"debug","ts":"2021-06-15T09:11:24.079Z","caller":"networkutils/network.go:342","msg":"rule nat/PREROUTING rule connmark rule for non-VPC outbound traffic shouldExist false rule [-i eni+ -m comment --comment AWS, outbound connections -m state --state NEW -j AWS-CONNMARK-CHAIN-0] exists false, err <nil>"}
{"level":"debug","ts":"2021-06-15T09:11:24.079Z","caller":"networkutils/network.go:342","msg":"execute iptable rule : [0] AWS-SNAT-CHAIN"}
{"level":"debug","ts":"2021-06-15T09:11:24.080Z","caller":"networkutils/network.go:342","msg":"rule nat/AWS-CONNMARK-CHAIN-0 rule [0] AWS-SNAT-CHAIN shouldExist false rule [! -d 10.212.0.0/16 -m comment --comment AWS CONNMARK CHAIN, VPC CIDR -j AWS-CONNMARK-CHAIN-1] exists false, err <nil>"}
{"level":"debug","ts":"2021-06-15T09:11:24.080Z","caller":"networkutils/network.go:342","msg":"execute iptable rule : connmark rule for external  outbound traffic"}
{"level":"debug","ts":"2021-06-15T09:11:24.081Z","caller":"networkutils/network.go:342","msg":"rule nat/AWS-CONNMARK-CHAIN-1 rule connmark rule for external  outbound traffic shouldExist false rule [-m comment --comment AWS, CONNMARK -j CONNMARK --set-xmark 0x80/0x80] exists false, err <nil>"}
{"level":"debug","ts":"2021-06-15T09:11:24.081Z","caller":"networkutils/network.go:342","msg":"execute iptable rule : connmark to fwmark copy"}
{"level":"debug","ts":"2021-06-15T09:11:24.082Z","caller":"networkutils/network.go:342","msg":"rule nat/PREROUTING rule connmark to fwmark copy shouldExist false rule [-m comment --comment AWS, CONNMARK -j CONNMARK --restore-mark --mask 0x80] exists false, err <nil>"}
{"level":"debug","ts":"2021-06-15T09:11:24.082Z","caller":"networkutils/network.go:342","msg":"execute iptable rule : connmark to fwmark copy"}
{"level":"debug","ts":"2021-06-15T09:11:24.083Z","caller":"networkutils/network.go:342","msg":"rule nat/PREROUTING rule connmark to fwmark copy shouldExist false rule [-m comment --comment AWS, CONNMARK -j CONNMARK --restore-mark --mask 0x80] exists false, err <nil>"}
{"level":"debug","ts":"2021-06-15T09:11:24.084Z","caller":"awsutils/awsutils.go:955","msg":"Total number of interfaces found: 1 "}
{"level":"debug","ts":"2021-06-15T09:11:24.084Z","caller":"awsutils/awsutils.go:516","msg":"Found ENI MAC address: 02:f3:8a:f8:b3:90"}
{"level":"debug","ts":"2021-06-15T09:11:24.086Z","caller":"awsutils/awsutils.go:516","msg":"Found ENI: eni-059af5ba5bff51d8d, MAC 02:f3:8a:f8:b3:90, device 0"}
{"level":"info","ts":"2021-06-15T09:11:24.226Z","caller":"ipamd/ipamd.go:363","msg":"Got network cardindex 0 for ENI eni-059af5ba5bff51d8d"}
{"level":"info","ts":"2021-06-15T09:11:24.226Z","caller":"ipamd/ipamd.go:363","msg":"eni-059af5ba5bff51d8d is of type: interface"}
{"level":"debug","ts":"2021-06-15T09:11:24.226Z","caller":"ipamd/ipamd.go:316","msg":"DescribeAllENIs success: ENIs: 1, tagged: 1"}
{"level":"debug","ts":"2021-06-15T09:11:24.226Z","caller":"ipamd/ipamd.go:316","msg":"Discovered ENI eni-059af5ba5bff51d8d, trying to set it up"}
{"level":"debug","ts":"2021-06-15T09:11:24.226Z","caller":"ipamd/ipamd.go:382","msg":"Tagging ENI eni-059af5ba5bff51d8d with missing tags: map[node.k8s.amazonaws.com/instance_id:i-00418bffc31948791]"}
{"level":"debug","ts":"2021-06-15T09:11:24.320Z","caller":"retry/retry.go:70","msg":"Successfully tagged ENI: eni-059af5ba5bff51d8d"}
{"level":"debug","ts":"2021-06-15T09:11:24.320Z","caller":"ipamd/ipamd.go:799","msg":"DataStore Add an ENI eni-059af5ba5bff51d8d"}
{"level":"debug","ts":"2021-06-15T09:11:24.320Z","caller":"ipamd/ipamd.go:820","msg":"IP Address Pool stats: total: 0, assigned: 0"}
{"level":"info","ts":"2021-06-15T09:11:24.320Z","caller":"ipamd/ipamd.go:316","msg":"ENI eni-059af5ba5bff51d8d set up."}
{"level":"info","ts":"2021-06-15T09:11:24.320Z","caller":"ipamd/ipamd.go:413","msg":"Reading ipam state from CRI"}
{"level":"debug","ts":"2021-06-15T09:11:24.320Z","caller":"datastore/data_store.go:296","msg":"Getting running pod sandboxes from \"unix:///var/run/cri.sock\""}
{"level":"debug","ts":"2021-06-15T09:11:24.322Z","caller":"datastore/data_store.go:296","msg":"Ignoring sandbox 2249745b0b0b069a37ef285eebfa5aedddd62ac4c809444613f679918e8fec98 with non-pod netns mode NODE"}
{"level":"debug","ts":"2021-06-15T09:11:24.322Z","caller":"datastore/data_store.go:296","msg":"Ignoring sandbox 0629246cc1f7a0f99ab7a80256b28c3d5570807368933710537d4bf367fcb349 with non-pod netns mode NODE"}
{"level":"info","ts":"2021-06-15T09:11:24.322Z","caller":"ipamd/ipamd.go:413","msg":"datastore: Sandbox _migrated-from-cri/7a39b20800674edcf06bf71c9bc7bdf88fdd86feb29383cf7f6f0ce6d7e708c7/unknown uses unknown IPv4 10.88.0.2 - presuming stale/dead"}
{"level":"info","ts":"2021-06-15T09:11:24.322Z","caller":"ipamd/ipamd.go:413","msg":"datastore: Sandbox _migrated-from-cri/7a39b20800674edcf06bf71c9bc7bdf88fdd86feb29383cf7f6f0ce6d7e708c7/unknown uses unknown IPv4 2001:4860:4860::2 - presuming stale/dead"}
{"level":"debug","ts":"2021-06-15T09:11:24.323Z","caller":"ipamd/ipamd.go:413","msg":"Completed ipam state recovery"}
{"level":"info","ts":"2021-06-15T09:11:24.323Z","caller":"ipamd/ipamd.go:425","msg":"Get Node Info for: ip-10-212-11-2.eu-central-1.compute.internal"}
{"level":"debug","ts":"2021-06-15T09:11:24.423Z","caller":"ipamd/ipamd.go:436","msg":"Node found \"ip-10-212-11-2.eu-central-1.compute.internal\" - labels - '\\f'"}
{"level":"debug","ts":"2021-06-15T09:11:24.423Z","caller":"ipamd/ipamd.go:436","msg":"Deleting label \"vpc.amazonaws.com/eniConfig\""}
{"level":"debug","ts":"2021-06-15T09:11:24.442Z","caller":"ipamd/ipamd.go:436","msg":"Updated node ip-10-212-11-2.eu-central-1.compute.internal with label \"vpc.amazonaws.com/eniConfig\": \"\""}
{"level":"debug","ts":"2021-06-15T09:11:24.442Z","caller":"ipamd/ipamd.go:766","msg":"Found ENI eni-059af5ba5bff51d8d that has less than the maximum number of IP addresses allocated: cur=0, max=5"}
{"level":"info","ts":"2021-06-15T09:11:24.443Z","caller":"ipamd/ipamd.go:770","msg":"Trying to allocate 5 IP addresses on ENI eni-059af5ba5bff51d8d"}
{"level":"info","ts":"2021-06-15T09:11:24.818Z","caller":"ipamd/ipamd.go:770","msg":"Allocated 5 private IP addresses"}
{"level":"info","ts":"2021-06-15T09:11:24.903Z","caller":"ipamd/ipamd.go:830","msg":"Added ENI(eni-059af5ba5bff51d8d)'s IP 10.212.24.113 to datastore"}
{"level":"info","ts":"2021-06-15T09:11:24.903Z","caller":"ipamd/ipamd.go:830","msg":"Added ENI(eni-059af5ba5bff51d8d)'s IP 10.212.43.66 to datastore"}
{"level":"info","ts":"2021-06-15T09:11:24.903Z","caller":"ipamd/ipamd.go:830","msg":"Added ENI(eni-059af5ba5bff51d8d)'s IP 10.212.51.34 to datastore"}
{"level":"info","ts":"2021-06-15T09:11:24.903Z","caller":"ipamd/ipamd.go:830","msg":"Added ENI(eni-059af5ba5bff51d8d)'s IP 10.212.29.94 to datastore"}
{"level":"info","ts":"2021-06-15T09:11:24.903Z","caller":"ipamd/ipamd.go:830","msg":"Added ENI(eni-059af5ba5bff51d8d)'s IP 10.212.40.56 to datastore"}
{"level":"debug","ts":"2021-06-15T09:11:24.903Z","caller":"ipamd/ipamd.go:786","msg":"IP Address Pool stats: total: 5, assigned: 0"}
{"level":"debug","ts":"2021-06-15T09:11:24.903Z","caller":"ipamd/ipamd.go:462","msg":"Successfully increased IP pool, total: 5, used: 0"}
{"level":"debug","ts":"2021-06-15T09:11:24.903Z","caller":"ipamd/ipamd.go:690","msg":"IP pool stats: total = 5, used = 0, c.maxIPsPerENI = 5"}
{"level":"info","ts":"2021-06-15T09:11:24.904Z","caller":"ipamd/ipamd.go:323","msg":"Found sg-021036fb22c8395d7, added to ipamd cache"}
{"level":"info","ts":"2021-06-15T09:11:24.904Z","caller":"ipamd/ipamd.go:323","msg":"Found sg-080f2deb51f61018c, added to ipamd cache"}
{"level":"debug","ts":"2021-06-15T09:11:24.904Z","caller":"awsutils/awsutils.go:457","msg":"Total number of interfaces found: 1 "}
{"level":"debug","ts":"2021-06-15T09:11:24.904Z","caller":"awsutils/awsutils.go:516","msg":"Found ENI MAC address: 02:f3:8a:f8:b3:90"}
{"level":"debug","ts":"2021-06-15T09:11:24.906Z","caller":"awsutils/awsutils.go:516","msg":"Found ENI: eni-059af5ba5bff51d8d, MAC 02:f3:8a:f8:b3:90, device 0"}
{"level":"debug","ts":"2021-06-15T09:11:24.907Z","caller":"ipamd/ipamd.go:323","msg":"Update ENI eni-059af5ba5bff51d8d"}
{"level":"info","ts":"2021-06-15T09:11:25.316Z","caller":"aws-k8s-agent/main.go:73","msg":"Serving RPC Handler version v1.8.0 on 127.0.0.1:50051"}
{"level":"info","ts":"2021-06-15T09:11:25.316Z","caller":"runtime/asm_amd64.s:1373","msg":"Serving metrics on port 61678"}
{"level":"info","ts":"2021-06-15T09:11:25.317Z","caller":"runtime/asm_amd64.s:1373","msg":"Setting up shutdown hook."}
{"level":"info","ts":"2021-06-15T09:11:25.317Z","caller":"ipamd/introspect.go:62","msg":"Serving introspection endpoints on 127.0.0.1:61679"}
{"level":"debug","ts":"2021-06-15T09:12:25.319Z","caller":"ipamd/ipamd.go:525","msg":"Reconciling ENI/IP pool info because time since last 1m0.416512742s <= 1m0s"}
{"level":"debug","ts":"2021-06-15T09:12:25.320Z","caller":"ipamd/ipamd.go:969","msg":"Total number of interfaces found: 1 "}
{"level":"debug","ts":"2021-06-15T09:12:25.320Z","caller":"awsutils/awsutils.go:516","msg":"Found ENI MAC address: 02:f3:8a:f8:b3:90"}
{"level":"debug","ts":"2021-06-15T09:12:25.324Z","caller":"awsutils/awsutils.go:516","msg":"Found ENI: eni-059af5ba5bff51d8d, MAC 02:f3:8a:f8:b3:90, device 0"}
{"level":"debug","ts":"2021-06-15T09:12:25.325Z","caller":"ipamd/ipamd.go:525","msg":"Reconcile existing ENI eni-059af5ba5bff51d8d IP pool"}
{"level":"debug","ts":"2021-06-15T09:12:25.325Z","caller":"ipamd/ipamd.go:1098","msg":"Reconcile and skip primary IP 10.212.11.2 on ENI eni-059af5ba5bff51d8d"}
{"level":"debug","ts":"2021-06-15T09:12:25.326Z","caller":"ipamd/ipamd.go:525","msg":"Successfully Reconciled ENI/IP pool"}
{"level":"debug","ts":"2021-06-15T09:12:25.326Z","caller":"ipamd/ipamd.go:525","msg":"IP Address Pool stats: total: 5, assigned: 0"}
{"level":"debug","ts":"2021-06-15T09:13:25.329Z","caller":"ipamd/ipamd.go:525","msg":"Reconciling ENI/IP pool info because time since last 1m0.00925058s <= 1m0s"}
{"level":"debug","ts":"2021-06-15T09:13:25.329Z","caller":"ipamd/ipamd.go:969","msg":"Total number of interfaces found: 1 "}

Environment:

  • Kubernetes version: EKS 1.19
  • CNI Version: 1.8.0
  • OS: Debian Buster / Ubuntu Focal
  • Kernel: 5.9
@ash-sdt ash-sdt changed the title EKS + L-IPAMD + Containerd works super slow EKS + VPC CNI + Containerd works super slow Jun 15, 2021
@jayanthvn
Contributor

Hi @ash-sdt

Regarding the restart -

{"level":"info","ts":"2021-06-15T09:09:26.330Z","caller":"aws-k8s-agent/main.go:28","msg":"Starting L-IPAMD v1.8.0  ..."}
{"level":"info","ts":"2021-06-15T09:09:26.332Z","caller":"aws-k8s-agent/main.go:42","msg":"Testing communication with server"}
{"level":"info","ts":"2021-06-15T09:11:23.117Z","caller":"aws-k8s-agent/main.go:28","msg":"Starting L-IPAMD v1.8.0  ..."}
{"level":"info","ts":"2021-06-15T09:11:23.119Z","caller":"aws-k8s-agent/main.go:42","msg":"Testing communication with server"}

Can you please check if kube-proxy is failing to get node info? Ref - #1078 (comment)

Based on the logs, WARM_ENI_TARGET is set to 1 and the number of assigned IPs is 0, so you will see just one ENI (eni-059af5ba5bff51d8d) with 5 IPs, since you are using a t3.medium. If you add even one non-host-networking pod, a new ENI will be allocated. Are you saying new ENIs are not getting attached even after adding pods?
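(For reference: a t3.medium supports up to 3 ENIs with 6 IPv4 addresses each, and the primary address of each ENI is reserved for the interface itself, which is where the cur=0, max=5 in the log above comes from.) WARM_ENI_TARGET is set as an environment variable on the aws-node container; a minimal sketch, noting that 1 is also the default:

```yaml
# Sketch: warm-pool tuning on the aws-node container.
containers:
  - name: aws-node
    env:
      - name: WARM_ENI_TARGET   # keep one unused ENI attached as a warm pool of IPs
        value: "1"
```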

Author

ash-sdt commented Jun 15, 2021

@jayanthvn It seems it can't retrieve the node info:

I0615 09:07:24.348293       1 flags.go:59] FLAG: --add-dir-header="false"
I0615 09:07:24.348352       1 flags.go:59] FLAG: --alsologtostderr="false"
I0615 09:07:24.348357       1 flags.go:59] FLAG: --bind-address="0.0.0.0"
I0615 09:07:24.348366       1 flags.go:59] FLAG: --bind-address-hard-fail="false"
I0615 09:07:24.348371       1 flags.go:59] FLAG: --cleanup="false"
I0615 09:07:24.348375       1 flags.go:59] FLAG: --cleanup-ipvs="true"
I0615 09:07:24.348379       1 flags.go:59] FLAG: --cluster-cidr=""
I0615 09:07:24.348385       1 flags.go:59] FLAG: --config="/var/lib/kube-proxy-config/config"
I0615 09:07:24.348390       1 flags.go:59] FLAG: --config-sync-period="15m0s"
I0615 09:07:24.348396       1 flags.go:59] FLAG: --conntrack-max-per-core="32768"
I0615 09:07:24.348403       1 flags.go:59] FLAG: --conntrack-min="131072"
I0615 09:07:24.348408       1 flags.go:59] FLAG: --conntrack-tcp-timeout-close-wait="1h0m0s"
I0615 09:07:24.348413       1 flags.go:59] FLAG: --conntrack-tcp-timeout-established="24h0m0s"
I0615 09:07:24.348417       1 flags.go:59] FLAG: --detect-local-mode=""
I0615 09:07:24.348423       1 flags.go:59] FLAG: --feature-gates=""
I0615 09:07:24.348433       1 flags.go:59] FLAG: --healthz-bind-address="0.0.0.0:10256"
I0615 09:07:24.348439       1 flags.go:59] FLAG: --healthz-port="10256"
I0615 09:07:24.348444       1 flags.go:59] FLAG: --help="false"
I0615 09:07:24.348449       1 flags.go:59] FLAG: --hostname-override=""
I0615 09:07:24.348454       1 flags.go:59] FLAG: --iptables-masquerade-bit="14"
I0615 09:07:24.348458       1 flags.go:59] FLAG: --iptables-min-sync-period="1s"
I0615 09:07:24.348464       1 flags.go:59] FLAG: --iptables-sync-period="30s"
I0615 09:07:24.348472       1 flags.go:59] FLAG: --ipvs-exclude-cidrs="[]"
I0615 09:07:24.348493       1 flags.go:59] FLAG: --ipvs-min-sync-period="0s"
I0615 09:07:24.348498       1 flags.go:59] FLAG: --ipvs-scheduler=""
I0615 09:07:24.348502       1 flags.go:59] FLAG: --ipvs-strict-arp="false"
I0615 09:07:24.348506       1 flags.go:59] FLAG: --ipvs-sync-period="30s"
I0615 09:07:24.348510       1 flags.go:59] FLAG: --ipvs-tcp-timeout="0s"
I0615 09:07:24.348515       1 flags.go:59] FLAG: --ipvs-tcpfin-timeout="0s"
I0615 09:07:24.348520       1 flags.go:59] FLAG: --ipvs-udp-timeout="0s"
I0615 09:07:24.348524       1 flags.go:59] FLAG: --kube-api-burst="10"
I0615 09:07:24.348529       1 flags.go:59] FLAG: --kube-api-content-type="application/vnd.kubernetes.protobuf"
I0615 09:07:24.348535       1 flags.go:59] FLAG: --kube-api-qps="5"
I0615 09:07:24.348543       1 flags.go:59] FLAG: --kubeconfig=""
I0615 09:07:24.348611       1 flags.go:59] FLAG: --log-backtrace-at=":0"
I0615 09:07:24.348625       1 flags.go:59] FLAG: --log-dir=""
I0615 09:07:24.348630       1 flags.go:59] FLAG: --log-file=""
I0615 09:07:24.348634       1 flags.go:59] FLAG: --log-file-max-size="1800"
I0615 09:07:24.348640       1 flags.go:59] FLAG: --log-flush-frequency="5s"
I0615 09:07:24.348648       1 flags.go:59] FLAG: --logtostderr="true"
I0615 09:07:24.348652       1 flags.go:59] FLAG: --masquerade-all="false"
I0615 09:07:24.348657       1 flags.go:59] FLAG: --master=""
I0615 09:07:24.348663       1 flags.go:59] FLAG: --metrics-bind-address="127.0.0.1:10249"
I0615 09:07:24.348677       1 flags.go:59] FLAG: --metrics-port="10249"
I0615 09:07:24.348683       1 flags.go:59] FLAG: --nodeport-addresses="[]"
I0615 09:07:24.348702       1 flags.go:59] FLAG: --oom-score-adj="-999"
I0615 09:07:24.348706       1 flags.go:59] FLAG: --profiling="false"
I0615 09:07:24.348710       1 flags.go:59] FLAG: --proxy-mode=""
I0615 09:07:24.348716       1 flags.go:59] FLAG: --proxy-port-range=""
I0615 09:07:24.348721       1 flags.go:59] FLAG: --show-hidden-metrics-for-version=""
I0615 09:07:24.348726       1 flags.go:59] FLAG: --skip-headers="false"
I0615 09:07:24.348831       1 flags.go:59] FLAG: --skip-log-headers="false"
I0615 09:07:24.348871       1 flags.go:59] FLAG: --stderrthreshold="2"
I0615 09:07:24.348880       1 flags.go:59] FLAG: --udp-timeout="250ms"
I0615 09:07:24.348888       1 flags.go:59] FLAG: --v="2"
I0615 09:07:24.348893       1 flags.go:59] FLAG: --version="false"
I0615 09:07:24.348901       1 flags.go:59] FLAG: --vmodule=""
I0615 09:07:24.348906       1 flags.go:59] FLAG: --write-config-to=""
I0615 09:07:24.350543       1 feature_gate.go:243] feature gates: &{map[]}
I0615 09:07:24.351293       1 feature_gate.go:243] feature gates: &{map[]}
E0615 09:07:24.609886       1 node.go:125] Failed to retrieve node info: nodes "ip-10-212-116-238" not found
E0615 09:07:25.806664       1 node.go:125] Failed to retrieve node info: nodes "ip-10-212-116-238" not found
E0615 09:07:27.858262       1 node.go:125] Failed to retrieve node info: nodes "ip-10-212-116-238" not found
E0615 09:07:32.285852       1 node.go:125] Failed to retrieve node info: nodes "ip-10-212-116-238" not found
E0615 09:07:40.820995       1 node.go:125] Failed to retrieve node info: nodes "ip-10-212-116-238" not found
E0615 09:07:57.114918       1 node.go:125] Failed to retrieve node info: nodes "ip-10-212-116-238" not found
I0615 09:07:57.114941       1 server_others.go:431] can't determine this node's IP, assuming 127.0.0.1; if this is incorrect, please set the --bind-address flag
I0615 09:07:57.114963       1 server_others.go:142] kube-proxy node IP is an IPv4 address (127.0.0.1), assume IPv4 operation
I0615 09:07:57.149370       1 server_others.go:182] DetectLocalMode: 'ClusterCIDR'
I0615 09:07:57.149398       1 server_others.go:185] Using iptables Proxier.
W0615 09:07:57.149406       1 server_others.go:455] detect-local-mode set to ClusterCIDR, but no cluster CIDR defined
I0615 09:07:57.149410       1 server_others.go:466] detect-local-mode: ClusterCIDR , defaulting to no-op detect-local
I0615 09:07:57.149633       1 utils.go:367] Changed sysctl "net/ipv4/conf/all/route_localnet": 0 -> 1
I0615 09:07:57.149837       1 proxier.go:286] iptables(IPv4) masquerade mark: 0x00004000
I0615 09:07:57.149955       1 proxier.go:328] iptables(IPv4) sync params: minSyncPeriod=1s, syncPeriod=30s, burstSyncs=2
I0615 09:07:57.150029       1 proxier.go:340] iptables(IPv4) supports --random-fully
I0615 09:07:57.150286       1 server.go:650] Version: v1.19.6-eks-986540
I0615 09:07:57.151894       1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_max' to 131072
I0615 09:07:57.152028       1 conntrack.go:52] Setting nf_conntrack_max to 131072
I0615 09:07:57.152234       1 mount_linux.go:163] Detected OS without systemd
I0615 09:07:57.153079       1 conntrack.go:83] Setting conntrack hashsize to 32768
I0615 09:07:57.172055       1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400
I0615 09:07:57.172122       1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600
I0615 09:07:57.172310       1 config.go:315] Starting service config controller
I0615 09:07:57.173461       1 shared_informer.go:240] Waiting for caches to sync for service config
I0615 09:07:57.172518       1 reflector.go:207] Starting reflector *v1.Service (15m0s) from k8s.io/client-go/informers/factory.go:134
I0615 09:07:57.172544       1 config.go:224] Starting endpoint slice config controller
I0615 09:07:57.174215       1 shared_informer.go:240] Waiting for caches to sync for endpoint slice config
I0615 09:07:57.172664       1 reflector.go:207] Starting reflector *v1beta1.EndpointSlice (15m0s) from k8s.io/client-go/informers/factory.go:134
I0615 09:07:57.177335       1 service.go:277] Service default/kubernetes updated: 1 ports
I0615 09:07:57.177391       1 service.go:277] Service kube-system/kube-dns updated: 2 ports
I0615 09:07:57.273782       1 shared_informer.go:247] Caches are synced for service config
I0615 09:07:57.274003       1 proxier.go:801] Not syncing iptables until Services and Endpoints have been received from master
I0615 09:07:57.274883       1 shared_informer.go:247] Caches are synced for endpoint slice config
I0615 09:07:57.275213       1 service.go:396] Adding new service port "kube-system/kube-dns:dns-tcp" at 172.20.0.10:53/TCP
I0615 09:07:57.275237       1 service.go:396] Adding new service port "kube-system/kube-dns:dns" at 172.20.0.10:53/UDP
I0615 09:07:57.275265       1 service.go:396] Adding new service port "default/kubernetes:https" at 172.20.0.1:443/TCP
I0615 09:07:57.275400       1 proxier.go:837] Stale udp service kube-system/kube-dns:dns -> 172.20.0.10
I0615 09:07:57.275442       1 proxier.go:845] Syncing iptables rules
I0615 09:07:57.454457       1 proxier.go:809] syncProxyRules took 179.532759ms
I0615 10:07:57.172774       1 proxier.go:845] Syncing iptables rules
I0615 10:07:57.230604       1 proxier.go:809] syncProxyRules took 58.135924ms
I0615 11:07:57.231019       1 proxier.go:845] Syncing iptables rules
I0615 11:07:57.270532       1 proxier.go:809] syncProxyRules took 39.741157ms
I0615 12:07:57.270932       1 proxier.go:845] Syncing iptables rules
I0615 12:07:57.338256       1 proxier.go:809] syncProxyRules took 67.53935ms
I0615 13:07:57.338864       1 proxier.go:845] Syncing iptables rules
I0615 13:07:57.398546       1 proxier.go:809] syncProxyRules took 59.865655ms
I0615 14:07:57.398998       1 proxier.go:845] Syncing iptables rules
I0615 14:07:57.486473       1 proxier.go:809] syncProxyRules took 87.518599ms
I0615 15:07:57.486966       1 proxier.go:845] Syncing iptables rules
I0615 15:07:57.666514       1 proxier.go:809] syncProxyRules took 179.825119ms
I0615 16:07:57.666946       1 proxier.go:845] Syncing iptables rules
I0615 16:07:57.722312       1 proxier.go:809] syncProxyRules took 55.576301ms
I0615 17:07:57.722723       1 proxier.go:845] Syncing iptables rules
I0615 17:07:57.782418       1 proxier.go:809] syncProxyRules took 59.920262ms
I0615 18:07:57.782887       1 proxier.go:845] Syncing iptables rules
I0615 18:07:57.830454       1 proxier.go:809] syncProxyRules took 47.796584ms

@jayanthvn
Contributor

You can try this workaround - kubernetes/kubernetes#61486 (comment) - and it should fix the kube-proxy error; a sketch of it is below. We are also working internally to fix this in kube-proxy.
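The workaround amounts to passing the registered node name to kube-proxy via the downward API, so it stops falling back to the OS hostname (a sketch, assuming that is what the linked comment describes):

```yaml
# Sketch: make kube-proxy look up the Node object by the name the kubelet registered.
containers:
  - name: kube-proxy
    command:
      - kube-proxy
      - --hostname-override=$(NODE_NAME)
    env:
      - name: NODE_NAME
        valueFrom:
          fieldRef:
            fieldPath: spec.nodeName
```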

Regarding the second issue, can you please confirm whether ENIs get attached when you scale up the pods?

Author

ash-sdt commented Jun 16, 2021

Thank you, @jayanthvn.
I can confirm the issue is fixed.

@ash-sdt ash-sdt closed this as completed Jun 16, 2021
Author

ash-sdt commented Jun 16, 2021

Well, it seems there is another problem now: nodes that are not running CoreDNS cannot resolve DNS.

[2021/06/16 12:57:39] [ info] [filter:kubernetes:kubernetes.0] Wait 30 secs until DNS starts up (1/6)
[2021/06/16 12:58:15] [ info] [filter:kubernetes:kubernetes.0] Wait 30 secs until DNS starts up (2/6)
[2021/06/16 12:58:51] [ info] [filter:kubernetes:kubernetes.0] Wait 30 secs until DNS starts up (3/6)
[2021/06/16 12:59:28] [ info] [filter:kubernetes:kubernetes.0] Wait 30 secs until DNS starts up (4/6)
[2021/06/16 13:00:04] [ info] [filter:kubernetes:kubernetes.0] Wait 30 secs until DNS starts up (5/6)
[2021/06/16 13:00:40] [ info] [filter:kubernetes:kubernetes.0] Wait 30 secs until DNS starts up (6/6)
[2021/06/16 13:01:10] [ warn] [filter:kubernetes:kubernetes.0] could not resolve kubernetes.default.svc
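A quick way to confirm where the failure happens is to pin a throwaway pod to an affected node and try a lookup (a sketch; the node name is a placeholder):

```yaml
# Sketch: run nslookup from a node that is not hosting a CoreDNS replica.
apiVersion: v1
kind: Pod
metadata:
  name: dns-test
spec:
  nodeName: <affected-node>   # placeholder: a node without a CoreDNS pod
  restartPolicy: Never
  containers:
    - name: dns-test
      image: busybox:1.28     # 1.28 ships a working nslookup
      command: ["nslookup", "kubernetes.default.svc"]
```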

@ash-sdt ash-sdt reopened this Jun 16, 2021
Author

ash-sdt commented Jun 17, 2021

Well, it seems the DNS issue is not related to the VPC CNI.
Thank you for helping.

@ash-sdt ash-sdt closed this as completed Jun 17, 2021