What happened:
An update to the pod log level was released, changing DEBUG to INFO for both:
AWS_VPC_K8S_CNI_LOGLEVEL
AWS_VPC_K8S_PLUGIN_LOG_LEVEL
On apply, the pod became stuck in CrashLoopBackOff. The change was reverted, restoring the DEBUG level, but the pod remained stuck in CrashLoopBackOff. This section asks me to describe what happened, but the failure is almost certainly not related to the log-level change itself; I suspect the ENI selected at startup is incorrect.
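For reference, the change (and the revert) targets the aws-node DaemonSet that controls this pod; a minimal way to express the revert, assuming direct kubectl access rather than re-applying the full manifest, would be:

% kubectl -n kube-system set env daemonset/aws-node AWS_VPC_K8S_CNI_LOGLEVEL=DEBUG AWS_VPC_K8S_PLUGIN_LOG_LEVEL=DEBUG

The ipamd startup log from one of the crashing container instances follows: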
{"level":"info","ts":"2021-04-08T20:15:17.105Z","caller":"entrypoint.sh","msg":"Install CNI binary.."}
{"level":"info","ts":"2021-04-08T20:15:17.119Z","caller":"entrypoint.sh","msg":"Starting IPAM daemon in the background ... "}
{"level":"info","ts":"2021-04-08T20:15:17.120Z","caller":"entrypoint.sh","msg":"Checking for IPAM connectivity ... "}
{"level":"info","ts":"2021-04-08T20:15:17.142Z","caller":"aws-k8s-agent/main.go:28","msg":"Starting L-IPAMD v1.7.5 ..."}
{"level":"info","ts":"2021-04-08T20:15:17.186Z","caller":"aws-k8s-agent/main.go:41","msg":"Testing communication with server"}
{"level":"info","ts":"2021-04-08T20:15:17.187Z","caller":"aws-k8s-agent/main.go:41","msg":"Successful communication with the Cluster! Cluster Version is: v1.16+. git version: v1.16.15-eks-ad4801. git tree state: clean. commit: ad4801fd44fe0f125c8d13f1b1d4827e8884476d. platform: linux/amd64"}
{"level":"debug","ts":"2021-04-08T20:15:17.189Z","caller":"ipamd/ipamd.go:292","msg":"Discovered region: ca-central-1"}
{"level":"info","ts":"2021-04-08T20:15:17.189Z","caller":"ipamd/ipamd.go:292","msg":"Custom networking false"}
{"level":"debug","ts":"2021-04-08T20:15:17.190Z","caller":"awsutils/awsutils.go:313","msg":"Found availability zone: ca-central-1a "}
{"level":"debug","ts":"2021-04-08T20:15:17.190Z","caller":"awsutils/awsutils.go:313","msg":"Discovered the instance primary ip address: 10.3.99.93"}
{"level":"debug","ts":"2021-04-08T20:15:17.191Z","caller":"awsutils/awsutils.go:313","msg":"Found instance-id: i-04f6a4021955bebb4 "}
{"level":"debug","ts":"2021-04-08T20:15:17.191Z","caller":"awsutils/awsutils.go:313","msg":"Found instance-type: c5.12xlarge "}
{"level":"debug","ts":"2021-04-08T20:15:17.192Z","caller":"awsutils/awsutils.go:313","msg":"Found primary interface's MAC address: 02:99:a5:52:66:02"}
{"level":"debug","ts":"2021-04-08T20:15:17.192Z","caller":"awsutils/awsutils.go:373","msg":"Discovered 2 interfaces."}
{"level":"debug","ts":"2021-04-08T20:15:17.193Z","caller":"awsutils/awsutils.go:373","msg":"Found device-number: 1 "}
{"level":"debug","ts":"2021-04-08T20:15:17.194Z","caller":"awsutils/awsutils.go:373","msg":"Found eni: eni-008f2e973d4be3be9 "}
{"level":"debug","ts":"2021-04-08T20:15:17.195Z","caller":"awsutils/awsutils.go:373","msg":"Found device-number: 0 "}
{"level":"debug","ts":"2021-04-08T20:15:17.195Z","caller":"awsutils/awsutils.go:373","msg":"Found eni: eni-06cfa14756d771eb5 "}
{"level":"debug","ts":"2021-04-08T20:15:17.195Z","caller":"awsutils/awsutils.go:373","msg":"eni-06cfa14756d771eb5 is the primary ENI of this instance"}
{"level":"debug","ts":"2021-04-08T20:15:17.196Z","caller":"awsutils/awsutils.go:313","msg":"Found subnet-id: subnet-068bf79668b2e8c29 "}
{"level":"info","ts":"2021-04-08T20:15:17.197Z","caller":"awsutils/awsutils.go:388","msg":"Found sg-003a220735ba852b1, added to ipamd cache"}
{"level":"info","ts":"2021-04-08T20:15:17.197Z","caller":"awsutils/awsutils.go:388","msg":"Found sg-0a509324a8a136303, added to ipamd cache"}
{"level":"debug","ts":"2021-04-08T20:15:17.197Z","caller":"awsutils/awsutils.go:442","msg":"Total number of interfaces found: 2 "}
{"level":"debug","ts":"2021-04-08T20:15:17.197Z","caller":"awsutils/awsutils.go:569","msg":"Found ENI MAC address: 02:5f:aa:38:5e:86"}
{"level":"debug","ts":"2021-04-08T20:15:17.199Z","caller":"awsutils/awsutils.go:569","msg":"Found ENI: eni-008f2e973d4be3be9, MAC 02:5f:aa:38:5e:86, device 1"}
{"level":"debug","ts":"2021-04-08T20:15:17.199Z","caller":"awsutils/awsutils.go:589","msg":"Found CIDR 10.3.96.0/20 for ENI 02:5f:aa:38:5e:86"}
{"level":"debug","ts":"2021-04-08T20:15:17.200Z","caller":"awsutils/awsutils.go:589","msg":"Found IP addresses [10.3.111.153] on ENI 02:5f:aa:38:5e:86"}
{"level":"debug","ts":"2021-04-08T20:15:17.200Z","caller":"awsutils/awsutils.go:569","msg":"Found ENI MAC address: 02:99:a5:52:66:02"}
{"level":"debug","ts":"2021-04-08T20:15:17.202Z","caller":"awsutils/awsutils.go:583","msg":"Using device number 0 for primary ENI: eni-06cfa14756d771eb5"}
{"level":"debug","ts":"2021-04-08T20:15:17.202Z","caller":"awsutils/awsutils.go:569","msg":"Found ENI: eni-06cfa14756d771eb5, MAC 02:99:a5:52:66:02, device 0"}
{"level":"debug","ts":"2021-04-08T20:15:17.203Z","caller":"awsutils/awsutils.go:589","msg":"Found CIDR 10.3.96.0/20 for ENI 02:99:a5:52:66:02"}
{"level":"debug","ts":"2021-04-08T20:15:17.204Z","caller":"awsutils/awsutils.go:589","msg":"Found IP addresses [10.3.99.93 10.3.97.146 10.3.103.92 10.3.98.47 10.3.108.182 10.3.99.15 10.3.103.31 10.3.109.163 10.3.99.165 10.3.107.48 10.3.103.168 10.3.99.103 10.3.98.209 10.3.107.79 10.3.108.113 10.3.98.188 10.3.104.116 10.3.109.252 10.3.110.20 10.3.105.247 10.3.100.188 10.3.97.248 10.3.102.214 10.3.111.4 10.3.103.157 10.3.102.60] on ENI 02:99:a5:52:66:02"}
{"level":"debug","ts":"2021-04-08T20:15:17.204Z","caller":"awsutils/awsutils.go:388","msg":"Update ENI eni-008f2e973d4be3be9"}
{"level":"error","ts":"2021-04-08T20:15:17.411Z","caller":"aws-k8s-agent/main.go:28","msg":"Initialization failure: ipamd: can not initialize with AWS SDK interface: refreshSGIDs: unable to update the ENI's SG: InvalidNetworkInterfaceID.NotFound: The networkInterface ID 'eni-008f2e973d4be3be9' does not exist\n\tstatus code: 400, request id: 37c1197a-8c1a-4ed0-bcca-5b5504bffd16"}
Notice that the NotFound interface eni-008f2e973d4be3be9 is listed as device-number: 1, while the primary ENI for the instance is eni-06cfa14756d771eb5.
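One way to cross-check the error, assuming AWS CLI access with EC2 read permissions, is to compare what EC2 reports as attached to the instance against the ENI ID the daemon is trying to refresh:

% aws ec2 describe-network-interfaces --filters Name=attachment.instance-id,Values=i-04f6a4021955bebb4
% aws ec2 describe-network-interfaces --network-interface-ids eni-008f2e973d4be3be9

If the second call also returns InvalidNetworkInterfaceID.NotFound, the ENI has been deleted and the interface list ipamd reads at startup is stale.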
Attach logs
The issue occurs too early in pod startup to exec into the container and capture logs.
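Exec'ing is not the only way to get at these logs, though: since AWS_VPC_K8S_CNI_LOG_FILE is set to stdout (see the environment below), the output of a failed attempt can usually be pulled from the previous container instance, which is the kind of capture shown above:

% kubectl -n kube-system logs aws-node-shhgn --previous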
What you expected to happen:
The pod should restart successfully and become Ready.
How to reproduce it (as minimally and precisely as possible):
Restart the pod.
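Assuming a node already in the state described above, the minimal reproduction is to delete the pod and let the DaemonSet recreate it:

% kubectl -n kube-system delete pod aws-node-shhgn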
Anything else we need to know?:
% kubectl -n kube-system describe po aws-node-shhgn
Name: aws-node-shhgn
Namespace: kube-system
Priority: 2000001000
Priority Class Name: system-node-critical
Node: ip-10-3-99-93.ca-central-1.compute.internal/10.3.99.93
Start Time: Thu, 08 Apr 2021 13:05:20 -0700
Labels: controller-revision-hash=dc9cc4bcd
k8s-app=aws-node
pod-template-generation=6
Annotations: kubernetes.io/psp: eks.privileged
Status: Running
IP: 10.3.99.93
IPs:
IP: 10.3.99.93
Controlled By: DaemonSet/aws-node
Init Containers:
aws-vpc-cni-init:
Container ID: docker://a599b72e7b7e5b77c931cdef346b0dfdb86ac115562dc439e1e6e7c116cdf43b
Image: 602401143452.dkr.ecr.ca-central-1.amazonaws.com/amazon-k8s-cni-init:v1.7.5-eksbuild.1
Image ID: docker-pullable://602401143452.dkr.ecr.ca-central-1.amazonaws.com/amazon-k8s-cni-init@sha256:d96d712513464de6ce94e422634a25546565418f20d1b28d3bce399d578f3296
Port: <none>
Host Port: <none>
State: Terminated
Reason: Completed
Exit Code: 0
Started: Thu, 08 Apr 2021 13:05:20 -0700
Finished: Thu, 08 Apr 2021 13:05:20 -0700
Ready: True
Restart Count: 0
Environment:
DISABLE_TCP_EARLY_DEMUX: false
Mounts:
/host/opt/cni/bin from cni-bin-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from aws-node-token-mgmnk (ro)
Containers:
aws-node:
Container ID: docker://5dc44c004100abb16c3a8d5f9b260401bac8b369e513bd08e692e093b753b60c
Image: 602401143452.dkr.ecr.ca-central-1.amazonaws.com/amazon-k8s-cni:v1.7.5-eksbuild.1
Image ID: docker-pullable://602401143452.dkr.ecr.ca-central-1.amazonaws.com/amazon-k8s-cni@sha256:f310c918ee2b4ebced76d2d64a2ec128dde3b364d1b495f0ae73011f489d474d
Port: 61678/TCP
Host Port: 61678/TCP
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 137
Started: Thu, 08 Apr 2021 13:28:07 -0700
Finished: Thu, 08 Apr 2021 13:29:46 -0700
Ready: False
Restart Count: 9
Requests:
cpu: 10m
Liveness: exec [/app/grpc-health-probe -addr=:50051] delay=60s timeout=1s period=10s #success=1 #failure=3
Readiness: exec [/app/grpc-health-probe -addr=:50051] delay=1s timeout=1s period=10s #success=1 #failure=3
Environment:
ADDITIONAL_ENI_TAGS: {}
AWS_VPC_CNI_NODE_PORT_SUPPORT: true
AWS_VPC_ENI_MTU: 9001
AWS_VPC_K8S_CNI_CONFIGURE_RPFILTER: false
AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG: false
AWS_VPC_K8S_CNI_EXTERNALSNAT: false
AWS_VPC_K8S_CNI_LOGLEVEL: DEBUG
AWS_VPC_K8S_CNI_LOG_FILE: stdout
AWS_VPC_K8S_CNI_RANDOMIZESNAT: prng
AWS_VPC_K8S_CNI_VETHPREFIX: eni
AWS_VPC_K8S_PLUGIN_LOG_FILE: stderr
AWS_VPC_K8S_PLUGIN_LOG_LEVEL: DEBUG
DISABLE_INTROSPECTION: false
DISABLE_METRICS: false
ENABLE_POD_ENI: false
MY_NODE_NAME: (v1:spec.nodeName)
WARM_ENI_TARGET: 1
WARM_IP_TARGET: 10
Mounts:
/host/etc/cni/net.d from cni-net-dir (rw)
/host/opt/cni/bin from cni-bin-dir (rw)
/host/var/log/aws-routed-eni from log-dir (rw)
/run/xtables.lock from xtables-lock (rw)
/var/run/aws-node from run-dir (rw)
/var/run/dockershim.sock from dockershim (rw)
/var/run/secrets/kubernetes.io/serviceaccount from aws-node-token-mgmnk (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
cni-bin-dir:
Type: HostPath (bare host directory volume)
Path: /opt/cni/bin
HostPathType:
cni-net-dir:
Type: HostPath (bare host directory volume)
Path: /etc/cni/net.d
HostPathType:
dockershim:
Type: HostPath (bare host directory volume)
Path: /var/run/dockershim.sock
HostPathType:
xtables-lock:
Type: HostPath (bare host directory volume)
Path: /run/xtables.lock
HostPathType:
log-dir:
Type: HostPath (bare host directory volume)
Path: /var/log/aws-routed-eni
HostPathType: DirectoryOrCreate
run-dir:
Type: HostPath (bare host directory volume)
Path: /var/run/aws-node
HostPathType: DirectoryOrCreate
aws-node-token-mgmnk:
Type: Secret (a volume populated by a Secret)
SecretName: aws-node-token-mgmnk
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations:
node.kubernetes.io/disk-pressure:NoSchedule
node.kubernetes.io/memory-pressure:NoSchedule
node.kubernetes.io/network-unavailable:NoSchedule
node.kubernetes.io/not-ready:NoExecute
node.kubernetes.io/pid-pressure:NoSchedule
node.kubernetes.io/unreachable:NoExecute
node.kubernetes.io/unschedulable:NoSchedule
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Pulling 29m kubelet Pulling image "602401143452.dkr.ecr.ca-central-1.amazonaws.com/amazon-k8s-cni-init:v1.7.5-eksbuild.1"
Normal Pulled 29m kubelet Successfully pulled image "602401143452.dkr.ecr.ca-central-1.amazonaws.com/amazon-k8s-cni-init:v1.7.5-eksbuild.1"
Normal Created 29m kubelet Created container aws-vpc-cni-init
Normal Started 29m kubelet Started container aws-vpc-cni-init
Normal Scheduled 29m default-scheduler Successfully assigned kube-system/aws-node-shhgn to ip-10-3-99-93.ca-central-1.compute.internal
Warning Unhealthy 29m kubelet Readiness probe failed: {"level":"info","ts":"2021-04-08T20:05:33.063Z","caller":"/usr/local/go/src/runtime/proc.go:203","msg":"timeout: failed to connect service \":50051\" within 1s"}
Warning Unhealthy 29m kubelet Readiness probe failed: {"level":"info","ts":"2021-04-08T20:05:43.070Z","caller":"/usr/local/go/src/runtime/proc.go:203","msg":"timeout: failed to connect service \":50051\" within 1s"}
Warning Unhealthy 28m kubelet Readiness probe failed: {"level":"info","ts":"2021-04-08T20:05:53.063Z","caller":"/usr/local/go/src/runtime/proc.go:203","msg":"timeout: failed to connect service \":50051\" within 1s"}
Warning Unhealthy 28m kubelet Readiness probe failed: {"level":"info","ts":"2021-04-08T20:06:03.072Z","caller":"/usr/local/go/src/runtime/proc.go:203","msg":"timeout: failed to connect service \":50051\" within 1s"}
Warning Unhealthy 28m kubelet Readiness probe failed: {"level":"info","ts":"2021-04-08T20:06:13.062Z","caller":"/usr/local/go/src/runtime/proc.go:203","msg":"timeout: failed to connect service \":50051\" within 1s"}
Warning Unhealthy 28m kubelet Readiness probe failed: {"level":"info","ts":"2021-04-08T20:06:23.062Z","caller":"/usr/local/go/src/runtime/proc.go:203","msg":"timeout: failed to connect service \":50051\" within 1s"}
Warning Unhealthy 28m kubelet Liveness probe failed: {"level":"info","ts":"2021-04-08T20:06:26.647Z","caller":"/usr/local/go/src/runtime/proc.go:203","msg":"timeout: failed to connect service \":50051\" within 1s"}
Warning Unhealthy 28m kubelet Readiness probe failed: {"level":"info","ts":"2021-04-08T20:06:33.061Z","caller":"/usr/local/go/src/runtime/proc.go:203","msg":"timeout: failed to connect service \":50051\" within 1s"}
Warning Unhealthy 28m kubelet Liveness probe failed: {"level":"info","ts":"2021-04-08T20:06:36.643Z","caller":"/usr/local/go/src/runtime/proc.go:203","msg":"timeout: failed to connect service \":50051\" within 1s"}
Normal Killing 27m kubelet Container aws-node failed liveness probe, will be restarted
Normal Pulled 27m (x2 over 29m) kubelet Successfully pulled image "602401143452.dkr.ecr.ca-central-1.amazonaws.com/amazon-k8s-cni:v1.7.5-eksbuild.1"
Normal Pulling 27m (x2 over 29m) kubelet Pulling image "602401143452.dkr.ecr.ca-central-1.amazonaws.com/amazon-k8s-cni:v1.7.5-eksbuild.1"
Normal Created 27m (x2 over 29m) kubelet Created container aws-node
Normal Started 27m (x2 over 29m) kubelet Started container aws-node
Warning Unhealthy 14m (x86 over 28m) kubelet (combined from similar events): Readiness probe failed: {"level":"info","ts":"2021-04-08T20:20:23.063Z","caller":"/usr/local/go/src/runtime/proc.go:203","msg":"timeout: failed to connect service \":50051\" within 1s"}
Warning BackOff 4m23s (x43 over 17m) kubelet Back-off restarting failed container
Environment:
Kubernetes version (use kubectl version): v1.16.15-eks-ad4801
Yes, this seems to be because of stale EC2 metadata (#1340). PR #1341 is part of the fix: aws-node will no longer crash but will log the errors, and a counter has been added to track the number of failures. We are working with the EC2 metadata team on the sync issue.
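For anyone hitting this, one way to observe the staleness directly, assuming shell access on the node (and a session token if IMDSv2 is enforced), is to check which ENIs the instance metadata service still reports, since that is what ipamd reads at startup:

% curl -s http://169.254.169.254/latest/meta-data/network/interfaces/macs/
% curl -s http://169.254.169.254/latest/meta-data/network/interfaces/macs/02:5f:aa:38:5e:86/interface-id

If IMDS still maps a MAC to eni-008f2e973d4be3be9 after that ENI has been deleted in EC2, the refreshSGIDs call fails exactly as in the log above.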