aws-node restarts sporadically (crashing, exit code 137), which leaves all pods on that node stuck in the ContainerCreating state. #1425
Comments
Hi @daganida88 Exit code 137 usually points to an OOM (out of memory) kill. Does it happen on startup or after aws-node has been running for several hours? Can you also please check whether any other process is consuming a lot of memory? The memory available to Docker containers is limited by the total memory on the host, so when usage increases, the free memory may be insufficient for all containers and some may crash. |
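For reference, exit codes above 128 conventionally encode the terminating signal as 128 plus the signal number, so 137 corresponds to signal 9 (SIGKILL). A minimal Python sketch of that decoding follows; `describe_exit_code` is a hypothetical helper name used only for illustration, not part of any Kubernetes or AWS tooling.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Describe a container exit code.

    Codes above 128 are conventionally 128 + the number of the signal
    that terminated the process.
    """
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = f"signal {signum}"
        return f"killed by {name}"
    if code == 0:
        return "exited normally"
    return f"exited with error status {code}"


print(describe_exit_code(137))
```

SIGKILL is sent by the kernel OOM killer, but a container runtime can also send it when a container does not stop within its grace period, so 137 on its own does not prove an out-of-memory condition.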
Hi @daganida88 Can you please confirm if any other process is consuming high memory? |
Hi @daganida88 If you could share kubelet logs when the issue happens, it would help us debug. Thanks! |
Please address this issue. We're experiencing the same error; it is sporadic, and we have no way to reproduce it. |
I understand this issue happens sporadically, but if you could share the CNI version, the node and pod scale, and whether there is any churn in pod scale, we can try to reproduce the issue. Also next time when you see the issue if you could run Thanks! |
@jayanthvn The CNI version is 1.7.5, and nodes are controlled by the autoscaler as recommended by the EKS documentation (node type is m5.large). I'll look at monitoring aws-node memory consumption to check whether that is the case. Cheers! |
I spoke to our AWS support rep regarding this issue and he seemed to
believe that this was expected and due to the CNI container starting before
the kube-proxy container became healthy.
Just my 2 cents.
…On Tue, Jun 15, 2021 at 4:33 PM Truc Nguyen Lam ***@***.***> wrote:
@jayanthvn <https://github.com/jayanthvn> cni version is 1.7.5, nodes is
controlled by autoscaler as recommended by eks documentation (node type is
m5.large)
Unfortunately, our nodes don't have ssm installed by default so we can't
log in to run the command.
I'll have a look on monitoring aws-node memory consumption to check if
that is the case.
Cheers!
|
I'm still searching for an effective automatic approach to handle this issue; currently the only fix is to manually terminate the stuck node. |
We upgraded our cluster from 1.19 to 1.20 and upgraded the unmanaged vpc-cni addon to the vpc-cni 1.8.0 managed EKS add-on. Now we also experience this issue. Sometimes the containers also get killed because of failing health and readiness probes.
The node is a The restart at 10 AM was triggered by a failed health probe and the restart at 11 AM was triggered by 137 exit code. In our cluster we use resource requests and limits and the Note: The node was initially started with a different cni-plugin version, because of the Kubernetes version upgrade |
The strange thing is, that the health probes fail without a reason message after |
For the past month, since upgrading 4 of our EKS clusters from 1.17 to 1.20 and moving to AWS-managed EKS node groups, we have been seeing Readiness and Liveness failures for The error does not occur on every deployment update, only sporadically. In the new cluster, I use the default add-ons; I have not upgraded them.
I also think it is strange to not have the reason message after Regards |
I changed the I also found that all previous container restarts can be related to very high CPU utilization of the node (100%) lasting 30 seconds or longer. This is enough time for three health probe timeouts, which trigger a container restart. I think that because the pod only has a CPU resource request of 10m, it does not get enough CPU time to respond to the health probe in under 1 second. To guarantee such low response times, the pod requires more guaranteed CPU time. However, such a low response time is not required all the time, and restarting the pod does not resolve slow response times when the node has very high CPU utilization. So increasing the health probe timeout is the only solution for this problem. |
@jayanthvn We have the same issue. We updated from 1.19 to 1.20 and are also using CNI add-on v1.8.0. aws-node keeps restarting when cluster-autoscaler scales up. I have tried increasing the liveness probe timeout, but it does not help. |
Sorry for the delayed response. @Legion2 - Thanks for letting us know, I will look into it. @sarbajitdutta - You mean aws-node keeps restarting on the new nodes? Can you please share the log lines from |
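The timing argument above can be sketched with a simplified model. This assumes every probe times out, that probes start every `periodSeconds`, and that the Kubernetes defaults apply (`periodSeconds: 10`, `timeoutSeconds: 1`, `failureThreshold: 3`); the values actually set in a given aws-node manifest may differ, and `seconds_until_restart` is a hypothetical helper name.

```python
def seconds_until_restart(period_s: float, timeout_s: float, failure_threshold: int) -> float:
    """Rough time from the first failing probe to the restart decision.

    Simplified model: every probe times out, and a new probe starts every
    period_s seconds. The Nth consecutive failure is recorded timeout_s
    after the Nth probe starts.
    """
    if failure_threshold < 1:
        raise ValueError("failure_threshold must be at least 1")
    return (failure_threshold - 1) * period_s + timeout_s


# Assumed values: Kubernetes probe defaults, not read from the aws-node manifest.
print(seconds_until_restart(period_s=10, timeout_s=1, failure_threshold=3))
```

Under these assumptions, a CPU spike of roughly 20 to 30 seconds is long enough to exhaust the failure threshold, which is consistent with the 30 second observation above.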
We are also encountering these random livenessProbe failures after upgrading to EKS 1.21. Increasing the timeout helps with the restarts. The reason why this has started to happen now is a bug fix in kubernetes 1.20: Fixing Kubelet Exec Probe Timeouts and KEP-1972: kubelet exec probe timeouts. Previously the liveness and readiness probes did not respect the timeout for exec probes and the probe ran indefinitely. Now the default timeout of 1s is respected, but is sometimes too short causing it to fail and pods to restart. |
Thanks @mikkotimoharju for sharing this. We will look into it on adjusting the timeouts. |
Hi, There are 2 issues mentioned here -
Thanks. |
In an ideal scenario, the default timeoutSeconds value (1) should be good enough as long as the aws-node pod is not starved of CPU resources. So, even with the exec probe timeout being enforced from 1.20+, we should be OK, and these values can be adjusted based on the use case: if you expect higher CPU load on the system, the CPU resource request and timeout values can be adjusted accordingly. We can document this. |
@jayanthvn thanks, case opened (8817394681). From the logs, it looks like the container can't communicate with the cluster at boot and therefore fails health checks:
|
Thanks @laupow Can you please check kube-proxy pod logs and confirm if you are seeing logs similar to below -
|
Unfortunately I don't see those logs on any nodes with crashed CNI pods. I only see |
Hi, |
Any update on this issue? We see the same problem. Will updating to 1.21 fix it? |
@achevuru, the default liveness probe timeout of 1s is not enough on any cluster that runs burstable workloads allowed to scale up beyond the resources of a node. Maybe the default timeout value should be adjusted? It is also possible that bumping up resources.requests.cpu to more than 10m cores will be sufficient to allow the probe not to be starved of CPU resources and to always finish within the default 1 second. |
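As an illustration of the two adjustments discussed above (a longer probe timeout and a larger CPU request), the following Python sketch builds a strategic merge patch body for the DaemonSet. The container name `aws-node`, the 5 second timeout, and the `25m` CPU request are assumptions chosen for the example, not values recommended in this thread; `build_probe_patch` is a hypothetical helper.

```python
import json


def build_probe_patch(timeout_seconds: int, cpu_request: str) -> dict:
    """Build a strategic merge patch that raises probe timeouts and the CPU request.

    Assumes the DaemonSet has a container named "aws-node" that already
    defines livenessProbe and readinessProbe.
    """
    return {
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {
                            "name": "aws-node",  # assumed container name
                            "livenessProbe": {"timeoutSeconds": timeout_seconds},
                            "readinessProbe": {"timeoutSeconds": timeout_seconds},
                            "resources": {"requests": {"cpu": cpu_request}},
                        }
                    ]
                }
            }
        }
    }


# Example values only; tune them to the load on your own nodes.
patch = build_probe_patch(timeout_seconds=5, cpu_request="25m")
print(json.dumps(patch, indent=2))
```

The printed JSON could be saved to a file and applied with something like `kubectl -n kube-system patch daemonset aws-node -p "$(cat patch.json)"`, assuming the DaemonSet is named `aws-node`.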
@laupow - sure, we will check in the ticket whether kube-proxy is taking time to start, leading aws-node to restart. @deva1987m and @vioan - Sorry, this thread has diverged into the 2 issues mentioned above. Are you referring to aws-node pods being OOM killed (137)? If so, can you please check CPU utilization on the instance? Also, I feel we need to set CPU and memory limits for aws-node. |
@jayanthvn When cluster-autoscaler scales up, the readiness probe fails for aws-node. It's stuck here -
The successful ones have these log messages -
Error message - |
Hello, any updates on this issue? |
@crechandan for us upgrading cni worked like a charm. aws recommended cni version for k8s v1.20 is 1.10.1 (see table here). we do not use addon so we basically apply yaml from the official cni repo: current release aws-k8s-cni.yaml |
@dkrzyszczyk Yes, when we deploy aws-cni with the kubectl apply command from https://docs.aws.amazon.com/eks/latest/userguide/managing-vpc-cni.html, it works for me as well, since the correct aws-cni version gets picked. The issue occurs only when we try to deploy the aws-cni EKS add-on using the CloudFormation EKS add-on feature:
Installing aws-cni via CloudFormation seems to pick old versions (which seem incompatible). We are just trying to simplify our EKS add-on deployment by using the CloudFormation EKS add-on feature. When I deploy the kube-proxy and core-dns add-ons via CloudFormation, it works perfectly fine. Any idea when this aws-cni issue will be resolved? |
@crechandan - So the issue is that when you use the EKS Managed Add-on, the CNI version installed is 1.7.5, and with 1.7.5 aws-node either never starts or you randomly see liveness/readiness probe failures? I see you have the same question here - #1930 (comment) For installing the 1.10.1 EKS Managed Add-on, did you try giving the
|
We are facing the same issue with EKS 1.21 and vpc-cni 1.10.1.eksbuild and 1.10.2.eksbuild. The issue appears on nodes with high CPU and/or Memory utilization. We solved it as follows:
|
Yes, I can see aws-cni failing its liveness/readiness probes and never starting. This time I tried the CloudFormation below, with a version set for the aws-cni EKS add-on:
It is still not working. Logs for the aws-cni pod are below:
All pods on the EKS cluster are waiting for vpc-cni to be up and running (all are stuck in the Pending state). kube-proxy log:
Even after I patched kube-proxy as mentioned earlier in this thread, the error "1 node.go:161] Failed to retrieve node info: nodes "ip*" not found" no longer appears, but there is no effect on vpc-cni: the probes are still failing.
|
I'm seeing the same issues but I've noticed that it only occurs if I explicitly set the |
@rpf3 Thanks a lot - When I used the same cloud formation without mentioning the service-account-role-arn flag : it worked perfectly. Implemented Kube-proxy, Core-dns & aws-cni in same way and all are working with cloudformation EKS addon feature. |
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days |
/not stale |
Just upgraded to EKS 1.21 and updated the aws-node VPC CNI to |
@ido-vcita Please check resource utilization (on the nodes) and if required adjust the timeout values for your liveness/readiness probes. With 1.21 exec probe timeouts are honored - kubernetes/enhancements#1972 |
I've had all recommended settings applied as well and was still seeing this behavior if the node was running hot with workloads. Bumping/setting CPU/memory requests I think helped a bit more but errors still seem to crop up (we have 1000s of nodes we scale in and out daily). |
Our setup:
When the custom Role is provided, When the custom Role is disabled, all pods start almost immediately (2 retries in the logs). The Trust Relationship for that Role looks nearly identical to the one generated by
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "",
"Effect": "Allow",
"Principal": {
"Federated": "arn:aws:iam::YYYYYY-provider/oidc.eks.us-east-1.amazonaws.com/id/XXXXXXXX"
},
"Action": "sts:AssumeRoleWithWebIdentity",
"Condition": {
"StringEquals": {
"oidc.eks.us-east-1.amazonaws.com/id/XXXXXXXX:aud": "sts.amazonaws.com",
"oidc.eks.us-east-1.amazonaws.com/id/XXXXXXXX:sub": "system:serviceaccount:kube-system:aws-node"
}
}
}
]
} |
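A trust relationship like the one above can be sanity-checked mechanically. The Python sketch below verifies that each statement allows `sts:AssumeRoleWithWebIdentity` and carries a `:sub` condition for the `kube-system:aws-node` service account, as in the policy shown. `trust_policy_problems` is a hypothetical helper, and the sample policy uses placeholder account and OIDC IDs. It only inspects the trust policy and says nothing about whether the role ARN passed to the add-on is correct.

```python
EXPECTED_SUB = "system:serviceaccount:kube-system:aws-node"


def trust_policy_problems(policy: dict, expected_sub: str = EXPECTED_SUB) -> list:
    """Return a list of human-readable problems found in an IRSA trust policy."""
    problems = []
    statements = policy.get("Statement", [])
    if not statements:
        problems.append("policy has no statements")
    for index, stmt in enumerate(statements):
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        if "sts:AssumeRoleWithWebIdentity" not in actions:
            problems.append(f"statement {index}: missing sts:AssumeRoleWithWebIdentity")
        equals = stmt.get("Condition", {}).get("StringEquals", {})
        subs = [value for key, value in equals.items() if key.endswith(":sub")]
        if expected_sub not in subs:
            problems.append(f"statement {index}: no :sub condition for {expected_sub}")
    return problems


# Placeholder IDs (ACCOUNT, EXAMPLE) are illustrative, not real values.
sample = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Federated": "arn:aws:iam::ACCOUNT:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE"},
        "Action": "sts:AssumeRoleWithWebIdentity",
        "Condition": {"StringEquals": {
            "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE:aud": "sts.amazonaws.com",
            "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE:sub": EXPECTED_SUB,
        }},
    }],
}
print(trust_policy_problems(sample))
```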
@KIVagant I believe CNI is running in to permission issues - maybe not able to assume the custom role. Can you check the CNI/IPAMD logs? You should see 403s in the logs.. |
Thank you, @achevuru . Today I learned how to troubleshoot the CNI. It turned out the Role ARN was incorrectly configured. 🤦 The line has a mistake. Instead of this:
it must be this (added
|
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days |
I've had the same problem for the past two weeks. Has anyone found a solution? |
ipamd log:
{"level":"info","ts":"2022-10-03T15:42:25.909Z","caller":"ipamd/ipamd.go:1418","msg":"Trying to add 192.168.43.0"}
{"level":"info","ts":"2022-10-03T15:42:25.909Z","caller":"ipamd/ipamd.go:1542","msg":"Adding 192.168.43.0/32 to DS for eni-00023922abf62516c"}
{"level":"info","ts":"2022-10-03T15:42:25.909Z","caller":"ipamd/ipamd.go:1542","msg":"IP already in DS"}
{"level":"info","ts":"2022-10-03T15:42:25.909Z","caller":"ipamd/ipamd.go:1418","msg":"Trying to add 192.168.60.1"}
{"level":"info","ts":"2022-10-03T15:42:25.909Z","caller":"ipamd/ipamd.go:1542","msg":"Adding 192.168.60.1/32 to DS for eni-00023922abf62516c"}
{"level":"info","ts":"2022-10-03T15:42:25.909Z","caller":"ipamd/ipamd.go:1542","msg":"IP already in DS"}
{"level":"info","ts":"2022-10-03T15:42:25.909Z","caller":"ipamd/ipamd.go:1418","msg":"Trying to add 192.168.47.2"}
{"level":"info","ts":"2022-10-03T15:42:25.909Z","caller":"ipamd/ipamd.go:1542","msg":"Adding 192.168.47.2/32 to DS for eni-00023922abf62516c"}
{"level":"info","ts":"2022-10-03T15:42:25.909Z","caller":"ipamd/ipamd.go:1542","msg":"IP already in DS"}
{"level":"info","ts":"2022-10-03T15:42:25.909Z","caller":"ipamd/ipamd.go:1418","msg":"Trying to add 192.168.46.131"}
{"level":"info","ts":"2022-10-03T15:42:25.909Z","caller":"ipamd/ipamd.go:1542","msg":"Adding 192.168.46.131/32 to DS for eni-00023922abf62516c"}
{"level":"info","ts":"2022-10-03T15:42:25.909Z","caller":"ipamd/ipamd.go:1542","msg":"IP already in DS"}
{"level":"info","ts":"2022-10-03T15:42:25.909Z","caller":"ipamd/ipamd.go:1418","msg":"Trying to add 192.168.61.196"}
{"level":"info","ts":"2022-10-03T15:42:25.909Z","caller":"ipamd/ipamd.go:1542","msg":"Adding 192.168.61.196/32 to DS for eni-00023922abf62516c"}
{"level":"info","ts":"2022-10-03T15:42:25.909Z","caller":"ipamd/ipamd.go:1542","msg":"IP already in DS"}
{"level":"info","ts":"2022-10-03T15:42:25.909Z","caller":"ipamd/ipamd.go:1418","msg":"Trying to add 192.168.49.6"}
{"level":"info","ts":"2022-10-03T15:42:25.909Z","caller":"ipamd/ipamd.go:1542","msg":"Adding 192.168.49.6/32 to DS for eni-00023922abf62516c"}
{"level":"info","ts":"2022-10-03T15:42:25.909Z","caller":"ipamd/ipamd.go:1542","msg":"IP already in DS"}
{"level":"info","ts":"2022-10-03T15:42:25.909Z","caller":"ipamd/ipamd.go:1418","msg":"Trying to add 192.168.41.135"}
{"level":"info","ts":"2022-10-03T15:42:25.909Z","caller":"ipamd/ipamd.go:1542","msg":"Adding 192.168.41.135/32 to DS for eni-00023922abf62516c"}
{"level":"info","ts":"2022-10-03T15:42:25.909Z","caller":"ipamd/ipamd.go:1542","msg":"IP already in DS"}
{"level":"info","ts":"2022-10-03T15:42:25.909Z","caller":"ipamd/ipamd.go:1418","msg":"Trying to add 192.168.38.218"}
{"level":"info","ts":"2022-10-03T15:42:25.909Z","caller":"ipamd/ipamd.go:1542","msg":"Adding 192.168.38.218/32 to DS for eni-00023922abf62516c"}
{"level":"info","ts":"2022-10-03T15:42:25.909Z","caller":"ipamd/ipamd.go:1542","msg":"IP already in DS"}
{"level":"info","ts":"2022-10-03T15:42:25.909Z","caller":"ipamd/ipamd.go:1418","msg":"Trying to add 192.168.39.157"}
{"level":"info","ts":"2022-10-03T15:42:25.909Z","caller":"ipamd/ipamd.go:1542","msg":"Adding 192.168.39.157/32 to DS for eni-00023922abf62516c"}
{"level":"info","ts":"2022-10-03T15:42:25.909Z","caller":"ipamd/ipamd.go:1542","msg":"IP already in DS"}
{"level":"info","ts":"2022-10-03T15:42:25.909Z","caller":"ipamd/ipamd.go:1418","msg":"Trying to add 192.168.59.213"}
{"level":"info","ts":"2022-10-03T15:42:25.909Z","caller":"ipamd/ipamd.go:1542","msg":"Adding 192.168.59.213/32 to DS for eni-00023922abf62516c"}
{"level":"info","ts":"2022-10-03T15:42:25.909Z","caller":"ipamd/ipamd.go:1542","msg":"IP already in DS"}
{"level":"debug","ts":"2022-10-03T15:42:25.909Z","caller":"ipamd/ipamd.go:653","msg":"Reconcile existing ENI eni-00023922abf62516c IP prefixes"}
{"level":"debug","ts":"2022-10-03T15:42:25.909Z","caller":"ipamd/ipamd.go:1351","msg":"Found prefix pool count 0 for eni eni-00023922abf62516c\n"}
{"level":"debug","ts":"2022-10-03T15:42:25.909Z","caller":"ipamd/ipamd.go:653","msg":"Successfully Reconciled ENI/IP pool"}
{"level":"debug","ts":"2022-10-03T15:42:25.909Z","caller":"ipamd/ipamd.go:1396","msg":"IP pool stats: Total IPs/Prefixes = 87/0, AssignedIPs/CooldownIPs: 31/0, c.maxIPsPerENI = 29"}
command terminated with exit code 137
aws-node log:
# kubectl logs -f aws-node-zdp6x --tail 30 -n kube-system
{"level":"info","ts":"2022-10-02T14:56:07.820Z","caller":"entrypoint.sh","msg":"Validating env variables ..."}
{"level":"info","ts":"2022-10-02T14:56:07.821Z","caller":"entrypoint.sh","msg":"Install CNI binaries.."}
{"level":"info","ts":"2022-10-02T14:56:07.833Z","caller":"entrypoint.sh","msg":"Starting IPAM daemon in the background ... "}
{"level":"info","ts":"2022-10-02T14:56:07.834Z","caller":"entrypoint.sh","msg":"Checking for IPAM connectivity ... "}
{"level":"info","ts":"2022-10-02T14:56:09.841Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-10-02T14:56:11.847Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-10-02T14:56:13.853Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-10-02T14:56:15.860Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-10-02T14:56:17.866Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-10-02T14:56:19.872Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-10-02T14:56:21.878Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-10-02T14:56:23.884Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-10-02T14:56:25.890Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-10-02T14:56:27.897Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-10-02T14:56:29.903Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-10-02T14:56:31.909Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-10-02T14:56:33.916Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-10-02T14:56:35.922Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-10-02T14:56:37.928Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-10-02T14:56:39.934Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-10-02T14:56:41.940Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-10-02T14:56:43.947Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-10-02T14:56:45.953Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-10-02T14:56:47.959Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-10-02T14:56:49.966Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
I use cluster-autoscaler for auto-scaling, and the k8s version is 1.22. I also followed the troubleshooting guide https://github.com/aws/amazon-vpc-cni-k8s/blob/master/docs/troubleshooting.md#known-issues and applied the suggestion Interestingly, this failure usually occurs only on one particular node. When I terminate that node's instance and let the autoscaler replace it, it starts working, but after running for a while it restarts again. |
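To quantify how long aws-node waited for ipamd in a log like the one above, the JSON lines can be parsed directly. This is a minimal Python sketch; `ipamd_wait_summary` is a hypothetical helper, and the two sample lines are copied from the log in this comment.

```python
import json
from datetime import datetime

RETRY_MSG = "Retrying waiting for IPAM-D"


def ipamd_wait_summary(log_text: str) -> tuple:
    """Return (retry_count, seconds between the first and last retry line)."""
    stamps = []
    for line in log_text.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip shell prompts and other non-JSON lines
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if entry.get("msg") == RETRY_MSG:
            # Rewrite the trailing "Z" so fromisoformat accepts the timestamp.
            stamps.append(datetime.fromisoformat(entry["ts"].replace("Z", "+00:00")))
    if not stamps:
        return 0, 0.0
    return len(stamps), (stamps[-1] - stamps[0]).total_seconds()


sample_log = '''
{"level":"info","ts":"2022-10-02T14:56:09.841Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-10-02T14:56:11.847Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
'''
print(ipamd_wait_summary(sample_log))
```

In practice the input would be the saved output of `kubectl logs` for the affected aws-node pod. A long run of retries suggests ipamd never became reachable, which is the point at which the ipamd log itself is worth reading.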
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days |
Issue closed due to inactivity. |
What happened:
aws-node restarts sporadically (crashing, exit code 137), which leaves all pods on that node stuck in the ContainerCreating state.
Attach logs
aws-node {"level":"info","ts":"2021-04-13T12:24:31.015Z","caller":"entrypoint.sh","msg":"Install CNI binary.."}
aws-node {"level":"info","ts":"2021-04-13T12:24:31.031Z","caller":"entrypoint.sh","msg":"Starting IPAM daemon in the background ... "}
aws-node {"level":"info","ts":"2021-04-13T12:24:31.032Z","caller":"entrypoint.sh","msg":"Checking for IPAM connectivity ... "}
aws-vpc-cni-init + PLUGIN_BINS='loopback portmap bandwidth aws-cni-support.sh'
aws-vpc-cni-init + for b in '$PLUGIN_BINS'
aws-vpc-cni-init + '[' '!' -f loopback ']'
aws-vpc-cni-init + for b in '$PLUGIN_BINS'
aws-vpc-cni-init + '[' '!' -f portmap ']'
aws-vpc-cni-init + for b in '$PLUGIN_BINS'
aws-vpc-cni-init + '[' '!' -f bandwidth ']'
aws-vpc-cni-init + for b in '$PLUGIN_BINS'
aws-vpc-cni-init + '[' '!' -f aws-cni-support.sh ']'
aws-vpc-cni-init Copying CNI plugin binaries ...
aws-vpc-cni-init + HOST_CNI_BIN_PATH=/host/opt/cni/bin
aws-vpc-cni-init + echo 'Copying CNI plugin binaries ... '
aws-vpc-cni-init + for b in '$PLUGIN_BINS'
aws-vpc-cni-init + install loopback /host/opt/cni/bin
aws-vpc-cni-init + for b in '$PLUGIN_BINS'
aws-vpc-cni-init + install portmap /host/opt/cni/bin
aws-vpc-cni-init + for b in '$PLUGIN_BINS'
aws-vpc-cni-init + install bandwidth /host/opt/cni/bin
aws-vpc-cni-init + for b in '$PLUGIN_BINS'
aws-vpc-cni-init + install aws-cni-support.sh /host/opt/cni/bin
aws-vpc-cni-init + echo 'Configure rp_filter loose... '
aws-vpc-cni-init Configure rp_filter loose...
aws-vpc-cni-init ++ curl -X PUT http://169.254.169.254/latest/api/token -H 'X-aws-ec2-metadata-token-ttl-seconds: 60'
aws-vpc-cni-init % Total % Received % Xferd Average Speed Time Time Time Current
aws-vpc-cni-init Dload Upload Total Spent Left Speed
aws-vpc-cni-init 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0100 56 100 56 0 0 56000 0 --:--:-- --:--:-- --:--:-- 56000
aws-vpc-cni-init + TOKEN=AQAEABqkV8qO_waLfVT_6TITDHDBvmAy3jkblGe9YXSpR-irRxvwJQ==
aws-vpc-cni-init ++ curl -H 'X-aws-ec2-metadata-token: AQAEABqkV8qO_waLfVT_6TITDHDBvmAy3jkblGe9YXSpR-irRxvwJQ==' http://169.254.169.254/latest/meta-data/local-ipv4
aws-vpc-cni-init % Total % Received % Xferd Average Speed Time Time Time Current
aws-vpc-cni-init Dload Upload Total Spent Left Speed
aws-vpc-cni-init 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0100 13 100 13 0 0 13000 0 --:--:-- --:--:-- --:--:-- 13000
aws-vpc-cni-init + HOST_IP=172.19.224.26
aws-vpc-cni-init ++ ip -4 -o a
aws-vpc-cni-init ++ awk '{print $2}'
aws-vpc-cni-init ++ grep 172.19.224.26/
aws-vpc-cni-init + PRIMARY_IF=eth0
aws-vpc-cni-init + sysctl -w net.ipv4.conf.eth0.rp_filter=2
aws-vpc-cni-init net.ipv4.conf.eth0.rp_filter = 2
aws-vpc-cni-init + cat /proc/sys/net/ipv4/conf/eth0/rp_filter
aws-vpc-cni-init 2
aws-vpc-cni-init + '[' false == true ']'
aws-vpc-cni-init + sysctl -e -w net.ipv4.tcp_early_demux=1
aws-vpc-cni-init net.ipv4.tcp_early_demux = 1
aws-vpc-cni-init CNI init container done
aws-vpc-cni-init + echo 'CNI init container done'
What you expected to happen:
I expected the pod not to crash.
How to reproduce it (as minimally and precisely as possible):
Sometimes it happens, sometimes it doesn't. But when it does, the node stays in a zombie state, with all pods stuck on ContainerCreating, until we kill the pod.
Anything else we need to know?:
Running on EKS 1.16.15
Linux
Cni version: 1.7.5