Failed to run nginx container in peer pod #1450
Comments
From NginxProxyManager/nginx-proxy-manager#398 it looks like there might be an issue with the nginx user not being added, but I'm not sure why that would only be hit sometimes. Internally there has been a suggestion to switch the test pods to busybox rather than nginx, but I'm concerned that this hides the problem rather than solving it. |
Hi @stevenhorsman, could we add an init container that ensures the "nginx" user exists before the nginx service starts, to avoid the instability we are facing now? https://gist.github.com/sudharshanibm3/caba60a426b94a2d522e465382703903 The gist defines an init container named "useradd-init-container" that uses the "busybox" image to execute the command. I also applied the same init container approach to the existing test cases that execute commands in an nginx container (nginx-evidence), and tested the branch in a Jenkins job. So instead of switching all images to busybox, can we use nginx with init containers? |
Added init containers along with nginx pods in order to add users manually Fixes: confidential-containers#1450 Signed-off-by: Sudharshan Muralidharan <sudharshan.muralidharan@ibm.com>
Hey Sudharshan, thanks for looking into a workaround for this. I'm a bit torn about it. On one hand it would be good to get the tests more stable, but I'm concerned that the init container approach just covers up an issue rather than resolving the root cause, and a peer pods user wouldn't typically do that. I'd also like to understand how common this problem is. I don't remember seeing it on the many test runs I've done, so is it a newer issue, have I just been lucky, or is the failure chance just ~10%? Do you know if we have any data for this? |
Hi @stevenhorsman & @mattarnoatibm ,
Jenkins job: https://sys-zaas-k8s-jenkins.swg-devops.com/job/cloud-api-adaptor-e2e-tests-opensource-sudharshan/224/console
|
I agree 100%. We're also seeing this issue, and it's not just with nginx; I think it's a real bug that we want to fix. It should be easy to reproduce. If you start 10 peer pod replicas, it'll look like this:
$ k get po -l app=nginx-caa
NAME READY STATUS RESTARTS AGE
nginx-caa-8c8b67445-4ddp9 1/1 Running 0 3m16s
nginx-caa-8c8b67445-4f5ht 0/1 Error 4 (52s ago) 3m16s
nginx-caa-8c8b67445-7sjdp 0/1 Error 4 (54s ago) 3m16s
nginx-caa-8c8b67445-dqlf7 0/1 CrashLoopBackOff 3 (52s ago) 3m16s
nginx-caa-8c8b67445-fs5kp 0/1 CrashLoopBackOff 3 (33s ago) 3m16s
nginx-caa-8c8b67445-k54dt 1/1 Running 0 3m16s
nginx-caa-8c8b67445-k5n97 1/1 Running 0 3m16s
nginx-caa-8c8b67445-mrbvt 1/1 Running 0 3m16s
nginx-caa-8c8b67445-p77nk 1/1 Running 0 3m16s
nginx-caa-8c8b67445-q4skn 1/1 Running 0 3m16s
The failed pods won't recover automatically and always error out with the log below:
$ k logs nginx-caa-8c8b67445-7sjdp
/docker-entrypoint.sh: /docker-entrypoint.d/ is not empty, will attempt to perform configuration
/docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/
/docker-entrypoint.sh: Launching /docker-entrypoint.d/10-listen-on-ipv6-by-default.sh
10-listen-on-ipv6-by-default.sh: info: Getting the checksum of /etc/nginx/conf.d/default.conf
dpkg-query: no packages found matching nginx
10-listen-on-ipv6-by-default.sh: info: /etc/nginx/conf.d/default.conf differs from the packaged version
/docker-entrypoint.sh: Launching /docker-entrypoint.d/20-envsubst-on-templates.sh
/docker-entrypoint.sh: Launching /docker-entrypoint.d/30-tune-worker-processes.sh
/docker-entrypoint.sh: Ignoring /docker-entrypoint.d/.wh..wh..opq
/docker-entrypoint.sh: Configuration complete; ready for start up
2023/10/06 14:56:44 [emerg] 1#1: getpwnam("nginx") failed in /etc/nginx/nginx.conf:2
nginx: [emerg] getpwnam("nginx") failed in /etc/nginx/nginx.conf:2 |
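The reproduction described above (10 nginx replicas running as peer pods) could be set up with something like the following. This is only a sketch: the `app=nginx-caa` label and the replica count come from the listing above, while the runtime class name `kata-remote`, the bare `nginx` image tag, and the helper name `build_nginx_deployment` are assumptions for illustration and may differ in your cluster.

```python
import json


def build_nginx_deployment(name="nginx-caa", replicas=10, runtime_class="kata-remote"):
    """Return a Deployment manifest (as a dict) for reproducing the failure.

    runtime_class is an assumed name; pass None to get the comparison
    deployment that does not use a kata runtime class.
    """
    labels = {"app": name}
    pod_spec = {"containers": [{"name": "nginx", "image": "nginx"}]}
    if runtime_class is not None:
        pod_spec["runtimeClassName"] = runtime_class
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {"metadata": {"labels": labels}, "spec": pod_spec},
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests, so the output can be piped to it.
    print(json.dumps(build_nginx_deployment(), indent=2))
```

Saved as a script (the file name is arbitrary), the output can be piped to `kubectl apply -f -`, after which `kubectl get po -l app=nginx-caa` shows the replicas.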
For reference, the same deployment without the kata runtimeclass:
$ k get po -l app=nginx
NAME READY STATUS RESTARTS AGE
nginx-76d9fbf4fb-5wzch 1/1 Running 0 8s
nginx-76d9fbf4fb-7frmz 1/1 Running 0 8s
nginx-76d9fbf4fb-dkq7m 1/1 Running 0 9s
nginx-76d9fbf4fb-fb7v2 1/1 Running 0 8s
nginx-76d9fbf4fb-hv5tv 1/1 Running 0 8s
nginx-76d9fbf4fb-lfq79 1/1 Running 0 8s
nginx-76d9fbf4fb-m5t44 1/1 Running 0 8s
nginx-76d9fbf4fb-n6dw6 1/1 Running 0 8s
nginx-76d9fbf4fb-snpkh 1/1 Running 0 8s
nginx-76d9fbf4fb-zrvcd 1/1 Running 0 8s |
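To put a number on how often the failure is hit, the two listings can be tallied by status. The sketch below is a hypothetical helper, not part of the test suite; it assumes the default `kubectl get po` column order (NAME, READY, STATUS, ...) and uses the peer pod listing quoted earlier in this thread as sample input.

```python
from collections import Counter


def tally_pod_status(listing):
    """Count pods per STATUS value in `kubectl get po` text output."""
    counts = Counter()
    for line in listing.strip().splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and an echoed shell prompt.
        if len(fields) < 3 or fields[0] in ("NAME", "$"):
            continue
        counts[fields[2]] += 1  # STATUS is the third column
    return counts


PEER_POD_LISTING = """
NAME READY STATUS RESTARTS AGE
nginx-caa-8c8b67445-4ddp9 1/1 Running 0 3m16s
nginx-caa-8c8b67445-4f5ht 0/1 Error 4 (52s ago) 3m16s
nginx-caa-8c8b67445-7sjdp 0/1 Error 4 (54s ago) 3m16s
nginx-caa-8c8b67445-dqlf7 0/1 CrashLoopBackOff 3 (52s ago) 3m16s
nginx-caa-8c8b67445-fs5kp 0/1 CrashLoopBackOff 3 (33s ago) 3m16s
nginx-caa-8c8b67445-k54dt 1/1 Running 0 3m16s
nginx-caa-8c8b67445-k5n97 1/1 Running 0 3m16s
nginx-caa-8c8b67445-mrbvt 1/1 Running 0 3m16s
nginx-caa-8c8b67445-p77nk 1/1 Running 0 3m16s
nginx-caa-8c8b67445-q4skn 1/1 Running 0 3m16s
"""

if __name__ == "__main__":
    counts = tally_pod_status(PEER_POD_LISTING)
    total = sum(counts.values())
    failed = total - counts["Running"]
    print(dict(counts), "failed:", failed, "of", total)
```

For the peer pod listing this counts 6 Running, 2 Error and 2 CrashLoopBackOff pods, so 4 of 10 replicas failed in that particular run.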
Thanks @mkulke ,
|
I suspect that's due to the nginx process being started, which puts the pod in the One fix for the test which I can imagine: add a |
Added init containers along with nginx pods in order to add users manually, plus a readiness probe Fixes: confidential-containers#1450 Signed-off-by: Sudharshan Muralidharan <sudharshan.muralidharan@ibm.com>
Hi @mkulke ,
|
@sudharshanibm3 thanks! I would strongly recommend not adding the init container in the PR, though. An init container might make the tests pass, but AFAICT it obscures a problem that we have. We cannot expect users to add fixes like this to their workloads. |
Added containerPort along with statements for debugging the readiness probe Fixes: confidential-containers#1450 Signed-off-by: Sudharshan Muralidharan <sudharshan.muralidharan@ibm.com>
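For readers unfamiliar with the combination mentioned in this commit, a container spec with a `containerPort` and a readiness probe might look roughly like the sketch below. The field names are standard Kubernetes container fields, but the probe type (`httpGet` on `/`), port 80, the timing values, and the helper name are assumptions; the actual change in the PR may differ.

```python
import json


def nginx_container_with_probe(port=80, initial_delay=5, period=5):
    """Return an nginx container spec that exposes a port and a readiness probe.

    The httpGet probe and all default values are illustrative assumptions.
    """
    return {
        "name": "nginx",
        "image": "nginx",
        "ports": [{"containerPort": port}],
        "readinessProbe": {
            # The pod is only marked Ready once nginx answers on this port.
            "httpGet": {"path": "/", "port": port},
            "initialDelaySeconds": initial_delay,
            "periodSeconds": period,
        },
    }


if __name__ == "__main__":
    print(json.dumps(nginx_container_with_probe(), indent=2))
```

With a probe like this, a container whose nginx process exits on `getpwnam("nginx") failed` never becomes Ready, so a test that waits for readiness fails instead of proceeding against a broken container.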
Hi @mkulke ,
|
Added excluded debugging code changes Fixes: confidential-containers#1450 Signed-off-by: Sudharshan Muralidharan <sudharshan.muralidharan@ibm.com>
Added excluded code snippets while rebasing the PR confidential-containers#1502. Enhanced Debugging for Readiness Probe Status in Nightly Jenkins Run Fixes: confidential-containers#1450 Signed-off-by: Sudharshan Muralidharan <sudharshan.muralidharan@ibm.com>
We still can see the |
the nginx pod is in running status but the container is
|
- sometimes the nginx container does not work well in the e2e tests, making the test cases fail - use busybox to reduce these unexpected failures - add a new nginx deployment test to track the nginx image issue fixes for confidential-containers#1450 Signed-off-by: Da Li Liu <liudali@cn.ibm.com>
- use busybox to reduce test execution time - add a new nginx deployment test - reduce WAIT_POD_RUNNING_TIMEOUT from 900 to 600 fixes for confidential-containers#1450 Signed-off-by: Da Li Liu <liudali@cn.ibm.com>
- change to check container status for pod with running and testcommands. - use busybox to reduce test execution time - add a new nginx deployment test - reduce WAIT_POD_RUNNING_TIMEOUT from 900 to 600 fixes for confidential-containers#1450 Signed-off-by: Da Li Liu <liudali@cn.ibm.com>
- change to check container status for pod with running and testcommands. - use busybox to reduce test execution time - reduce WAIT_POD_RUNNING_TIMEOUT from 900 to 600 fixes for confidential-containers#1450 Signed-off-by: Da Li Liu <liudali@cn.ibm.com>
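The idea of checking container status rather than only the pod phase can be sketched as follows. This is an illustration, not the code from the commit: it assumes a pod object shaped like the output of `kubectl get pod <name> -o json`, and the function name is hypothetical.

```python
def containers_running(pod):
    """Return True only if the pod is Running and every container is too.

    A pod can report phase "Running" while a container inside it is
    crash-looping, so the per-container state is checked as well.
    """
    status = pod.get("status", {})
    if status.get("phase") != "Running":
        return False
    container_statuses = status.get("containerStatuses", [])
    if not container_statuses:
        return False
    return all(
        cs.get("ready", False) and "running" in cs.get("state", {})
        for cs in container_statuses
    )


if __name__ == "__main__":
    # Hand-written sample: pod phase is Running, container is crash-looping.
    crashing = {
        "status": {
            "phase": "Running",
            "containerStatuses": [
                {
                    "name": "nginx",
                    "ready": False,
                    "state": {"waiting": {"reason": "CrashLoopBackOff"}},
                }
            ],
        }
    }
    print(containers_running(crashing))
```

A test that gates on a check like this would not treat the failing replicas shown earlier as healthy just because the pod phase says Running.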
The failure is not always reproducible.
When creating a peer pod with an nginx container, nginx sometimes exits with an error.
The nginx logs look like this: