Failed to run nginx container in peer pod #1450

Open
genjuro214 opened this issue Sep 20, 2023 · 13 comments · Fixed by #1502 or #1536
Labels
bug Something isn't working

Comments

@genjuro214
Contributor

This failure is intermittent.

When creating a peer pod with an nginx container, nginx sometimes exits with an error.

The nginx logs look like this:

/docker-entrypoint.sh: /docker-entrypoint.d/ is not empty, will attempt to perform configuration
/docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/
/docker-entrypoint.sh: Launching /docker-entrypoint.d/10-listen-on-ipv6-by-default.sh
10-listen-on-ipv6-by-default.sh: info: Getting the checksum of /etc/nginx/conf.d/default.conf
dpkg-query: no packages found matching nginx
10-listen-on-ipv6-by-default.sh: info: /etc/nginx/conf.d/default.conf differs from the packaged version
/docker-entrypoint.sh: Launching /docker-entrypoint.d/20-envsubst-on-templates.sh
/docker-entrypoint.sh: Launching /docker-entrypoint.d/30-tune-worker-processes.sh
/docker-entrypoint.sh: Ignoring /docker-entrypoint.d/.wh..wh..opq
/docker-entrypoint.sh: Configuration complete; ready for start up
2023/09/19 13:56:25 [emerg] 1#1: getpwnam("nginx") failed in /etc/nginx/nginx.conf:2
nginx: [emerg] getpwnam("nginx") failed in /etc/nginx/nginx.conf:2
@genjuro214 genjuro214 added the bug Something isn't working label Sep 20, 2023
@stevenhorsman
Member

From NginxProxyManager/nginx-proxy-manager#398 it looks like there might be an issue with the nginx user not being added, but I'm not sure why that would only sometimes be hit.

Internally there has been a suggestion to switch out test pods to be busybox, rather than nginx, but I'm concerned that this is hiding the problem rather than solving it.

@sudharshanibm3
Contributor

sudharshanibm3 commented Oct 5, 2023

Hi @stevenhorsman,
Based on this nginx-issue, it appears that the Nginx container starts before the "nginx" user has been created. In some cases user creation may not have completed by the time Nginx starts, which leads to the [emerg] getpwnam("nginx") failed error.

Could we add an init container to ensure the "nginx" user exists before the Nginx service starts, to avoid the instability we are facing now?

https://gist.github.com/sudharshanibm3/caba60a426b94a2d522e465382703903

Here I defined an init container named "useradd-init-container" that uses the "busybox" image to run the command adduser -D nginx, which creates the "nginx" user in the container; the -D flag disables password assignment.

I also applied the same init container method to the existing test cases that execute commands in the nginx container - nginx-evidence

I tested the branch in this Jenkins job:
https://sys-zaas-k8s-jenkins.swg-devops.com/job/cloud-api-adaptor-e2e-tests-opensource-sudharshan/217/console

So instead of switching all images to busybox, can we use this nginx with initcontainers?

sudharshanibm3 added a commit to sudharshanibm3/cloud-api-adaptor that referenced this issue Oct 6, 2023
Added initcontianers along with nginx pods in order to add users manually

Fixes:  confidential-containers#1450

Signed-off-by: Sudharshan Muralidharan <sudharshan.muralidharan@ibm.com>
@stevenhorsman
Member

Hey Sudharshan,

Thanks for looking into a work around for this. I'm a bit torn about it. On one hand it would be good to get the tests more stable, but I'm concerned that by adding the init-container approach we are just covering up an issue rather than resolving the root cause and a peer pods user wouldn't typically do that.

I'd also like to understand how common this problem is. I don't remember seeing it on the many test runs I've done, so is it a newer issue, have I just got lucky, or is the failure chance just ~10%? Do you know if we have any data for this?

@sudharshanibm3
Contributor

sudharshanibm3 commented Oct 6, 2023

Hi @stevenhorsman & @mattarnoatibm,

  • Here is the evidence, with logs, from running the test cases without init containers - debug-nginx
  • I am able to reproduce the same error in the nginx logs whenever the secrets test fails
  • The failure occurs in roughly 1 out of 10 runs

Jenkins job: https://sys-zaas-k8s-jenkins.swg-devops.com/job/cloud-api-adaptor-e2e-tests-opensource-sudharshan/224/console

=== RUN   TestCreatePodWithSecret/SecretPeerPod_test#08
17:32:07      common_suite_test.go:198: Expected Pod State: Running
17:32:07      common_suite_test.go:199: Current Pod State: Running
17:32:07  === RUN   TestCreatePodWithSecret/SecretPeerPod_test#08/Secret_has_been_created_and_contains_data
17:32:11      common_suite_test.go:267: Log output of peer pod:/docker-entrypoint.sh: /docker-entrypoint.d/ is not empty, will attempt to perform configuration
17:32:11          /docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/
17:32:11          /docker-entrypoint.sh: Launching /docker-entrypoint.d/10-listen-on-ipv6-by-default.sh
17:32:11          10-listen-on-ipv6-by-default.sh: info: Getting the checksum of /etc/nginx/conf.d/default.conf
17:32:11          dpkg-query: no packages found matching nginx
17:32:11          10-listen-on-ipv6-by-default.sh: info: /etc/nginx/conf.d/default.conf differs from the packaged version
17:32:11          /docker-entrypoint.sh: Sourcing /docker-entrypoint.d/15-local-resolvers.envsh
17:32:11          /docker-entrypoint.sh: Launching /docker-entrypoint.d/20-envsubst-on-templates.sh
17:32:11          /docker-entrypoint.sh: Launching /docker-entrypoint.d/30-tune-worker-processes.sh
17:32:11          /docker-entrypoint.sh: Ignoring /docker-entrypoint.d/.wh..wh..opq
17:32:11          2023/10/06 12:02:08 [emerg] 1#1: getpwnam("nginx") failed in /etc/nginx/nginx.conf:2
17:32:11          nginx: [emerg] getpwnam("nginx") failed in /etc/nginx/nginx.conf:2
17:32:11          /docker-entrypoint.sh: Configuration complete; ready for start up
17:32:17      common_suite_test.go:281: 
17:32:17      common_suite_test.go:282: unable to upgrade connection: container not found ("nginx-secret-container")
17:32:17  time="2023-10-06T12:02:17Z" level=info msg="Deleting Secret... nginx-secret"
17:32:17  time="2023-10-06T12:02:17Z" level=info msg="Deleting pod nginx-secret-pod..."
17:32:22  time="2023-10-06T12:02:22Z" level=info msg="Pod nginx-secret-pod has been successfully deleted"
17:32:22  === RUN   TestCreatePodWithSecret/SecretPeerPod_test#09
17:32:54      common_suite_test.go:198: Expected Pod State: Running
17:32:54      common_suite_test.go:199: Current Pod State: Running
17:32:54  === RUN   TestCreatePodWithSecret/SecretPeerPod_test#09/Secret_has_been_created_and_contains_data
  • Based on this data, it seems that whenever the [emerg] getpwnam("nginx") error is logged, we are unable to execute commands inside the nginx container.
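
Editorial sketch: one way to flag this failure automatically is to scan the pod log for the getpwnam signature shown above. The helper name `find_missing_users` and the regular expression are illustrative assumptions, not part of the cloud-api-adaptor test suite.

```python
import re

# Matches the failure reported in this thread, for example:
#   nginx: [emerg] getpwnam("nginx") failed in /etc/nginx/nginx.conf:2
GETPWNAM_FAILURE = re.compile(r'\[emerg\].*getpwnam\("(?P<user>[^"]+)"\) failed')


def find_missing_users(log_text: str) -> set[str]:
    """Return the user names that nginx reported it could not resolve."""
    return {m.group("user") for m in GETPWNAM_FAILURE.finditer(log_text)}


# Log lines copied from the failing test run above.
sample_log = """\
/docker-entrypoint.sh: Ignoring /docker-entrypoint.d/.wh..wh..opq
2023/10/06 12:02:08 [emerg] 1#1: getpwnam("nginx") failed in /etc/nginx/nginx.conf:2
nginx: [emerg] getpwnam("nginx") failed in /etc/nginx/nginx.conf:2
/docker-entrypoint.sh: Configuration complete; ready for start up
"""

print(find_missing_users(sample_log))
```

A test could call this on the captured log before attempting an exec into the container, so the failure is reported as a missing user rather than as "container not found".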

@mkulke
Collaborator

mkulke commented Oct 6, 2023

> Hey Sudharshan,
>
> Thanks for looking into a work around for this. I'm a bit torn about it. On one hand it would be good to get the tests more stable, but I'm concerned that by adding the init-container approach we are just covering up an issue rather than resolving the root cause and a peer pods user wouldn't typically do that.
>
> I'd also like to understand how common this problem is. I don't remember seeing it on the many test runs I've done, so is it a newer issue, have I just got lucky, or is the failure chance just ~10%? Do you know if we have any data for this?

I agree 100%. We're also seeing this issue, and not just with nginx. I think it's a real bug that we want to fix. It should be easy to reproduce: if you start 10 peer pod replicas, it'll look like this:

$ k get po -l app=nginx-caa
NAME                        READY   STATUS             RESTARTS      AGE
nginx-caa-8c8b67445-4ddp9   1/1     Running            0             3m16s
nginx-caa-8c8b67445-4f5ht   0/1     Error              4 (52s ago)   3m16s
nginx-caa-8c8b67445-7sjdp   0/1     Error              4 (54s ago)   3m16s
nginx-caa-8c8b67445-dqlf7   0/1     CrashLoopBackOff   3 (52s ago)   3m16s
nginx-caa-8c8b67445-fs5kp   0/1     CrashLoopBackOff   3 (33s ago)   3m16s
nginx-caa-8c8b67445-k54dt   1/1     Running            0             3m16s
nginx-caa-8c8b67445-k5n97   1/1     Running            0             3m16s
nginx-caa-8c8b67445-mrbvt   1/1     Running            0             3m16s
nginx-caa-8c8b67445-p77nk   1/1     Running            0             3m16s
nginx-caa-8c8b67445-q4skn   1/1     Running            0             3m16s

The failed pods won't recover automatically and always error out with the above log:

$ k logs nginx-caa-8c8b67445-7sjdp
/docker-entrypoint.sh: /docker-entrypoint.d/ is not empty, will attempt to perform configuration
/docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/
/docker-entrypoint.sh: Launching /docker-entrypoint.d/10-listen-on-ipv6-by-default.sh
10-listen-on-ipv6-by-default.sh: info: Getting the checksum of /etc/nginx/conf.d/default.conf
dpkg-query: no packages found matching nginx
10-listen-on-ipv6-by-default.sh: info: /etc/nginx/conf.d/default.conf differs from the packaged version
/docker-entrypoint.sh: Launching /docker-entrypoint.d/20-envsubst-on-templates.sh
/docker-entrypoint.sh: Launching /docker-entrypoint.d/30-tune-worker-processes.sh
/docker-entrypoint.sh: Ignoring /docker-entrypoint.d/.wh..wh..opq
/docker-entrypoint.sh: Configuration complete; ready for start up
2023/10/06 14:56:44 [emerg] 1#1: getpwnam("nginx") failed in /etc/nginx/nginx.conf:2
nginx: [emerg] getpwnam("nginx") failed in /etc/nginx/nginx.conf:2
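
Editorial sketch: a Deployment along the following lines should be enough to attempt the 10-replica reproduction described above. The name and label `nginx-caa` and the replica count come from the output above; the runtime class name `kata-remote` and the image tag are assumptions and may differ per cluster.

```python
import json


def nginx_deployment(replicas: int = 10, runtime_class: str = "kata-remote") -> dict:
    """Build a Deployment manifest that starts several nginx peer pods.

    runtime_class is an assumed value; use whatever RuntimeClass the
    cluster exposes for peer pods.
    """
    labels = {"app": "nginx-caa"}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "nginx-caa"},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "runtimeClassName": runtime_class,
                    "containers": [{"name": "nginx", "image": "nginx"}],
                },
            },
        },
    }


# kubectl accepts JSON manifests, so the output can be piped to `kubectl apply -f -`.
print(json.dumps(nginx_deployment(), indent=2))
```

Dropping the `runtimeClassName` field gives the non-kata deployment to compare against.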

@mkulke
Collaborator

mkulke commented Oct 6, 2023

For reference, the same deployment without the kata runtimeclass:

k get po -l app=nginx
NAME                     READY   STATUS    RESTARTS   AGE
nginx-76d9fbf4fb-5wzch   1/1     Running   0          8s
nginx-76d9fbf4fb-7frmz   1/1     Running   0          8s
nginx-76d9fbf4fb-dkq7m   1/1     Running   0          9s
nginx-76d9fbf4fb-fb7v2   1/1     Running   0          8s
nginx-76d9fbf4fb-hv5tv   1/1     Running   0          8s
nginx-76d9fbf4fb-lfq79   1/1     Running   0          8s
nginx-76d9fbf4fb-m5t44   1/1     Running   0          8s
nginx-76d9fbf4fb-n6dw6   1/1     Running   0          8s
nginx-76d9fbf4fb-snpkh   1/1     Running   0          8s
nginx-76d9fbf4fb-zrvcd   1/1     Running   0          8s

@sudharshanibm3
Contributor

sudharshanibm3 commented Oct 9, 2023

Thanks @mkulke,

  • It seems that sometimes a peer pod that reaches the Running state also produces the logs mentioned above and fails to execute commands inside the container

[Screenshot 2023-10-09 at 10 12 20 AM]

sudharshanibm3 added a commit to sudharshanibm3/cloud-api-adaptor that referenced this issue Oct 9, 2023
Added initcontianers along with nginx pods in order to add users manually

Fixes:  confidential-containers#1450

Signed-off-by: Sudharshan Muralidharan <sudharshan.muralidharan@ibm.com>
@mkulke
Collaborator

mkulke commented Oct 9, 2023

@sudharshanibm3

I suspect that's because the nginx process is started, which briefly puts the pod in the Running state, but the process crashes immediately and we cannot perform what we want to perform. It most likely also happens in the tests that do not perform secret or configmap checks; we just don't notice, because we only wait for Running and the test passes.

One fix for the test which I can imagine: add a ReadinessProbe to the nginx pod definition, so that the pod won't report as ready unless nginx listens on port 80. This will not make the tests pass (likely the reverse, more tests will fail) but the reporting will be accurate.
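
Editorial sketch of the readiness probe idea, written as a plain pod manifest. The probe values (HTTP GET on / at port 80, initial delay 10s, period 5s, timeout 1s, failure threshold 3) are the ones reported in the test output later in this thread; the function name is hypothetical and the actual change in the e2e suite may be structured differently.

```python
import json


def nginx_pod_with_readiness_probe(name: str = "nginx-secret-pod") -> dict:
    """Build a pod manifest whose container is Ready only once port 80 answers."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": "nginx-secret-container",
                    "image": "nginx",
                    "ports": [{"containerPort": 80}],
                    "readinessProbe": {
                        "httpGet": {"path": "/", "port": 80, "scheme": "HTTP"},
                        "initialDelaySeconds": 10,
                        "periodSeconds": 5,
                        "timeoutSeconds": 1,
                        "successThreshold": 1,
                        "failureThreshold": 3,
                    },
                }
            ]
        },
    }


print(json.dumps(nginx_pod_with_readiness_probe(), indent=2))
```

With such a probe the pod phase can still be Running while the Ready condition stays False, so a test has to wait on the Ready condition rather than on the phase to benefit from it.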

sudharshanibm3 added a commit to sudharshanibm3/cloud-api-adaptor that referenced this issue Oct 9, 2023
Added initcontianers along with nginx pods in order to add users manually and readiness probe

Fixes:  confidential-containers#1450

Signed-off-by: Sudharshan Muralidharan <sudharshan.muralidharan@ibm.com>
@sudharshanibm3
Contributor

Hi @mkulke ,

  • Added ReadinessProbe to the pod Definition to listen on Port 80 - code changes
=== RUN   TestCreatePodWithSecret
=== RUN   TestCreatePodWithSecret/SecretPeerPod_test
    common_suite_test.go:198: Expected Pod State: Running
    common_suite_test.go:199: Current Pod State: Running
===================
Checking Readiness Probe....
===================
*.Initial Delay Seconds: 10
*.Timeout Seconds: 1
*.Success Threshold: 1
*.Failure Threshold: 3
*.Period Seconds: 5
*.Probe Handler: {nil &HTTPGetAction{Path:/,Port:{0 80 },Host:,Scheme:HTTP,HTTPHeaders:[]HTTPHeader{},} nil nil}
*.Probe Handler Port: {0 80 }
===================
=== RUN   TestCreatePodWithSecret/SecretPeerPod_test/Secret_has_been_created_and_contains_data
time="2023-10-09T13:53:43+05:30" level=info msg="Username from secret inside pod: admin"
time="2023-10-09T13:53:49+05:30" level=info msg="Password from secret inside pod: password"
time="2023-10-09T13:53:49+05:30" level=info msg="PodVM name: nginx-secret-pod"
time="2023-10-09T13:53:50+05:30" level=debug msg="Instance number: 0, Instance id: 0787_411f5774-d70e-484b-8cf6-b2bb4a62138c, Instance name: podvm-nginx-secret-pod-b751754d"
time="2023-10-09T13:53:50+05:30" level=info msg="Deleting Secret... nginx-secret"
time="2023-10-09T13:53:50+05:30" level=info msg="Deleting pod nginx-secret-pod..."
time="2023-10-09T13:53:55+05:30" level=info msg="Pod nginx-secret-pod has been successfully deleted"
--- PASS: TestCreatePodWithSecret (54.33s)
    --- PASS: TestCreatePodWithSecret/SecretPeerPod_test (54.33s)
        --- PASS: TestCreatePodWithSecret/SecretPeerPod_test/Secret_has_been_created_and_contains_data (13.07s)

@mkulke
Collaborator

mkulke commented Oct 9, 2023

@sudharshanibm3 thanks! I would strongly recommend not adding the init container in the PR, though. An init container would maybe make the tests pass, but AFAICT it's obscuring a problem that we have. We cannot expect users to add fixes like this to their workloads.

sudharshanibm3 added a commit to sudharshanibm3/cloud-api-adaptor that referenced this issue Oct 9, 2023
Added Containerport along with statements for debugging readiness probe

Fixes:  confidential-containers#1450

Signed-off-by: Sudharshan Muralidharan <sudharshan.muralidharan@ibm.com>
@sudharshanibm3
Contributor

sudharshanibm3 commented Oct 9, 2023

Hi @mkulke,
I updated the PR with the following code changes:

  • Removed the init container from the test cases that execute commands inside the container (but kept the common method, which can be used in the future)
  • Included more pod condition status and readiness probe status output
=== RUN   TestCreatePodWithSecret
=== RUN   TestCreatePodWithSecret/SecretPeerPod_test
    common_suite_test.go:199: Expected Pod State: Running
    common_suite_test.go:200: Current Pod State: Running
===================
Checking Readiness Conditon - 1....
===================
*.Condition Type: Initialized
*.Condition Status: True
*.Condition Last Probe Time: 0001-01-01 00:00:00 +0000 UTC
*.Condition Last Transition Time: 2023-10-09 15:04:14 +0530 IST
*.Condition Last Message: 
*.Condition Last Reason: 
===================
Checking Readiness Conditon - 2....
===================
*.Condition Type: Ready
*.Condition Status: False
*.Condition Last Probe Time: 0001-01-01 00:00:00 +0000 UTC
*.Condition Last Transition Time: 2023-10-09 15:04:14 +0530 IST
*.Condition Last Message: containers with unready status: [nginx-secret-container]
*.Condition Last Reason: ContainersNotReady
===================
Checking Readiness Conditon - 3....
===================
*.Condition Type: ContainersReady
*.Condition Status: False
*.Condition Last Probe Time: 0001-01-01 00:00:00 +0000 UTC
*.Condition Last Transition Time: 2023-10-09 15:04:14 +0530 IST
*.Condition Last Message: containers with unready status: [nginx-secret-container]
*.Condition Last Reason: ContainersNotReady
===================
Checking Readiness Conditon - 4....
===================
*.Condition Type: PodScheduled
*.Condition Status: True
*.Condition Last Probe Time: 0001-01-01 00:00:00 +0000 UTC
*.Condition Last Transition Time: 2023-10-09 15:04:14 +0530 IST
*.Condition Last Message: 
*.Condition Last Reason: 
===================
Checking Readiness Probe....
===================
*.Initial Delay Seconds: 10
*.Timeout Seconds: 1
*.Success Threshold: 1
*.Failure Threshold: 3
*.Period Seconds: 5
*.Probe Handler: {nil &HTTPGetAction{Path:/,Port:{0 80 },Host:,Scheme:HTTP,HTTPHeaders:[]HTTPHeader{},} nil nil}
*.Probe Handler Port: {0 80 }
===================
=== RUN   TestCreatePodWithSecret/SecretPeerPod_test/Secret_has_been_created_and_contains_data
time="2023-10-09T15:05:06+05:30" level=info msg="Username from secret inside pod: admin"
time="2023-10-09T15:05:12+05:30" level=info msg="Password from secret inside pod: password"
time="2023-10-09T15:05:12+05:30" level=info msg="PodVM name: nginx-secret-pod"
time="2023-10-09T15:05:14+05:30" level=debug msg="Instance number: 0, Instance id: 0787_54c13478-1137-4a62-8520-d0d114596f2f, Instance name: podvm-nginx-secret-pod-fb82e684"
time="2023-10-09T15:05:14+05:30" level=info msg="Deleting Secret... nginx-secret"
time="2023-10-09T15:05:14+05:30" level=info msg="Deleting pod nginx-secret-pod..."
time="2023-10-09T15:05:19+05:30" level=info msg="Pod nginx-secret-pod has been successfully deleted"
--- PASS: TestCreatePodWithSecret (65.25s)
    --- PASS: TestCreatePodWithSecret/SecretPeerPod_test (65.25s)
        --- PASS: TestCreatePodWithSecret/SecretPeerPod_test/Secret_has_been_created_and_contains_data (14.01s)

sudharshanibm3 added a commit to sudharshanibm3/cloud-api-adaptor that referenced this issue Oct 9, 2023
Added Containerport along with statements for debugging readiness probe

Fixes:  confidential-containers#1450

Signed-off-by: Sudharshan Muralidharan <sudharshan.muralidharan@ibm.com>
sudharshanibm3 added a commit to sudharshanibm3/cloud-api-adaptor that referenced this issue Oct 11, 2023
Added Containerport along with statements for debugging readiness probe

Fixes:  confidential-containers#1450

Signed-off-by: Sudharshan Muralidharan <sudharshan.muralidharan@ibm.com>
sudharshanibm3 added a commit to sudharshanibm3/cloud-api-adaptor that referenced this issue Oct 15, 2023
Added Containerport along with statements for debugging readiness probe

Fixes:  confidential-containers#1450

Signed-off-by: Sudharshan Muralidharan <sudharshan.muralidharan@ibm.com>
sudharshanibm3 added a commit to sudharshanibm3/cloud-api-adaptor that referenced this issue Oct 17, 2023
Added Containerport along with statements for debugging readiness probe

Fixes:  confidential-containers#1450

Signed-off-by: Sudharshan Muralidharan <sudharshan.muralidharan@ibm.com>
sudharshanibm3 added a commit to sudharshanibm3/cloud-api-adaptor that referenced this issue Oct 20, 2023
Added Containerport along with statements for debugging readiness probe

Fixes:  confidential-containers#1450

Signed-off-by: Sudharshan Muralidharan <sudharshan.muralidharan@ibm.com>
stevenhorsman pushed a commit that referenced this issue Oct 20, 2023
Added Containerport along with statements for debugging readiness probe

Fixes:  #1450

Signed-off-by: Sudharshan Muralidharan <sudharshan.muralidharan@ibm.com>
sudharshanibm3 added a commit to sudharshanibm3/cloud-api-adaptor that referenced this issue Oct 23, 2023
Added excluded debugging code changes

Fixes: confidential-containers#1450

Signed-off-by: Sudharshan Muralidharan <sudharshan.muralidharan@ibm.com>
sudharshanibm3 added a commit to sudharshanibm3/cloud-api-adaptor that referenced this issue Oct 25, 2023
Added excluded debugging code changes

Fixes: confidential-containers#1450

Signed-off-by: Sudharshan Muralidharan <sudharshan.muralidharan@ibm.com>
sudharshanibm3 added a commit to sudharshanibm3/cloud-api-adaptor that referenced this issue Oct 27, 2023
Added excluded debugging code changes

Fixes: confidential-containers#1450

Signed-off-by: Sudharshan Muralidharan <sudharshan.muralidharan@ibm.com>
sudharshanibm3 added a commit to sudharshanibm3/cloud-api-adaptor that referenced this issue Nov 3, 2023
Added excluded debugging code changes

Fixes: confidential-containers#1450

Signed-off-by: Sudharshan Muralidharan <sudharshan.muralidharan@ibm.com>
sudharshanibm3 added a commit to sudharshanibm3/cloud-api-adaptor that referenced this issue Nov 3, 2023
Added excluded code snippets while rebasing the PR confidential-containers#1502. Enhanced Debugging for Readiness Probe Status in Nightly Jenkins Run

Fixes: confidential-containers#1450

Signed-off-by: Sudharshan Muralidharan <sudharshan.muralidharan@ibm.com>
sudharshanibm3 added a commit to sudharshanibm3/cloud-api-adaptor that referenced this issue Nov 6, 2023
Added excluded code snippets while rebasing the PR confidential-containers#1502. Enhanced Debugging for Readiness Probe Status in Nightly Jenkins Run

Fixes: confidential-containers#1450

Signed-off-by: Sudharshan Muralidharan <sudharshan.muralidharan@ibm.com>
liudalibj pushed a commit that referenced this issue Nov 8, 2023
Added excluded code snippets while rebasing the PR #1502. Enhanced Debugging for Readiness Probe Status in Nightly Jenkins Run

Fixes: #1450

Signed-off-by: Sudharshan Muralidharan <sudharshan.muralidharan@ibm.com>
lysliu pushed a commit to lysliu/cloud-api-adaptor that referenced this issue Nov 9, 2023
Added Containerport along with statements for debugging readiness probe

Fixes:  confidential-containers#1450

Signed-off-by: Sudharshan Muralidharan <sudharshan.muralidharan@ibm.com>
lysliu pushed a commit to lysliu/cloud-api-adaptor that referenced this issue Nov 9, 2023
Added excluded code snippets while rebasing the PR confidential-containers#1502. Enhanced Debugging for Readiness Probe Status in Nightly Jenkins Run

Fixes: confidential-containers#1450

Signed-off-by: Sudharshan Muralidharan <sudharshan.muralidharan@ibm.com>
@liudalibj liudalibj reopened this Dec 6, 2023
@liudalibj
Member

liudalibj commented Dec 6, 2023

We can still see the nginx: [emerg] getpwnam("nginx") failed in /etc/nginx/nginx.conf:2 issue in our e2e test pipelines. It is not always reproducible, but we seem to hit it in ~10% of runs.

@liudalibj
Member

The nginx pod is in the Running state, but its container is in the CrashLoopBackOff state.
One example log from my debug job:

15:33:56      assessment_runner_test.go:207: Expected Pod State: Running
15:33:56      assessment_runner_test.go:208: Current Pod State: Running
15:33:56      assessment_runner_test.go:209: Current Pod Container State: [{nginx {&ContainerStateWaiting{Reason:CrashLoopBackOff,Message:back-off 5m0s restarting failed container=nginx pod=deletion-test_default(e20ad7ac-52bd-4f5f-afdd-093b243d217c),} nil nil} {nil nil &ContainerStateTerminated{ExitCode:1,Signal:0,Reason:Error,Message:,StartedAt:2023-12-06 07:29:24 +0000 UTC,FinishedAt:2023-12-06 07:29:24 +0000 UTC,ContainerID:containerd://69b186873d80f19e794952d496ebfafae819ac67a030fac497fb4d72de31ab22,}} false 7 docker.io/library/nginx:latest docker.io/library/nginx@sha256:10d1f5b58f74683ad34eb29287e07dab1e90f10af243f151bb50aa5dbb4d62ee containerd://69b186873d80f19e794952d496ebfafae819ac67a030fac497fb4d72de31ab22 0xc000b20235}]
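
Editorial sketch: the log above shows why the pod phase alone is not a sufficient check. A helper along these lines inspects the per-container state as well. It works on the JSON shape returned by `kubectl get pod -o json`; the function name is hypothetical and this is not the check that was merged into the test suite.

```python
def containers_actually_running(pod: dict) -> tuple[bool, list[str]]:
    """Return (ok, problems) for a pod object in Kubernetes API JSON form.

    ok is True only if the pod phase is Running and every container
    reports a running state.
    """
    status = pod.get("status", {})
    problems = []
    if status.get("phase") != "Running":
        problems.append(f"pod phase is {status.get('phase')}")
    container_statuses = status.get("containerStatuses", [])
    if not container_statuses:
        problems.append("no container statuses reported")
    for cs in container_statuses:
        state = cs.get("state", {})
        if "running" not in state:
            waiting = state.get("waiting") or {}
            terminated = state.get("terminated") or {}
            reason = waiting.get("reason") or terminated.get("reason") or "unknown"
            problems.append(f"container {cs.get('name')} is not running: {reason}")
    return (not problems, problems)


# Reduced version of the status from the debug log above.
crashing_pod = {
    "status": {
        "phase": "Running",
        "containerStatuses": [
            {
                "name": "nginx",
                "ready": False,
                "restartCount": 7,
                "state": {"waiting": {"reason": "CrashLoopBackOff"}},
                "lastState": {"terminated": {"exitCode": 1, "reason": "Error"}},
            }
        ],
    }
}

ok, problems = containers_actually_running(crashing_pod)
print(ok, problems)
```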

liudalibj pushed a commit to liudalibj/cloud-api-adaptor that referenced this issue Dec 7, 2023
- sometime nginx container not work well in te2e test, make the test cases fail
- use busybox to reduce this unexpected fails
- add a new nginx deployment test to tract the nginx image issue

fixes for confidential-containers#1450

Signed-off-by: Da Li Liu <liudali@cn.ibm.com>
liudalibj pushed a commit to liudalibj/cloud-api-adaptor that referenced this issue Dec 20, 2023
- sometime nginx container not work well in te2e test, make the test cases fail
- use busybox to reduce this unexpected fails
- add a new nginx deployment test to tract the nginx image issue

fixes for confidential-containers#1450

Signed-off-by: Da Li Liu <liudali@cn.ibm.com>
liudalibj pushed a commit to liudalibj/cloud-api-adaptor that referenced this issue Dec 22, 2023
- use busybox to reduce test execution time
- add a new nginx deployment test
- reduce WAIT_POD_RUNNING_TIMEOUT from 900 to 600

fixes for confidential-containers#1450

Signed-off-by: Da Li Liu <liudali@cn.ibm.com>
liudalibj pushed a commit to liudalibj/cloud-api-adaptor that referenced this issue Dec 22, 2023
- change to check container status for pod with running and testcommands.
- use busybox to reduce test execution time
- add a new nginx deployment test
- reduce WAIT_POD_RUNNING_TIMEOUT from 900 to 600

fixes for confidential-containers#1450

Signed-off-by: Da Li Liu <liudali@cn.ibm.com>
liudalibj pushed a commit to liudalibj/cloud-api-adaptor that referenced this issue Dec 22, 2023
- change to check container status for pod with running and testcommands.
- use busybox to reduce test execution time
- reduce WAIT_POD_RUNNING_TIMEOUT from 900 to 600

fixes for confidential-containers#1450

Signed-off-by: Da Li Liu <liudali@cn.ibm.com>
liudalibj pushed a commit that referenced this issue Dec 22, 2023
- change to check container status for pod with running and testcommands.
- use busybox to reduce test execution time
- reduce WAIT_POD_RUNNING_TIMEOUT from 900 to 600

fixes for #1450

Signed-off-by: Da Li Liu <liudali@cn.ibm.com>
5 participants