
[🐛 Bug]: Chart 0.36.0 Distributor keeps restarting #2407

Closed · piotrlaczykowski opened this issue Sep 23, 2024 · 14 comments · Fixed by #2408
@piotrlaczykowski

What happened?

After updating to chart 0.36.0 and selenium/distributor:4.25.0-20240922, the pod keeps restarting:

(combined from similar events): Liveness probe failed: 13:49:47.762 DEBUG [Probe.Liveness] - Session Queue Size: 49, Session Count: 0, Max Session: 0 13:49:47.763 DEBUG [Probe.Liveness] - It seems the Distributor is delayed in processing a new session in the queue. Probe checks failed.

Command used to start Selenium Grid with Docker (or Kubernetes)

.

Relevant log output

.

Operating System

Kubernetes

Docker Selenium version (image tag)

4.25.0-20240922

Selenium Grid chart version (chart version)

0.36.0


@piotrlaczykowski, thank you for creating this issue. We will troubleshoot it as soon as we can.


Info for maintainers

Triage this issue by using labels.

If information is missing, add a helpful comment and then the I-issue-template label.

If the issue is a question, add the I-question label.

If the issue is valid but there is no time to troubleshoot it, consider adding the help wanted label.

If the issue requires changes or fixes from an external project (e.g., ChromeDriver, GeckoDriver, MSEdgeDriver, W3C), add the applicable G-* label, and it will provide the correct link and auto-close the issue.

After troubleshooting the issue, please add the R-awaiting answer label.

Thank you!

@VietND96
Member

I believe this will be fixed by #2408.
Root cause: basic auth was removed from the GraphQL endpoint, which is not supported in the current stable KEDA core, so the scaler could not scale Nodes due to 401 responses.
Once chart 0.36.1 is out, setting basicAuth.embeddedUrl: true will fix this.
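As a sketch, once 0.36.1 is available the upgrade could look like the following. The release name `selenium-grid`, the namespace `cicd` (taken from the logs later in this thread), and the repo alias are assumptions, not from the thread:

```shell
# Hypothetical upgrade to chart 0.36.1 with embedded basic auth in the
# scaler URL; release name, namespace, and repo alias are assumptions.
helm repo update
helm upgrade selenium-grid docker-selenium/selenium-grid \
  --version 0.36.1 \
  --namespace cicd \
  --set basicAuth.embeddedUrl=true
```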

@piotrlaczykowski
Author

> I believe this will be fixed by #2408. Root cause: basic auth was removed from the GraphQL endpoint, which is not supported in the current stable KEDA core, so the scaler could not scale Nodes due to 401 responses. Once chart 0.36.1 is out, setting basicAuth.embeddedUrl: true will fix this.

It didn't help for us.

@VietND96
Member

Can you share kubectl logs for the keda-operator pod? Are there any errors visible there?
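One way to pull those logs, assuming KEDA is installed in the default `keda` namespace (an assumption, adjust if installed elsewhere):

```shell
# Fetch recent keda-operator logs and surface errors, including the
# 401 responses mentioned as the suspected root cause.
kubectl logs deployment/keda-operator -n keda --tail=200 | grep -iE 'error|401'
```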

@VietND96
Member

Also, did you enable SE_REJECT_UNSUPPORTED_CAPS in hub/router? When autoscaling with min replicas = 0, this should not be enabled.
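A quick way to verify whether that variable is set on the router, as a sketch; the deployment name `selenium-router` and namespace `cicd` are assumptions and depend on your release name:

```shell
# Print the router container's env vars and look for the flag;
# deployment name and namespace are assumptions.
kubectl get deployment selenium-router -n cicd \
  -o jsonpath='{.spec.template.spec.containers[0].env}' \
  | grep SE_REJECT_UNSUPPORTED_CAPS
```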

@piotrlaczykowski
Author

keda-operator.log

> Also, did you enable SE_REJECT_UNSUPPORTED_CAPS in hub/router? When autoscaling with min replicas = 0, this should not be enabled.

I have minimum replicas set to 0 and I don't have SE_REJECT_UNSUPPORTED_CAPS set.

So what should I do?

@VietND96
Member

I saw the Edge node scaled and running properly, but the Chrome node stayed at 0 for a long time in the logs:

2024-09-23T09:46:22Z INFO scaleexecutor Scaling Jobs {"scaledJob.Name": "selenium-edge-node", "scaledJob.Namespace": "cicd", "Number of running Jobs": 24}
2024-09-23T09:46:22Z INFO scaleexecutor Scaling Jobs {"scaledJob.Name": "selenium-edge-node", "scaledJob.Namespace": "cicd", "Number of pending Jobs ": 1}
2024-09-23T09:46:22Z INFO scaleexecutor Creating jobs {"scaledJob.Name": "selenium-edge-node", "scaledJob.Namespace": "cicd", "Effective number of max jobs": 0}
2024-09-23T09:46:22Z INFO scaleexecutor Creating jobs {"scaledJob.Name": "selenium-edge-node", "scaledJob.Namespace": "cicd", "Number of jobs": 0}
2024-09-23T09:46:22Z INFO scaleexecutor Created jobs {"scaledJob.Name": "selenium-edge-node", "scaledJob.Namespace": "cicd", "Number of jobs": 0}
2024-09-23T09:46:23Z INFO scaleexecutor Scaling Jobs {"scaledJob.Name": "selenium-chrome-node", "scaledJob.Namespace": "cicd", "Number of running Jobs": 0}
2024-09-23T09:46:23Z INFO scaleexecutor Scaling Jobs {"scaledJob.Name": "selenium-chrome-node", "scaledJob.Namespace": "cicd", "Number of pending Jobs ": 0}

@piotrlaczykowski
Author

> I saw the Edge node scaled and running properly, but the Chrome node stayed at 0 for a long time in the logs:
>
> 2024-09-23T09:46:22Z INFO scaleexecutor Scaling Jobs {"scaledJob.Name": "selenium-edge-node", "scaledJob.Namespace": "cicd", "Number of running Jobs": 24}
> 2024-09-23T09:46:22Z INFO scaleexecutor Scaling Jobs {"scaledJob.Name": "selenium-edge-node", "scaledJob.Namespace": "cicd", "Number of pending Jobs ": 1}
> 2024-09-23T09:46:22Z INFO scaleexecutor Creating jobs {"scaledJob.Name": "selenium-edge-node", "scaledJob.Namespace": "cicd", "Effective number of max jobs": 0}
> 2024-09-23T09:46:22Z INFO scaleexecutor Creating jobs {"scaledJob.Name": "selenium-edge-node", "scaledJob.Namespace": "cicd", "Number of jobs": 0}
> 2024-09-23T09:46:22Z INFO scaleexecutor Created jobs {"scaledJob.Name": "selenium-edge-node", "scaledJob.Namespace": "cicd", "Number of jobs": 0}
> 2024-09-23T09:46:23Z INFO scaleexecutor Scaling Jobs {"scaledJob.Name": "selenium-chrome-node", "scaledJob.Namespace": "cicd", "Number of running Jobs": 0}
> 2024-09-23T09:46:23Z INFO scaleexecutor Scaling Jobs {"scaledJob.Name": "selenium-chrome-node", "scaledJob.Namespace": "cicd", "Number of pending Jobs ": 0}

That's because we don't use Chrome nodes. I should actually delete it.

@VietND96
Member

Ok, so did any tests pass in the run with "Number of running Jobs": 24?
If no test is able to run, can you disable the liveness probe in hub/router to see how long nodes take to register to the hub?
Also, if you deploy via a Helm command, can you do a dry run with helm template and attach the full resources YAML output?
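A minimal dry-run sketch: helm template renders every resource locally without touching the cluster. The release name, repo alias, and values file name are assumptions; the exact values key for disabling the liveness probe varies, so check the chart's values.yaml for it rather than guessing:

```shell
# Render all chart resources locally (no install) so they can be
# attached to the issue; chart reference and values file are assumptions.
helm template selenium-grid docker-selenium/selenium-grid \
  --version 0.36.0 \
  --namespace cicd \
  -f values.yaml > rendered-resources.yaml
```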

@piotrlaczykowski
Author

No tests can run.
Can you give me the commands I should type? It would be much easier with ArtifactHub regarding versioning and naming.
I rolled back to 0.35.2.
Also, we use Kubernetes 1.21 and KEDA 2.8.2 :)

@VietND96
Member

Ok, let me see if anyone else is facing the same issue. In CI, tests cover the K8s version range 1.25 to 1.31 and the latest stable KEDA, 2.15.1.
A major change in this version is that the scaler param url is now fetched from the TriggerAuthentication resource, which has only been available since KEDA core >= 2.9, as the docs mention: https://keda.sh/docs/2.9/scalers/selenium-grid-scaler/
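Given that the reporter is on KEDA 2.8.2, which predates that mechanism, a quick version check may be worthwhile. A sketch, assuming KEDA lives in the `keda` namespace and the chart release is in `cicd` (both assumptions):

```shell
# Show the installed KEDA operator image tag; the chart's
# URL-from-TriggerAuthentication mechanism requires KEDA core >= 2.9.
kubectl get deployment keda-operator -n keda \
  -o jsonpath='{.spec.template.spec.containers[0].image}'

# List the TriggerAuthentication resources the chart created.
kubectl get triggerauthentication -n cicd
```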

@VietND96
Member

> It would be much easier with ArtifactHub regarding versioning and naming

I think you can check https://artifacthub.io/packages/helm/selenium-grid/selenium-grid. We are not the owner; however, I saw it is up to date.

@piotrlaczykowski
Author

> Ok, let me see if anyone else is facing the same issue. In CI, tests cover the K8s version range 1.25 to 1.31 and the latest stable KEDA, 2.15.1. A major change in this version is that the scaler param url is now fetched from the TriggerAuthentication resource, which has only been available since KEDA core >= 2.9, as the docs mention: https://keda.sh/docs/2.9/scalers/selenium-grid-scaler/

Yep, it still doesn't work :/

@VietND96
Member

What if you keep using chart 0.35.2 but replace the image tag with the new 4.25.0-20240922?
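That suggestion might be sketched like this; the release name, namespace, repo alias, and the `global.seleniumGrid.imageTag` values key are assumptions, so verify the key against the 0.35.2 chart's values.yaml before running:

```shell
# Pin the older chart while pulling the newer images; release name,
# namespace, and the values key are assumptions, not confirmed in the thread.
helm upgrade selenium-grid docker-selenium/selenium-grid \
  --version 0.35.2 \
  --namespace cicd \
  --set global.seleniumGrid.imageTag=4.25.0-20240922
```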
