[🐛 Bug]: org.openqa.selenium.NoSuchSessionException: Unable to find session with ID #14322

Closed
rishabhjain-qait opened this issue Jul 30, 2024 · 16 comments


@rishabhjain-qait

What happened?

I am intermittently getting org.openqa.selenium.NoSuchSessionException: Unable to find session with ID.

I have Selenium Grid version 4.21.0-20240517 up and running, with the following properties set for the browser pods:
TZ: "Asia/Kolkata"
SE_NODE_MAX_SESSIONS: "1"
SE_NODE_SESSION_TIMEOUT: "10800"
SE_NODE_OVERRIDE_MAX_SESSIONS: "true"
SE_SCREEN_HEIGHT: "1080"
SE_SCREEN_WIDTH: "1920"
SE_OPTS: "--log-level FINEST"

I am running one browser node per k8s pod, and I have autoscaling in place for the browser pods.

Autoscaling works absolutely fine, both upscaling and downscaling. The issue is not very frequent, but it does occur from time to time and I am not sure why.

I am unable to reproduce the issue on demand; it is intermittent. It is also not tied to a particular test: it shows up in different tests whenever it is observed.

I have integrated Jaeger with my Selenium Grid to look at the traces and catch this kind of issue. But when I look at the traces for this issue, I don't see any localSessionMap.remove command being sent; nothing like that is visible in Jaeger.

All I see is that at some point it suddenly threw a SessionNotAvailable exception. It was working fine and was able to click on the element, and the next thing it shows is Unable to find session with ID.
I am adding screenshots of what I see in Jaeger.

[Two Jaeger screenshots: "Screen Shot 2024-07-30 at 12 13 18 PM" and "Screen Shot 2024-07-30 at 12 12 42 PM"]

Please help me find the reason for this issue. Is there a particular setting that needs to be changed to avoid it?
Thanks in advance.

How can we reproduce the issue?

I am adding the logs from my test output, and also the stack trace of the exception that I see in Jaeger.

Relevant log output

Test Exception
 
Unable to find session with ID: 303f6c17713ba2fe4988d4ecd00194f5
Build info: version: '4.21.0', revision: '79ed462ef4'
System info: os.name: 'Linux', os.arch: 'amd64', os.version: '6.1.58+', java.version: '17.0.11'
Driver info: driver.version: unknown
Build info: version: '4.21.0', revision: '79ed462ef4'
System info: os.name: 'Linux', os.arch: 'amd64', os.version: '5.14.0-362.24.2.el9_3.x86_64', java.version: '11.0.12'
Driver info: org.openqa.selenium.remote.RemoteWebDriver
Command: [303f6c17713ba2fe4988d4ecd00194f5, get {url=https://space-prod0-automation.sprinklr.com/logout}]
Capabilities {acceptInsecureCerts: true, browserName: chrome, browserVersion: 125.0.6422.60, chrome: {chromedriverVersion: 125.0.6422.60 (3ac3319bff9f..., userDataDir: /tmp/.org.chromium.Chromium...}, fedcm:accounts: true, goog:chromeOptions: {debuggerAddress: localhost:34867}, goog:loggingPrefs: {browser: ALL}, networkConnectionEnabled: false, pageLoadStrategy: none, platformName: linux, proxy: Proxy(), se:bidiEnabled: false, se:cdp: wss://qa6-selenium-grid-soc..., se:cdpVersion: 125.0.6422.60, se:name: Governance_UI_Macro_Tests/164, se:vnc: wss://qa6-selenium-grid-soc..., se:vncEnabled: true, se:vncLocalAddress: ws://10.102.33.70:7900, setWindowRect: true, strictFileInteractability: false, timeouts: {implicit: 0, pageLoad: 300000, script: 30000}, unhandledPromptBehavior: accept, webauthn:extension:credBlob: true, webauthn:extension:largeBlob: true, webauthn:extension:minPinLength: true, webauthn:extension:prf: true, webauthn:virtualAuthenticators: true}
Session ID: 303f6c17713ba2fe4988d4ecd00194f5

java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
org.openqa.selenium.remote.ErrorCodec.decode(ErrorCodec.java:167)
org.openqa.selenium.remote.codec.w3c.W3CHttpResponseCodec.decode(W3CHttpResponseCodec.java:138)
org.openqa.selenium.remote.codec.w3c.W3CHttpResponseCodec.decode(W3CHttpResponseCodec.java:50)
org.openqa.selenium.remote.HttpCommandExecutor.execute(HttpCommandExecutor.java:190)
org.openqa.selenium.remote.TracedCommandExecutor.execute(TracedCommandExecutor.java:51)
org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:518)
org.openqa.selenium.remote.RemoteWebDriver.get(RemoteWebDriver.java:300)

Jaeger Exception

event	
exception
exception.message	
Unable to execute request for an existing session: Unable to find session with ID: 303f6c17713ba2fe4988d4ecd00194f5
Build info: version: '4.21.0', revision: '79ed462ef4'
System info: os.name: 'Linux', os.arch: 'amd64', os.version: '6.1.58+', java.version: '17.0.11'
Driver info: driver.version: unknown
exception.stacktrace	
org.openqa.selenium.NoSuchSessionException: Unable to find session with ID: 303f6c17713ba2fe4988d4ecd00194f5
Build info: version: '4.21.0', revision: '79ed462ef4'
System info: os.name: 'Linux', os.arch: 'amd64', os.version: '6.1.58+', java.version: '17.0.11'
Driver info: driver.version: unknown
	at org.openqa.selenium.grid.sessionmap.local.LocalSessionMap.get(LocalSessionMap.java:132)
	at org.openqa.selenium.grid.sessionmap.SessionMap.getUri(SessionMap.java:84)
	at org.openqa.selenium.grid.router.HandleSession.lambda$loadSessionId$4(HandleSession.java:223)
	at io.opentelemetry.context.Context.lambda$wrap$2(Context.java:224)
	at org.openqa.selenium.grid.router.HandleSession.execute(HandleSession.java:180)
	at org.openqa.selenium.remote.http.Route$PredicatedRoute.handle(Route.java:397)
	at org.openqa.selenium.remote.http.Route.execute(Route.java:69)
	at org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:360)
	at org.openqa.selenium.remote.http.Route.execute(Route.java:69)
	at org.openqa.selenium.grid.router.Router.execute(Router.java:87)
	at org.openqa.selenium.grid.web.EnsureSpecCompliantResponseHeaders.lambda$apply$0(EnsureSpecCompliantResponseHeaders.java:34)
	at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:63)
	at org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:360)
	at org.openqa.selenium.remote.http.Route.execute(Route.java:69)
	at org.openqa.selenium.remote.AddWebDriverSpecHeaders.lambda$apply$0(AddWebDriverSpecHeaders.java:35)
	at org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)
	at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:63)
	at org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)
	at org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:63)
	at org.openqa.selenium.netty.server.SeleniumHandler.lambda$channelRead0$0(SeleniumHandler.java:44)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)
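
To line up a failing test with its Jaeger trace, it can help to pull the session ID out of the exception message and search for it. The following is a minimal sketch, not part of Selenium: the class name `SessionIdExtractor` and the method `missingSessionId` are hypothetical, and the pattern only assumes the message wording seen in the logs above.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Selenium: extracts the session ID from a
// "Unable to find session with ID" message so it can be searched for in Jaeger.
final class SessionIdExtractor {
    // Matches the wording seen in the logs above, followed by a hex or
    // UUID-style session ID (hex digits and hyphens).
    private static final Pattern MISSING_SESSION =
        Pattern.compile("Unable to find session with ID: ([0-9a-fA-F-]+)");

    private SessionIdExtractor() {}

    // Returns the session ID if the message contains one, otherwise empty.
    static Optional<String> missingSessionId(String message) {
        if (message == null) {
            return Optional.empty();
        }
        Matcher matcher = MISSING_SESSION.matcher(message);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }
}
```

In a test listener this could be called with `exception.getMessage()` and the result logged next to the test name.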

Operating System

macOS

Selenium version

4.21.0-20240517

What are the browser(s) and version(s) where you see this issue?

Chrome

What are the browser driver(s) and version(s) where you see this issue?

ChromeDriver

Are you using Selenium Grid?

4.21.0-20240517


@rishabhjain-qait, thank you for creating this issue. We will troubleshoot it as soon as we can.


Info for maintainers

Triage this issue by using labels.

If information is missing, add a helpful comment and then the I-issue-template label.

If the issue is a question, add the I-question label.

If the issue is valid but there is no time to troubleshoot it, consider adding the help wanted label.

If the issue requires changes or fixes from an external project (e.g., ChromeDriver, GeckoDriver, MSEdgeDriver, W3C), add the applicable G-* label, and it will provide the correct link and auto-close the issue.

After troubleshooting the issue, please add the R-awaiting answer label.

Thank you!

@rishabhjain-qait
Author

cc: @rookieInTraining

@diemol
Member

diemol commented Jul 30, 2024

@VietND96 do you know?

@VietND96
Member

autoscaling works absolutely fine, both upscaling and downscaling,

May I know whether it is a ScaledObject or a ScaledJob?
If it is a ScaledObject, is the pod's preStop hook executed to gracefully shut down the Node? If yes, how long is terminationGracePeriodSeconds set to, and is it long enough for the pod to stay in Terminating until the session completes?

@VietND96
Member

VietND96 commented Jul 30, 2024

A similar error is also discussed in SeleniumHQ/docker-selenium#2129 (comment).

@rishabhjain-qait
Author

Hey @VietND96,
thanks for looking at the issue.

I am not using KEDA for autoscaling; I have written a small Spring Boot application that does this work for me.

I am draining a node in order to scale down when that Grid node has 0 sessions running.
Drain Node https://www.selenium.dev/documentation/grid/advanced_features/endpoints/
Node drain command is for graceful node shutdown. Draining a Node stops the Node after all the ongoing sessions are complete. However, it does not accept any new session requests.

curl --request POST 'http://localhost:4444/se/grid/distributor/node/<node-id>/drain' --header 'X-REGISTRATION-SECRET;'
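
Since the scaler is described as a Spring Boot application, the same drain call could be issued from Java. This is only a sketch using the JDK's built-in HTTP client; the class name `DrainRequests` and its parameters are hypothetical and are not taken from the author's scaler. The endpoint path and the `X-REGISTRATION-SECRET` header come from the documentation quoted above.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper, not the author's code: builds the drain request
// described in the Grid endpoints documentation.
final class DrainRequests {
    private DrainRequests() {}

    // gridUrl is the Grid base URL (for example http://localhost:4444),
    // nodeId is the ID of the Node to drain.
    static HttpRequest drain(String gridUrl, String nodeId, String registrationSecret) {
        URI uri = URI.create(gridUrl + "/se/grid/distributor/node/" + nodeId + "/drain");
        return HttpRequest.newBuilder(uri)
            // Pass the Grid's registration secret here; the documented
            // example sends this header with an empty value.
            .header("X-REGISTRATION-SECRET", registrationSecret)
            .POST(HttpRequest.BodyPublishers.noBody())
            .build();
    }
}
```

The request can then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`, and the response checked before the pod is scheduled for removal.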

@VietND96
Member

Also, can you try upgrading the docker image to tag 4.23.0-20240727 (helm chart 0.33.0)? It contains the fix #14282 for a race condition in which a session can be assigned to a Node in status DRAINING.

@VietND96
Member

I am Draining the node in order to scale down if any of the nodes of sel grid is having 0 sessions running

Do you guard against the case where, at some point in time, there are 0 sessions running and a node drain is triggered, but new requests suddenly arrive? Or where the drain and the new requests arrive together?

@VietND96
Member

Also, I assume you rely on the GraphQL endpoint to get the running sessions. Suppose there is a glitch and the response returns an error or something similar. How does the script decide in that case? Does it assume 0 and trigger the scale down, or does it retry before making a decision?

@rishabhjain-qait
Author

rishabhjain-qait commented Jul 30, 2024

I am Draining the node in order to scale down if any of the nodes of sel grid is having 0 sessions running

Do you guard the case that at a point of time, having 0 sessions running, drain nodes is triggered but suddenly new requests come? or draining nodes and new requests come together?

https://www.selenium.dev/documentation/grid/advanced_features/endpoints/
As mentioned there, once a node is set to drain, no new requests are sent to that particular node.
Ideally, once the session is finished, a new node spawns and takes new requests from the session queue, if any are present, as per the autoscaling logic.

Ideally, the node that is set to drain should not take any new requests and should be killed as soon as its current session completes.

Also, assume you rely on GraphQL endpoint for getting sessions running. For example, there is a glitch that response return error or something. In this case, how the script makes decision? Is it assume as 0 and trigger the scale down, or retry further before making decision?

Also, if the GraphQL endpoint returns an error, which I haven't observed so far, the script does not assume 0 and scale down. Instead, it breaks out of the logic, hits the same GraphQL endpoint again after 10 seconds to get the status, and then decides whether it needs to scale up or down.
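
The rule described in this reply (an unknown session count is never treated as zero) can be sketched as a small decision function. The names `ScaleDownPolicy`, `Action`, and `decide` are hypothetical; the actual scaler code is not shown in this thread.

```java
import java.util.OptionalInt;

// Hypothetical names; a sketch of the decision rule described above,
// not the author's implementation.
final class ScaleDownPolicy {
    enum Action { DRAIN_NODE, KEEP_NODE, RETRY_LATER }

    private ScaleDownPolicy() {}

    // runningSessions is empty when the GraphQL query failed or returned an error.
    static Action decide(OptionalInt runningSessions) {
        if (runningSessions.isEmpty()) {
            // Unknown state: skip this cycle and query again later
            // (the reply above mentions a 10 second interval).
            return Action.RETRY_LATER;
        }
        return runningSessions.getAsInt() == 0 ? Action.DRAIN_NODE : Action.KEEP_NODE;
    }
}
```

Keeping the "query failed" case as an explicit empty value, rather than defaulting the count to 0, makes it hard for a transient GraphQL error to trigger a drain by accident.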

@VietND96
Member

As mentioned here, once the node is set to drained, no new request would come up to that particular node,

I think the scaler is not able to guard against this, since the Hub decides where to assign a session. So try the new fix I mentioned to see whether it prevents a DRAINING node from picking up a new session.

ideally once the session is finished, a new node would spawn up and that would be able to take new requests if present in session queue as per the autoscaling logic written,

Again, a question about the scaler. Once the session is finished, how does the scaler scale down? Does it choose exactly which pod will be scaled down, or is the pod selected randomly?

@rishabhjain-qait
Author

hey @VietND96

Yes, the scaler chooses exactly which pod to scale down; it does not select randomly.

For the pod that needs to be scaled down, I update only that pod's deletion cost with the payload below:
String payload = "{ \"metadata\": { \"annotations\": { \"controller.kubernetes.io/pod-deletion-cost\": \"-1\" } } }";

and then I scale down, so that the correct pod is removed and not any other.
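
How the payload is sent is not shown in this thread. One possible way, sketched here under that assumption, is a JSON merge patch against the Kubernetes pod resource using the JDK HTTP client. The class `PodDeletionCost` and its parameters are hypothetical, and authentication (for example a service account bearer token) and TLS setup are deliberately left out.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper, not the author's code: builds a merge-patch request
// that sets the pod-deletion-cost annotation on a single pod.
final class PodDeletionCost {
    private PodDeletionCost() {}

    // Same JSON as the payload above, with the cost as a parameter.
    // Annotation values are strings, so the number is quoted.
    static String payload(int cost) {
        return "{ \"metadata\": { \"annotations\": { "
            + "\"controller.kubernetes.io/pod-deletion-cost\": \"" + cost + "\" } } }";
    }

    // apiServer is the Kubernetes API base URL; auth headers are omitted here.
    static HttpRequest patch(String apiServer, String namespace, String pod, int cost) {
        URI uri = URI.create(apiServer + "/api/v1/namespaces/" + namespace + "/pods/" + pod);
        return HttpRequest.newBuilder(uri)
            .header("Content-Type", "application/merge-patch+json")
            .method("PATCH", HttpRequest.BodyPublishers.ofString(payload(cost)))
            .build();
    }
}
```

A lower deletion cost makes the pod a preferred candidate when its ReplicaSet scales down, so the patch should be confirmed as applied before the replica count is reduced.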

@joerg1985
Member

@rishabhjain-qait Is this happening shortly after the session is started?
A small delay in processing the NodeRestartedEvent might cause this trouble.

@edsherwin

@rishabhjain-qait have you resolved your issue with KEDA? If yes, can you please share how? Thanks.

@diemol
Member

diemol commented Nov 5, 2024

I will close this as the issue has not had any more activity.

diemol closed this as not planned on Nov 5, 2024

github-actions bot commented Dec 5, 2024

This issue has been automatically locked since there has not been any recent activity since it was closed. Please open a new issue for related bugs.

github-actions bot locked and limited conversation to collaborators on Dec 5, 2024