-
-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[grid] Distributor listen and handle the NodeRestartedEvent #14938
Conversation
Signed-off-by: Viet Nguyen Duc <nguyenducviet4496@gmail.com>
PR Reviewer Guide 🔍Here are some key observations to aid the review process:
|
PR Code Suggestions ✨Explore these optional code suggestions:
|
CI Failure Feedback 🧐(Checks updated until commit a6800a9)
✨ CI feedback usage guide:The CI feedback tool (
In addition to being automatically triggered, the tool can also be invoked manually by commenting on a PR:
where Configuration options
See more information about the |
…HQ#14938) Signed-off-by: Viet Nguyen Duc <nguyenducviet4496@gmail.com>
User description
Thanks for contributing to Selenium!
A PR well described will help maintainers to quickly review and merge it
Before submitting your PR, please check our contributing guidelines.
Avoid large PRs, help reviewers by making them as simple and short as possible.
Description
By testing the parallel, where 100 sessions with batch submitted and Node containers spawn and shut down continuously. Due to this, it could be the case where Node restarted (URI is the same, but NodeId is different). Intermittently, 2-3 sessions failed, which is not the request timed out case, just see the message that the session was removed from LocalSessionMap.
LocalDistributor is getting a
snapshot
from GridModel. So, it is potential for a race condition where the Distributor fetches available Nodes, including a Node that could get restarted in the meantime (since updateNodeAvailability by an interval) and assign a session to it, then very fast the session is removed by LocalSessionMap (due to it also consumed theNodeRestartedEvent
)Let LocalDistributor also listen to the event
NodeRestartedEvent
and immediately lock the thread to updateNodeAvailability, thengetAvailableNodes
doesn't have a chance to see that Node anymore.The same action will be synced across LocalSessionMap, Distributor, GridModel.
Motivation and Context
Types of changes
Checklist
PR Type
Bug fix
Description
NodeRestartedEvent
inLocalDistributor
to prevent race conditions when nodes restart during session allocationChanges walkthrough 📝
LocalDistributor.java
Add Node restart event handling in LocalDistributor
java/src/org/openqa/selenium/grid/distributor/local/LocalDistributor.java
NodeRestartedEvent
by registering a new eventlistener
handleNodeRestarted
method to set node availability toDOWN when a node restarts
NodeRestartedEvent
class