
🥅 Kill servers on engine death #63

Merged: 5 commits into main from kill-grpc-server on Aug 9, 2024
Conversation

@joerunde (Contributor)

Description

See vllm-project/vllm#6594

This PR modifies the gRPC server's exception handler to check whether the async LLM engine is still running.
If it's not, the handler sends SIGTERM to the serving process, which should trigger the graceful shutdown handlers.

This approach is a bit heavy-handed, but await server.stop(0) seems very unhappy if called within the context of a request handler. Maybe we could use some grpc context termination callbacks for this, but I didn't really think it was worth the time to investigate that too deeply. I think the only real drawback here is that unit testing this feature would be pretty difficult with an os.kill in the mix.
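
As a rough illustration of the approach (a minimal sketch, not the adapter's actual handler; the `errored`/`is_running` health attributes on the engine are assumptions, not necessarily the real vLLM API):

```python
# Minimal sketch of the idea: abort the failing RPC, and if the engine itself
# has died, take down the whole serving process via SIGTERM.
import os
import signal

from grpc import StatusCode
from grpc.aio import ServicerContext


async def handle_exception(exc: Exception, context: ServicerContext, engine) -> None:
    """Abort the RPC; if the engine has died, kill the serving process."""
    # `errored` / `is_running` are illustrative health flags, not confirmed
    # attribute names on the vLLM async engine.
    if getattr(engine, "errored", False) or not getattr(engine, "is_running", True):
        # `await server.stop(0)` misbehaves when called from inside a request
        # handler, so send SIGTERM to our own process and let the registered
        # graceful-shutdown handlers do the cleanup.
        os.kill(os.getpid(), signal.SIGTERM)
    await context.abort(StatusCode.INTERNAL, str(exc))
```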

This PR currently does not add this handling to the http server, but once vllm-project/vllm#6594 and vllm-project/vllm#6740 are merged, we should be able to remove our copy-pasta http server so that both servers handle engine death properly.

How Has This Been Tested?

Tested by installing vllm==0.5.3.post1 and the adapter from this branch onto a dev pod, injecting a runtime failure into the llm engine, and running a grpc request.
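
Since the `os.kill` makes this hard to unit test (as noted above), one hedged idea for a future automated test, with all names hypothetical and none of this code taken from the PR, is to intercept SIGTERM so the kill can be asserted on instead of terminating the test process:

```python
# Hypothetical test helper: record SIGTERM deliveries instead of acting on
# them, so a test can assert that the server tried to kill itself.
import signal
from contextlib import contextmanager


@contextmanager
def capture_sigterm():
    received = []
    previous = signal.signal(
        signal.SIGTERM, lambda signum, frame: received.append(signum)
    )
    try:
        yield received
    finally:
        signal.signal(signal.SIGTERM, previous)


# Usage (sketch): inside `capture_sigterm()`, force the engine into an errored
# state, send one gRPC request, then assert the list contains signal.SIGTERM.
```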

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work.

@codecov-commenter commented on Jul 24, 2024

Codecov Report

Attention: Patch coverage is 78.26087% with 5 lines in your changes missing coverage. Please review.

Project coverage is 60.52%. Comparing base (96b8e0f) to head (98bb2a4).

Files                                        Patch %   Lines
src/vllm_tgis_adapter/grpc/grpc_server.py    58.33%    5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #63      +/-   ##
==========================================
- Coverage   61.19%   60.52%   -0.67%     
==========================================
  Files          20       20              
  Lines        1193     1183      -10     
  Branches      211      208       -3     
==========================================
- Hits          730      716      -14     
- Misses        387      391       +4     
  Partials       76       76              

☔ View full report in Codecov by Sentry.

@dtrifiro (Contributor)

Waiting on vllm-project/vllm#6594 to make sure that we have a common way of stopping the server

@njhill (Contributor) commented on Aug 8, 2024

vllm-project/vllm#6594 is now finally merged.

@joerunde (Contributor, Author) commented on Aug 8, 2024

vllm-project/vllm#6594 is now merged. I'll see if I can take another pass at this to avoid using os.kill.

@njhill (Contributor) left a comment

Thanks @joerunde!

@joerunde added this pull request to the merge queue on Aug 9, 2024
Merged via the queue into main with commit a7d89dc on Aug 9, 2024
3 checks passed
@joerunde deleted the kill-grpc-server branch on August 9, 2024 at 17:46