-
-
Notifications
You must be signed in to change notification settings - Fork 6.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add health check, make async Engine more robust #3015
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution! In general LGTM. Left some small questions.
vllm/engine/async_llm_engine.py
Outdated
finally: | ||
if exception: | ||
error_callback(exception) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We raise errors in both try
branch and except
branch. Then what does the finally
here do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we want to run the error callback even after we re-raise an exception in except
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you could just do this in the except
block though before re-raising (things will still run in the same order)
vllm/engine/async_llm_engine.py
Outdated
async def wait_for_new_requests(self, clear: bool): | ||
if not self.has_new_requests(): | ||
await self.new_requests_event.wait() | ||
if clear: | ||
self.new_requests_event.clear() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't we always clear this flag?
async def wait_for_new_requests(self, clear: bool): | |
if not self.has_new_requests(): | |
await self.new_requests_event.wait() | |
if clear: | |
self.new_requests_event.clear() | |
async def wait_for_new_requests(self): | |
if not self.has_new_requests(): | |
await self.new_requests_event.wait() | |
self.new_requests_event.clear() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, what's the reason behind this change? Why do we need to move the clear
call from get_new_and_finished_requests
to here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we can always clear it.
The reason is to ensure the event is cleared as soon as we have new requests
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Yard1 this looks great
vllm/engine/async_llm_engine.py
Outdated
finally: | ||
if exception: | ||
error_callback(exception) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you could just do this in the except
block though before re-raising (things will still run in the same order)
if not self.has_new_requests(): | ||
await self.new_requests_event.wait() | ||
self.new_requests_event.clear() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion to only clear before waiting
if not self.has_new_requests(): | |
await self.new_requests_event.wait() | |
self.new_requests_event.clear() | |
if not self.has_new_requests(): | |
self.new_requests_event.clear() | |
if not self.has_new_requests(): | |
await self.new_requests_event.wait() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm can you explain why we should do it like that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to avoid flip-flopping the event - it only needs to be cleared when you're actually about to wait on it. But I guess with python/asyncio it doesn't matter anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think it should be fine
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
For production usecases, we want to be able to detect Engine failures, especially ones that can happen silently (eg. due to NCCL timeouts). This PR adds a health check method (currently only checking the health of Ray workers) and makes the Async engine more robust by adding a timeout for each iteration as well as better error reporting.