Exploit vLLM options to return deltas/final-output only #137

Merged
merged 2 commits into from
Oct 7, 2024


njhill
Contributor

@njhill njhill commented Sep 17, 2024

This gives a nontrivial performance benefit, particularly when running with a decoupled front-end process.

These changes require vLLM >= 0.6.1.post2
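
For readers unfamiliar with the distinction, the following is a minimal sketch of why returning deltas only can reduce the work done per streamed step. It does not call vLLM: `cumulative_stream`, `delta_stream`, and `to_deltas` are hypothetical helpers written purely for illustration. The vLLM option is presumably `output_kind` on `SamplingParams` (with `RequestOutputKind.DELTA` and `RequestOutputKind.FINAL_ONLY`); treat that as an assumption, since this thread does not name it.

```python
from typing import Iterable, Iterator, List


def cumulative_stream(tokens: List[str]) -> Iterator[str]:
    """Yield the full text generated so far at every step."""
    text = ""
    for tok in tokens:
        text += tok
        yield text


def delta_stream(tokens: List[str]) -> Iterator[str]:
    """Yield only the newly generated piece at every step."""
    for tok in tokens:
        yield tok


def to_deltas(cumulative: Iterable[str]) -> Iterator[str]:
    """Work a front end has to do when it only receives cumulative output."""
    seen = 0
    for text in cumulative:
        yield text[seen:]
        seen = len(text)


tokens = ["Hello", ",", " world", "!"]

# Total characters passed between back end and front end in each mode.
cumulative_chars = sum(len(t) for t in cumulative_stream(tokens))
delta_chars = sum(len(t) for t in delta_stream(tokens))
```

In cumulative mode the payload grows with every step, so the total transferred is quadratic in the output length, while delta mode stays linear. That difference matters most when the outputs cross a process boundary, which matches the "decoupled front-end process" remark above.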

@njhill
Contributor Author

njhill commented Sep 18, 2024

@dtrifiro if necessary I could update these changes to work both before and after v0.6.1.post2 (like you've done for other things)

@dtrifiro
Contributor

@njhill yeah, I think that'd be better. We can bump the minimum vllm version and drop the backward-compatibility code after the next vllm version is out.
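
A backward-compatibility gate of the kind discussed here could look roughly like the sketch below. This is not the adapter's code: `parse_version` and `supports_output_kind` are hypothetical names, and the parser only understands plain `X.Y.Z` and `X.Y.Z.postN` strings. Real code would more likely compare with `packaging.version.Version`, which also handles dev and local version suffixes.

```python
import re
from typing import Tuple

# Matches "0.6.1" and "0.6.1.post2"; anything after that prefix is ignored.
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:\.post(\d+))?")


def parse_version(version: str) -> Tuple[int, int, int, int]:
    """Parse a version string into (major, minor, patch, post)."""
    match = _VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor, patch, post = match.groups()
    return int(major), int(minor), int(patch), int(post or 0)


# Minimum version stated in the PR description.
MIN_VERSION = parse_version("0.6.1.post2")


def supports_output_kind(version: str) -> bool:
    """Return True if this vLLM version is new enough for delta outputs."""
    return parse_version(version) >= MIN_VERSION
```

A caller would branch on `supports_output_kind(vllm.__version__)` and fall back to computing deltas from cumulative outputs on older versions.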

@njhill
Contributor Author

njhill commented Sep 27, 2024

Now that vllm 0.6.2 is released, @dtrifiro agreed that there's no need for the aforementioned backwards compatibility changes, so this PR should now be ready to merge.

@dtrifiro
Contributor

dtrifiro commented Sep 27, 2024

@njhill it seems this broke streaming generation

E                   	status = StatusCode.UNKNOWN
E                   	details = "Unexpected <class 'TypeError'>: object of type 'NoneType' has no len()"
E                   	debug_error_string = "UNKNOWN:Error received from peer ipv6:%5B::1%5D:45943 {created_time:"2024-09-27T00:19:45.777203684+00:00", grpc_status:2, grpc_message:"Unexpected <class \'TypeError\'>: object of type \'NoneType\' has no len()"}"
E                   >

See traceback here: https://github.com/opendatahub-io/vllm-tgis-adapter/actions/runs/11062189283/job/30736218398?pr=137#step:8:2614
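
The thread does not say which field was `None`, so the sketch below only illustrates the error class and one defensive pattern. It assumes, without confirmation from the source, that some output field that used to be a list can be absent once only deltas are returned. `safe_len` is a hypothetical helper, not part of the adapter or vLLM.

```python
from typing import Optional, Sequence


def safe_len(items: Optional[Sequence]) -> int:
    """Return len(items), treating a missing (None) field as empty."""
    return 0 if items is None else len(items)


# Reproduce the error class seen in the CI log.
try:
    len(None)  # type: ignore[arg-type]
except TypeError as exc:
    error_message = str(exc)
```

Whether defaulting to zero is the right behaviour depends on what the count is used for; in some cases it is better to skip the field entirely than to report an empty one.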

This can be merged after #143

edit: the CI failure could be due to this branch not being updated. Can you rebase on #143?

edit: Already rebased, still failing

@njhill
Contributor Author

njhill commented Oct 5, 2024

@dtrifiro this one is now fixed and should be working

@codecov-commenter

codecov-commenter commented Oct 5, 2024

Codecov Report

Attention: Patch coverage is 62.96296% with 10 lines in your changes missing coverage. Please review.

Project coverage is 58.35%. Comparing base (dd0eb1c) to head (fb849d9).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/vllm_tgis_adapter/grpc/grpc_server.py | 62.50% | 3 Missing and 6 partials ⚠️ |
| src/vllm_tgis_adapter/utils.py | 66.66% | 1 Missing ⚠️ |
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #137   +/-   ##
=======================================
  Coverage   58.34%   58.35%           
=======================================
  Files          27       27           
  Lines        1611     1616    +5     
  Branches      268      270    +2     
=======================================
+ Hits          940      943    +3     
+ Misses        582      581    -1     
- Partials       89       92    +3     


@njhill
Contributor Author

njhill commented Oct 6, 2024

Remaining test failures appeared to be unrelated to this PR.

They are probably due to other upstream changes, which we still need to investigate and address, of course.

@njhill
Contributor Author

njhill commented Oct 7, 2024

Yeah, looks like those are due to an upstream bug with the CPU backend: vllm-project/vllm#9024.

@dtrifiro dtrifiro added this pull request to the merge queue Oct 7, 2024
Merged via the queue into main with commit 33fbd5f Oct 7, 2024
3 checks passed
@dtrifiro dtrifiro deleted the delta-outputs branch October 7, 2024 12:26