[Doc] Compatibility matrix for mutually exclusive features #8512
Conversation
This should also cut down on the number of issues flagged as bugs when in fact the feature is not supported yet. Thanks for adding this! Some comments:
Thanks for the feedback and the contribution on the multimodal feature @DarkLight1337 @ywang96 !
Yeah, kind of. The first time I added it, the row became empty, so I thought it would be a waste to include it. I've added it again so you can see it. If you think it looks nice, I have no problem keeping it.
I discussed this with a colleague beforehand, and we agreed to present it like this in a concise format. What do you think? Does it really look that bad? I can discuss it again and revisit it. (By the way, you missed the TPU backend.) Yeah, I know. There are some other devices that I did not add. There is some low-hanging fruit there, of course, but I am not currently working with TPU and those other devices, and I don't have an easy setup to test them like I did for the others. It would be nice if someone could contribute that information. BTW, do you know (or know someone who knows) the status of multimodal support on AMD and CPU? I assume it is supported on NVIDIA by default, but do you know the minimum compute capability for this feature? Here is the updated table: TABLE
Looks better now, thanks.
I'm not overly bothered by this so you can keep it as is if that's the plan.
It's supported on both. This feature isn't tied to compute capability; it only depends on whether the model runner implements it.
Thanks a lot for this @wallashss! We should get this first pass added to the docs, imo, while we work on adding more rows/columns, and make more folks aware of it.
Yeah, it was my suggestion to keep the top-right triangle empty so that we don't have a bunch of duplicate entries (which could also become out of sync if we updated one and not its reflection). But it does make it harder to look at one row/column at a time and see all the features it works with... you have to follow an L shape. Not really sure which is better.
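For illustration, a minimal sketch of the layout being discussed (feature names and values here are placeholders, not the actual matrix contents): with only the lower-left triangle filled in, each feature pair appears exactly once, at the cost of the L-shaped lookup.

```
<!-- Lower-triangle only: to check a pair, read along the row, then down the column -->
|      | CP | APC | LoRA |
|------|----|-----|------|
| CP   |    |     |      |
| APC  | ✅ |     |      |
| LoRA | ✅ | ✅  |      |
```

A fully symmetric table would let each row be read on its own, but every entry would exist twice and the two copies could drift apart.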
@wallashss assuming I'm the colleague, I'm fine with having it split to a separate table :), but it would still be good for the columns to be the same and lined up (and not sure whether that would be possible with markdown). Some other features we might want to add:
We could also shorten some of the other names to make things fit better (and include a legend at the bottom if needed), like "Spec decoding" or even "SD".
Would it be possible to add a column for prefix caching? I have found it does not work on Volta-architecture CUDA devices (NVIDIA Tesla V100) because some Triton operation is unsupported on the older device. Possibly also rows for different CUDA compute capabilities: some features work on Hopper and Ada but not Ampere, or on Ampere and newer but not Volta.
@K-Mistele APC is (automatic) prefix caching :) Agreed, expanding with specific compute capabilities would be good!
Thanks for the feedback @K-Mistele.
How did you figure that out? Does vLLM raise an exception or log a warning with a clear message, or did it just run "fine" and you knew something was not right? The original idea was to split by NVIDIA architecture, but from my research, at least for those features, it seemed it might not be necessary.
It raised an exception about an unsupported Triton operation or something like that. I can reproduce it if you'd like to see the specific error, although I don't have a screenshot of it right this second. I think this happens for chunked prefill too.
Wow, if you could paste the errors here, that would be really nice! Thanks.
UPDATE: Tested on the Turing architecture (7.5), and both APC and chunked prefill worked fine. @K-Mistele did you have any chance to get the error on Volta?
Can do! I ran into the issue on a Tesla V100 32GB, which is a Volta device, not Turing.
It may be good to remind PR authors to update the compatibility matrix when relevant (e.g. via the PR checklist), just to make sure it remains up to date.
btw @wallashss I found open PRs with more detailed information on the chunked prefill and APC issues on Volta devices. For chunked prefill on NVIDIA Tesla V100 (Volta):
Thanks for that, @K-Mistele! This will be very useful; I'm going to update the table and propose a new one very soon.
No problem! I haven't gotten around to reproducing the APC issue, but IIRC it was similar.
FYI, as additional info for the compatibility matrix, as of commit cb3b2b9 the AMD Multi-Step feature is working; however, the combination of Multi-Step + Chunked Prefill is only supported on CUDA. Related issue: #9111 (comment)
@wallashss this can be added for LoRA + chunked prefill: #9057 Regarding @pooyadavoodi's comment above, perhaps you could include an addition to the pull request template in this same PR: https://github.com/vllm-project/vllm/blob/main/.github/PULL_REQUEST_TEMPLATE.md
Hey everyone, thank you for your contributions and feedback. I did a major update to the matrix:
Thanks @wallashss for all of the hard work on this! Let's get it merged, and we can make other adjustments as follow-ons.
Greetings,
We did a study of mutually exclusive features in vLLM and consolidated the results into a compatibility matrix.
We propose adding the compatibility matrix to the documentation pages so that users can quickly consult it when planning their implementation or study.
The table follows in markdown, for quick checking and to help reviewers.
CC @njhill @maxdebayser
[C] = Link to where a check is made in the code and error reported
[T] = Link to open tracking issue or PR to address the incompatibility
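As a sketch of how these legend markers might appear in a matrix cell (feature names, statuses, and the link target are placeholders, not the actual data):

```
<!-- [C] and [T] would be markdown links to the code check / tracking issue -->
|      | CP             | APC |
|------|----------------|-----|
| CP   |                |     |
| APC  | ✅             |     |
| LoRA | [✗ [C]](link)  | ✅  |
```

A cell marked `[C]` points to where the incompatibility is detected and reported in the code; one marked `[T]` points to the open issue or PR tracking a fix.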