[Bug]: Detokenize delay when updating vllm from 0.3.0 to 0.4.2 #5206
Labels: bug (Something isn't working)
Comments

With the changes in PR #5207, the generation time is reduced to almost the same as in version 0.3.0.
Your current environment
🐛 Describe the bug
I updated vllm from 0.3.0 to 0.4.2 (both official releases), and the generation time on the Qwen1.5-0.5B model with the AdvertiseGen dataset increased from 45 seconds to 50 minutes.
After comparing each phase of generation between the two release versions, I found that the main delay was caused by the function `detokenize_incrementally` in `vllm/transformers_utils/detokenizer.py`. I added logging to record the time cost of this function, and the result showed that its cost increased by almost 1000x.
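For reference, a minimal timing sketch in the spirit of that measurement (not the original logging; it assumes the `transformers` library is installed and uses the public `Qwen/Qwen1.5-0.5B` checkpoint), comparing the two calls discussed below:

```python
import timeit

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-0.5B")

# len(tokenizer) counts the vocabulary *including* added tokens; with some
# tokenizers versions this reportedly rebuilds the full vocab mapping on
# every call, which is the suspected source of the slowdown.
t_len = timeit.timeit(lambda: len(tokenizer), number=10_000)

# tokenizer.vocab_size returns the size of the base vocabulary (excluding
# added tokens) and is typically much cheaper.
t_vocab = timeit.timeit(lambda: tokenizer.vocab_size, number=10_000)

print(f"len(tokenizer):       {t_len:.4f}s for 10k calls")
print(f"tokenizer.vocab_size: {t_vocab:.4f}s for 10k calls")
```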
Focusing on `detokenize_incrementally`, I found that `len(tokenizer)` is used to get the vocabulary size of the tokenizer. This is not a good choice, because that call costs much more time than `tokenizer.vocab_size`. I think we should replace all occurrences of `len(tokenizer)` with `tokenizer.vocab_size` to improve generation performance.