
[Tokenizer] Add an option to specify tokenizer #284

Merged: 13 commits merged into main on Jun 28, 2023
Conversation

WoosukKwon (Collaborator) commented on Jun 28, 2023

Fixes #111 #246 #259 #270 #281

This PR adds a tokenizer option to the engine and CLI arguments. If it is None, vLLM uses the model name/path as the tokenizer name/path. In addition, as of this PR, vLLM no longer uses hf-internal-testing/llama-tokenizer as the default tokenizer for LLaMA models.
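
A minimal usage sketch of the new option, assuming the offline LLM entrypoint (the model and tokenizer names below are simply the ones mentioned elsewhere in this thread):

from vllm import LLM, SamplingParams

# Load the model weights from one repo while taking the tokenizer from another.
# If the tokenizer argument is left as None, vLLM falls back to the model
# name/path, as described above.
llm = LLM(
    model="lmsys/vicuna-7b-v1.3",
    tokenizer="hf-internal-testing/llama-tokenizer",
)

outputs = llm.generate(["Hello, my name is"], SamplingParams(temperature=0.8))
print(outputs[0].outputs[0].text)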

@sleepcoo commented:

This PR is very useful; until now my local tests always had to hard-code the tokenizer path.

zhuohan123 (Member) left a comment:

LGTM! Thanks for the great work!

Comment on lines +20 to +25
if "llama" in tokenizer_name.lower() and kwargs.get("use_fast", True):
logger.info(
"OpenLLaMA models do not support the fast tokenizer. "
"Using the slow tokenizer instead.")
elif config.model_type == "llama" and kwargs.get("use_fast", True):
# LLaMA fast tokenizer causes protobuf errors in some environments.
# However, we found that the below LLaMA fast tokenizer works well in
# most environments.
model_name = "hf-internal-testing/llama-tokenizer"
logger.info(
f"Using the LLaMA fast tokenizer in '{model_name}' to avoid "
"potential protobuf errors.")
elif config.model_type in _MODEL_TYPES_WITH_SLOW_TOKENIZER:
if kwargs.get("use_fast", False) == True:
raise ValueError(
f"Cannot use the fast tokenizer for {config.model_type} due to "
"bugs in the fast tokenizer.")
logger.info(
f"Using the slow tokenizer for {config.model_type} due to bugs in "
"the fast tokenizer. This could potentially lead to performance "
"degradation.")
kwargs["use_fast"] = False
return AutoTokenizer.from_pretrained(model_name, *args, **kwargs)
"For some LLaMA-based models, initializing the fast tokenizer may "
"take a long time. To eliminate the initialization time, consider "
f"using '{_FAST_LLAMA_TOKENIZER}' instead of the original "
"tokenizer.")
zhuohan123 (Member):

After this PR, do we need to manually specify the tokenizer for LLaMA models in order to use the fast tokenizer for benchmarking?

WoosukKwon (Collaborator, Author) replied:

It depends. Actually, the LLaMA fast tokenizers bundled with lmsys/vicuna-7b-v1.3 or huggyllama/llama-7b work in my Docker environment, so hf-internal-testing/llama-tokenizer is not needed when I use vLLM there.
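
As a concrete sketch of the reply above, assuming vLLM's get_tokenizer helper in vllm.transformers_utils.tokenizer (the repos are the ones named in this thread):

from vllm.transformers_utils.tokenizer import get_tokenizer

# The fast tokenizer bundled with the model works in many environments,
# so no override is needed.
tokenizer = get_tokenizer("lmsys/vicuna-7b-v1.3")

# If the bundled fast tokenizer is slow to initialize or hits protobuf errors,
# the pre-converted LLaMA tokenizer can be passed explicitly instead.
fallback = get_tokenizer("hf-internal-testing/llama-tokenizer")

print(type(tokenizer).__name__, type(fallback).__name__)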

WoosukKwon merged commit 4338cc4 into main on Jun 28, 2023
WoosukKwon deleted the custom-tokenizer branch on Jun 28, 2023 at 16:47
@929359291 commented:

Wow, this is cool. Bro, you are so cool!

@sunyuhan19981208 commented:

Thanks very much!

michaelfeil pushed a commit to michaelfeil/vllm that referenced this pull request Jul 1, 2023
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024
sjchoi1 pushed a commit to casys-kaist-internal/vllm that referenced this pull request May 7, 2024
yukavio pushed a commit to yukavio/vllm that referenced this pull request Jul 3, 2024