
garbage output from h2oai/h2ogpt-gm-oasst1-en-2048-open-llama-13b #281

Closed
tjedwards opened this issue Jun 27, 2023 · 5 comments · Fixed by #298

Comments

@tjedwards

Using the simple Python script from the "supported-models" page, I was able to successfully generate output from TheBloke/Wizard-Vicuna-13B-Uncensored-HF, but h2oai/h2ogpt-gm-oasst1-en-2048-open-llama-13b generates garbage.

I'm running CUDA 11.7.1 on RHEL 8.4 with an NVIDIA A100-SXM-80GB.

Here's the script:

import sys
from vllm import LLM
llm = LLM(model=sys.argv[1])
output = llm.generate("Hello, my name is")
print(output)

Here's output from TheBloke:

prompt='Hello, my name is'
text="Bastian Mehl and I'm going to talk about how we can solve"

And here's output from h2oai:

prompt='Hello, my name is'
text='\u0442\u0435 Business up t");ymbol\u7532 _ itsardervesag t beskrevs t \u201c

Here's the full output:

Loading h2oai/h2ogpt-gm-oasst1-en-2048-open-llama-13b
INFO 06-27 10:46:22 llm_engine.py:59] Initializing an LLM engine with config: model='/tmp/h2oai/h2ogpt-gm-oasst1-en-2048-open-llama-13b', dtype=torch.float16, use_dummy_weights=False, download_dir=None, use_np_weights=False, tensor_parallel_size=1, seed=0)
INFO 06-27 10:46:22 tokenizer_utils.py:30] Using the LLaMA fast tokenizer in 'hf-internal-testing/llama-tokenizer' to avoid potential protobuf errors.
INFO 06-27 10:51:57 llm_engine.py:128] # GPU blocks: 3808, # CPU blocks: 327
Processed prompts: 100%|██████████| 1/1 [00:00<00:00,  3.17it/s]
[RequestOutput(request_id=0, prompt='Hello, my name is', prompt_token_ids=[1, 15043, 29892, 590, 1024, 338], outputs=[CompletionOutput(index=0, text='\u0442\u0435 Business up t");ymbol\u7532 _ itsardervesag t beskrevs t \u201c', token_ids=[730, 15197, 701, 260, 1496, 2789, 31843, 903, 967, 538, 20098, 351, 260, 7718, 260, 1346], cumulative_logprob=-90.18120217323303, logprobs={}, finish_reason=length)], finished=True)]
@zhuohan123
Member

This seems like a tokenizer issue. We are adding support for custom tokenizers (#111). In the meantime, you can directly modify the function below to use the correct tokenizer.

def get_tokenizer(

@tjedwards
Author

Thanks, I'll give that a try! BTW I just tried openlm-research/open_llama_13b and it produced viable results.

@tjedwards
Author

It was an easy fix:

    if "open_llama" in model_name or "open-llama" in model_name:
        kwargs["use_fast"] = False

I added the extra "or" to cover the dash variant, and all is well!
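For context, here is that check pulled out as a self-contained helper. The function name is made up for illustration; in vLLM at the time the two-line check above simply sat inside `get_tokenizer` before the tokenizer was loaded:

```python
def needs_slow_tokenizer(model_name: str) -> bool:
    """Return True for OpenLLaMA-style model names, which (at the time of
    this issue) needed use_fast=False so the slow SentencePiece tokenizer
    was used instead of the broken fast conversion.

    OpenLLaMA checkpoints appear under both spellings, e.g.
    "openlm-research/open_llama_13b" and
    "h2oai/h2ogpt-gm-oasst1-en-2048-open-llama-13b".
    """
    return "open_llama" in model_name or "open-llama" in model_name

# Inside get_tokenizer, before loading the tokenizer, one would then set:
#     if needs_slow_tokenizer(model_name):
#         kwargs["use_fast"] = False
```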

@tjedwards
Author

tjedwards commented Jun 28, 2023

In hindsight, this is better:

if "open" in model_name and "llama" in model_name:
    kwargs["use_fast"] = False

Ah, never mind! I just looked at the changes to allow a custom tokenizer, and this whole test goes away. :)

@WoosukKwon
Collaborator

@tjedwards We've added a new argument tokenizer_mode, which can be either auto or slow.

Try out the following:

llm = LLM(model="openlm-research/open_llama_13b", tokenizer_mode="slow")
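As a rough sketch of what the two modes mean, one can think of tokenizer_mode as mapping onto the use_fast flag discussed above (the mapping and function name here are assumptions inferred from this thread, not vLLM's actual code):

```python
def use_fast_from_mode(tokenizer_mode: str) -> bool:
    # "auto" uses the fast tokenizer when one is available;
    # "slow" always forces the slow (SentencePiece) tokenizer,
    # which is the workaround for the garbage output in this issue.
    if tokenizer_mode not in ("auto", "slow"):
        raise ValueError(f"unknown tokenizer_mode: {tokenizer_mode!r}")
    return tokenizer_mode == "auto"
```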

yukavio pushed a commit to yukavio/vllm that referenced this issue Jul 3, 2024
status should have an initial value

Co-authored-by: dhuangnm <dhuang@MacBook-Pro-2.local>
jikunshang pushed a commit to jikunshang/vllm that referenced this issue Sep 24, 2024
Increase garbage collector's threshold in order to reduce its frequency