[Bug]: vLLM throws error when sampling from Cerebras GPT Models #11224

Closed
RylanSchaeffer opened this issue Dec 16, 2024 · 19 comments
Labels
bug Something isn't working

Comments

@RylanSchaeffer

Your current environment

The output of `python collect_env.py`
python -u collect_env.py 
Collecting environment information...
Traceback (most recent call last):
  File "/lfs/skampere1/0/rschaef/KoyejoLab-Pretraining-Inference-Compute-Exchange-Rate/collect_env.py", line 765, in <module>
    main()
  File "/lfs/skampere1/0/rschaef/KoyejoLab-Pretraining-Inference-Compute-Exchange-Rate/collect_env.py", line 744, in main
    output = get_pretty_env_info()
             ^^^^^^^^^^^^^^^^^^^^^
  File "/lfs/skampere1/0/rschaef/KoyejoLab-Pretraining-Inference-Compute-Exchange-Rate/collect_env.py", line 739, in get_pretty_env_info
    return pretty_str(get_env_info())
                      ^^^^^^^^^^^^^^
  File "/lfs/skampere1/0/rschaef/KoyejoLab-Pretraining-Inference-Compute-Exchange-Rate/collect_env.py", line 568, in get_env_info
    vllm_version = get_vllm_version()
                   ^^^^^^^^^^^^^^^^^^
  File "/lfs/skampere1/0/rschaef/KoyejoLab-Pretraining-Inference-Compute-Exchange-Rate/collect_env.py", line 273, in get_vllm_version
    from vllm import __version__, __version_tuple__
ImportError: cannot import name '__version_tuple__' from 'vllm' (/lfs/skampere1/0/rschaef/miniconda3/envs/llmonk/lib/python3.11/site-packages/vllm/__init__.py)

Model Input Dumps

No response

🐛 Describe the bug

vLLM throws an error when attempting to use Cerebras's models. Here is a minimal reproduction:

from vllm import LLM, SamplingParams

model = LLM(model="cerebras/Cerebras-GPT-1.3B", dtype="bfloat16")

model_sampling_params = SamplingParams(
    n=1,
    temperature=1.0,
    max_tokens=64,
    seed=0,
)

output = model.generate(
    prompts=["Please continue the following sentence: The quick brown fox jumps "],
    sampling_params=model_sampling_params,
)

The error is: TypeError: 'NoneType' object is not iterable

It arises here:

    def _verify_embedding_mode(self) -> None:
        architectures = getattr(self.hf_config, "architectures", [])
        self.embedding_mode = any(
            ModelRegistry.is_embedding_model(arch) for arch in architectures)

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
@RylanSchaeffer added the bug label Dec 16, 2024
@RylanSchaeffer changed the title from "[Bug]: Unable to Use Cerebras GPT Models" to "[Bug]: vLLM throws error when sampling from Cerebras GPT Models" Dec 16, 2024
@DarkLight1337
Member

DarkLight1337 commented Dec 16, 2024

cerebras/Cerebras-GPT-1.3B doesn't have a valid config.json file, I think. It should have the architectures field like in cerebras/Cerebras-GPT-13B.
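
A quick way to confirm (assuming transformers is installed):

from transformers import AutoConfig

config = AutoConfig.from_pretrained("cerebras/Cerebras-GPT-1.3B")
print(config.architectures)  # None for this checkpoint; the 13B config reports ['GPT2Model']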

@DarkLight1337
Member

Although HF does keep mappings from model_type to architecture in https://github.com/huggingface/transformers/blob/5615a393691c81e00251e420c73e4d04c6fe22e5/src/transformers/models/auto/modeling_auto.py#L1564, it's not always clear which mapping should be used.
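
For example (with a recent transformers), "gpt2" resolves differently depending on which table you consult:

from transformers.models.auto.modeling_auto import (
    MODEL_FOR_CAUSAL_LM_MAPPING_NAMES,
    MODEL_MAPPING_NAMES,
)

print(MODEL_MAPPING_NAMES["gpt2"])                # 'GPT2Model' (bare transformer)
print(MODEL_FOR_CAUSAL_LM_MAPPING_NAMES["gpt2"])  # 'GPT2LMHeadModel' (adds the LM head)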

@DarkLight1337
Member

To solve this, I suggest you pass the architecture name explicitly via --hf-overrides.

@RylanSchaeffer
Author

@DarkLight1337 could you please explain how to modify my above minimal example?

When I try something like:

model = LLM(
    model="cerebras/Cerebras-GPT-1.3B",
    hf_overrides={"model_name_or_path": "cerebras/Cerebras-GPT-1.3B"},
    dtype="bfloat16",
)

I hit the error: `TypeError: EngineArgs.__init__() got an unexpected keyword argument 'hf_overrides'`

@DarkLight1337
Member

What is your vLLM version? You might have to update it for this to be supported.

@RylanSchaeffer
Author

0.5.4

@DarkLight1337
Member

Yeah, pretty sure you need to update vLLM.

@RylanSchaeffer
Author

Is there any guarantee of backwards consistency? I've been generating data for a couple of months, and I need to make sure there's no distribution shift if I change the vLLM version.

@DarkLight1337
Member

DarkLight1337 commented Jan 15, 2025

hf_overrides is a new option that was only added recently. It doesn't change old behavior.

@RylanSchaeffer
Author

I've updated to 0.6.6.post1. Can you please now tell me how to correctly call Cerebras using the LLM() class in Python?

@RylanSchaeffer
Author

I'm currently trying:

model = LLM(
    model="cerebras/Cerebras-GPT-1.3B",
    hf_overrides={"architecture": "cerebras/Cerebras-GPT-1.3B"},
    dtype="bfloat16",
)

But this throws:

vllm/model_executor/models/registry.py", line 416, in inspect_model_cls
    for arch in architectures:
TypeError: 'NoneType' object is not iterable

@RylanSchaeffer
Author

To ask a related but separate follow-up question, when I try:

model = LLM(
    model="cerebras/Cerebras-GPT-13B",
    hf_overrides={"architecture": "model_type"},
    dtype="bfloat16",
)

I receive the following error: ValueError: Model architectures ['GPT2Model'] are not supported for now.

Since I believe all of the Cerebras models are based on GPT2, what would you advise?

@DarkLight1337
Member

The "architecture" field should be class name of the model that's implemented in vLLM. In this case, it should be GPT2LMHeadModel as shown in the list of supported models.

@RylanSchaeffer
Author

Can you please provide a correctly functioning minimal working example?

@RylanSchaeffer
Author

model = LLM(
    model="cerebras/Cerebras-GPT-1.3B",
    hf_overrides={"architecture": "GPT2LMHeadModel"},
    dtype="bfloat16",
)

throws the error:

vllm/model_executor/models/registry.py", line 416, in inspect_model_cls
    for arch in architectures:
TypeError: 'NoneType' object is not iterable

@DarkLight1337
Member

DarkLight1337 commented Jan 16, 2025

Also, the key should be "architectures" (plural) and you need to pass a list to it. It is basically the same format as HF config.json.

@RylanSchaeffer
Author

Can you please give a functioning minimal working example?

@RylanSchaeffer
Author

model = LLM(
    model="cerebras/Cerebras-GPT-1.3B",
    hf_overrides={"architectures": ["GPT2LMHeadModel"]},
    dtype="bfloat16",
)
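
Combined with the sampling code from my original report, the full repro (not yet verified on my end) becomes:

from vllm import LLM, SamplingParams

# Override the missing "architectures" field so vLLM can resolve the model class.
model = LLM(
    model="cerebras/Cerebras-GPT-1.3B",
    hf_overrides={"architectures": ["GPT2LMHeadModel"]},
    dtype="bfloat16",
)

sampling_params = SamplingParams(n=1, temperature=1.0, max_tokens=64, seed=0)

output = model.generate(
    prompts=["Please continue the following sentence: The quick brown fox jumps "],
    sampling_params=sampling_params,
)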

@RylanSchaeffer
Author

I'm currently testing this. If it works, I'll close the issue.
