
vLLM inference error: cannot get the `factor` field from `rope_scaling` #96

Closed
Potato-wll opened this issue Sep 3, 2024 · 32 comments

@Potato-wll

Here is the command I ran:
python -m vllm.entrypoints.openai.api_server --served-model-name Qwen2-VL-7B-Instruct --model /home/wangll/llm/model_download_demo/models/Qwen/Qwen2-VL-7B-Instruct

Here is the error output:
INFO 09-03 18:48:04 api_server.py:440] vLLM API server version 0.5.5
INFO 09-03 18:48:04 api_server.py:441] args: Namespace(host=None, port=8000, uvicorn_log_level='info', allow_credentials=False, allowed_origins=[''], allowed_methods=[''], allowed_headers=['*'], api_key=None, lora_modules=None, prompt_adapters=None, chat_template=None, response_role='assistant', ssl_keyfile=None, ssl_certfile=None, ssl_ca_certs=None, ssl_cert_reqs=0, root_path=None, middleware=[], return_tokens_as_token_ids=False, disable_frontend_multiprocessing=False, model='/home/wangll/llm/model_download_demo/models/Qwen/Qwen2-VL-7B-Instruct', tokenizer=None, skip_tokenizer_init=False, revision=None, code_revision=None, tokenizer_revision=None, tokenizer_mode='auto', trust_remote_code=False, download_dir=None, load_format='auto', dtype='float16', kv_cache_dtype='auto', quantization_param_path=None, max_model_len=None, guided_decoding_backend='outlines', distributed_executor_backend=None, worker_use_ray=False, pipeline_parallel_size=1, tensor_parallel_size=1, max_parallel_loading_workers=None, ray_workers_use_nsight=False, block_size=16, enable_prefix_caching=False, disable_sliding_window=False, use_v2_block_manager=False, num_lookahead_slots=0, seed=0, swap_space=4, cpu_offload_gb=0, gpu_memory_utilization=0.9, num_gpu_blocks_override=None, max_num_batched_tokens=None, max_num_seqs=256, max_logprobs=20, disable_log_stats=False, quantization=None, rope_scaling=None, rope_theta=None, enforce_eager=False, max_context_len_to_capture=None, max_seq_len_to_capture=8192, disable_custom_all_reduce=False, tokenizer_pool_size=0, tokenizer_pool_type='ray', tokenizer_pool_extra_config=None, limit_mm_per_prompt=None, enable_lora=False, max_loras=1, max_lora_rank=16, lora_extra_vocab_size=256, lora_dtype='auto', long_lora_scaling_factors=None, max_cpu_loras=None, fully_sharded_loras=False, enable_prompt_adapter=False, max_prompt_adapters=1, max_prompt_adapter_token=0, device='auto', num_scheduler_steps=1, scheduler_delay_factor=0.0, enable_chunked_prefill=None, speculative_model=None, speculative_model_quantization=None, num_speculative_tokens=None, speculative_draft_tensor_parallel_size=None, speculative_max_model_len=None, speculative_disable_by_batch_size=None, ngram_prompt_lookup_max=None, ngram_prompt_lookup_min=None, spec_decoding_acceptance_method='rejection_sampler', typical_acceptance_sampler_posterior_threshold=None, typical_acceptance_sampler_posterior_alpha=None, disable_logprobs_during_spec_decoding=None, model_loader_extra_config=None, ignore_patterns=[], preemption_mode=None, served_model_name=['Qwen2-VL-7B-Instruct'], qlora_adapter_name_or_path=None, otlp_traces_endpoint=None, collect_detailed_traces=None, engine_use_ray=False, disable_log_requests=False, max_log_len=None)
Traceback (most recent call last):
File "/home/wangll/.conda/envs/Qwen2vl/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/wangll/.conda/envs/Qwen2vl/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/wangll/.conda/envs/Qwen2vl/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 476, in
asyncio.run(run_server(args))
File "/home/wangll/.conda/envs/Qwen2vl/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/home/wangll/.conda/envs/Qwen2vl/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
return future.result()
File "/home/wangll/.conda/envs/Qwen2vl/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 443, in run_server
async with build_async_engine_client(args) as async_engine_client:
File "/home/wangll/.conda/envs/Qwen2vl/lib/python3.10/contextlib.py", line 199, in aenter
return await anext(self.gen)
File "/home/wangll/.conda/envs/Qwen2vl/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 117, in build_async_engine_client
if (model_is_embedding(args.model, args.trust_remote_code,
File "/home/wangll/.conda/envs/Qwen2vl/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 71, in model_is_embedding
return ModelConfig(model=model_name,
File "/home/wangll/.conda/envs/Qwen2vl/lib/python3.10/site-packages/vllm/config.py", line 214, in init
self.max_model_len = _get_and_verify_max_len(
File "/home/wangll/.conda/envs/Qwen2vl/lib/python3.10/site-packages/vllm/config.py", line 1650, in _get_and_verify_max_len
assert "factor" in rope_scaling
AssertionError

I checked the model's config.json, and its rope_scaling indeed has no factor field:
"rope_scaling": {
"type": "mrope",
"mrope_section": [
16,
24,
24
]
},
"vocab_size": 152064
}

@fyabc (Collaborator) commented Sep 4, 2024

@Potato-wll Hi, this is caused by a mismatched vLLM version; see #35 for details.

@Potato-wll (Author)

After starting with vLLM I get the error `FlashAttention only supports Ampere GPUs or newer.` My GPU is a T4, which cannot use FlashAttention. How and where do I turn it off?

@fyabc (Collaborator) commented Sep 9, 2024

> After starting with vLLM I get the error `FlashAttention only supports Ampere GPUs or newer.` My GPU is a T4, which cannot use FlashAttention. How and where do I turn it off?

@Potato-wll Hi, we have updated the vLLM code and the corresponding Docker image to fall back to xformers for inference when flash-attn is not supported. Please update to the latest code/image and retry.
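If the backend still needs to be forced explicitly, here is a minimal sketch for offline inference, assuming this vLLM build reads the `VLLM_ATTENTION_BACKEND` environment variable (the same variable can also be exported in the shell before launching the api_server):

```python
# Sketch only: force the xformers attention backend on pre-Ampere GPUs such as the T4.
# Assumes this vLLM build honors the VLLM_ATTENTION_BACKEND environment variable.
import os

os.environ["VLLM_ATTENTION_BACKEND"] = "XFORMERS"  # must be set before the engine is created

from vllm import LLM

llm = LLM(
    model="/home/wangll/llm/model_download_demo/models/Qwen/Qwen2-VL-7B-Instruct",
    dtype="float16",  # T4 has no bfloat16 support, matching the dtype in the args dump above
)
```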

@xyfZzz commented Sep 9, 2024

> After starting with vLLM I get the error `FlashAttention only supports Ampere GPUs or newer.` My GPU is a T4, which cannot use FlashAttention. How and where do I turn it off?
>
> @Potato-wll Hi, we have updated the vLLM code and the corresponding Docker image to fall back to xformers for inference when flash-attn is not supported. Please update to the latest code/image and retry.

After installing from your latest vLLM code I still get this error:

  File "/mnt/xie/libs/vllm_qwen2vl/vllm/config.py", line 224, in __init__
    self.max_model_len = _get_and_verify_max_len(
  File "/mnt/xie/libs/vllm_qwen2vl/vllm/config.py", line 1740, in _get_and_verify_max_len
    assert "factor" in rope_scaling

@fyabc (Collaborator) commented Sep 9, 2024

> After installing from your latest vLLM code I still get this error:
>
>   File "/mnt/xie/libs/vllm_qwen2vl/vllm/config.py", line 224, in __init__
>     self.max_model_len = _get_and_verify_max_len(
>   File "/mnt/xie/libs/vllm_qwen2vl/vllm/config.py", line 1740, in _get_and_verify_max_len
>     assert "factor" in rope_scaling

Hi, judging from the error message, this does not look like the latest version of https://github.com/fyabc/vllm/tree/add_qwen2_vl_new. Please check whether the git commit id is correct.
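A quick way to confirm which vLLM build Python is actually importing:

```python
# Print the version and install location of the vLLM that is actually imported.
# For an editable install (pip install -e .), __file__ points into the cloned repo,
# where `git rev-parse HEAD` shows the commit to compare against add_qwen2_vl_new.
import vllm

print(vllm.__version__)
print(vllm.__file__)
```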

@xyfZzz commented Sep 9, 2024

> Hi, judging from the error message, this does not look like the latest version of https://github.com/fyabc/vllm/tree/add_qwen2_vl_new. Please check whether the git commit id is correct.

OK, thanks. I have switched to that branch and am reinstalling now. With this vLLM version, does Qwen2-VL support multiple images in a single request?

@xyfZzz commented Sep 9, 2024

> Hi, judging from the error message, this does not look like the latest version of https://github.com/fyabc/vllm/tree/add_qwen2_vl_new. Please check whether the git commit id is correct.

Hi, after switching to this branch I still get the same error, only the line numbers differ. Please take a look:

File "/mnt/xie/libs/vllm_qwen2vl/vllm/entrypoints/openai/api_server.py", line 73, in model_is_embedding
    return ModelConfig(model=model_name,
  File "/mnt/xie/libs/vllm_qwen2vl/vllm/config.py", line 227, in __init__
    self.max_model_len = _get_and_verify_max_len(
  File "/mnt/xie/libs/vllm_qwen2vl/vllm/config.py", line 1747, in _get_and_verify_max_len
    assert "factor" in rope_scaling
AssertionError

@fyabc (Collaborator) commented Sep 9, 2024

> OK, thanks. I have switched to that branch and am reinstalling now. With this vLLM version, does Qwen2-VL support multiple images in a single request?

Qwen2-VL supports multiple images in a single request; see here for how to call it.
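As an illustrative sketch (the linked reference above is authoritative), a single request carrying two images through the OpenAI-compatible server could look like the following, assuming the server was started with a multi-image limit such as `--limit-mm-per-prompt image=2` (the flag appears as `limit_mm_per_prompt` in the args dump above; the exact syntax may differ by version):

```python
# Hypothetical client-side example: two images in one chat completion request
# against the OpenAI-compatible server started earlier in this thread.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen2-VL-7B-Instruct",
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "https://example.com/a.jpg"}},
            {"type": "image_url", "image_url": {"url": "https://example.com/b.jpg"}},
            {"type": "text", "text": "Describe the differences between the two images."},
        ],
    }],
)
print(response.choices[0].message.content)
```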

@fyabc (Collaborator) commented Sep 9, 2024

> Hi, after switching to this branch I still get the same error, only the line numbers differ. Please take a look:
>
> File "/mnt/xie/libs/vllm_qwen2vl/vllm/entrypoints/openai/api_server.py", line 73, in model_is_embedding
>     return ModelConfig(model=model_name,
>   File "/mnt/xie/libs/vllm_qwen2vl/vllm/config.py", line 227, in __init__
>     self.max_model_len = _get_and_verify_max_len(
>   File "/mnt/xie/libs/vllm_qwen2vl/vllm/config.py", line 1747, in _get_and_verify_max_len
>     assert "factor" in rope_scaling
> AssertionError

Could you share the content of config.json from the model files you downloaded? It looks like the error occurs when reading it here.

@xyfZzz commented Sep 9, 2024

> Could you share the content of config.json from the model files you downloaded? It looks like the error occurs when reading it here.

Here is the config:

{
  "architectures": [
    "Qwen2VLForConditionalGeneration"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "vision_start_token_id": 151652,
  "vision_end_token_id": 151653,
  "vision_token_id": 151654,
  "image_token_id": 151655,
  "video_token_id": 151656,
  "hidden_act": "silu",
  "hidden_size": 3584,
  "initializer_range": 0.02,
  "intermediate_size": 18944,
  "max_position_embeddings": 32768,
  "max_window_layers": 28,
  "model_type": "qwen2_vl",
  "num_attention_heads": 28,
  "num_hidden_layers": 28,
  "num_key_value_heads": 4,
  "rms_norm_eps": 1e-06,
  "rope_theta": 1000000.0,
  "sliding_window": 32768,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.41.2",
  "use_cache": true,
  "use_sliding_window": false,
  "vision_config": {
    "depth": 32,
    "embed_dim": 1280,
    "mlp_ratio": 4,
    "num_heads": 16,
    "in_chans": 3,
    "hidden_size": 3584,
    "patch_size": 14,
    "spatial_merge_size": 2,
    "spatial_patch_size": 14,
    "temporal_patch_size": 2
  },
  "rope_scaling": {
    "type": "mrope",
    "mrope_section": [
      16,
      24,
      24
    ]
  },
  "vocab_size": 152064
}

@xyfZzz commented Sep 9, 2024

> Hi, judging from the error message, this does not look like the latest version of https://github.com/fyabc/vllm/tree/add_qwen2_vl_new. Please check whether the git commit id is correct.

My installation steps were as follows; I'm not sure whether there is a problem with them:

git clone https://github.com/fyabc/vllm.git
cd vllm
git checkout origin/add_qwen2_vl_new
pip install -e . --extra-index-url https://download.pytorch.org/whl/cu121

@docShen commented Sep 10, 2024

> My installation steps were as follows; I'm not sure whether there is a problem with them:
>
> git clone https://github.com/fyabc/vllm.git
> cd vllm
> git checkout origin/add_qwen2_vl_new
> pip install -e . --extra-index-url https://download.pytorch.org/whl/cu121

Same here. I installed the officially forked vLLM version but still hit this error.

@fyabc (Collaborator) commented Sep 10, 2024

@docShen @xyfZzz Could you share the transformers version you are currently using (`pip list | grep transformers`)?

@xyfZzz commented Sep 10, 2024

> @docShen @xyfZzz Could you share the transformers version you are currently using (`pip list | grep transformers`)?

4.45.0.dev0

@xyfZzz commented Sep 10, 2024

> @docShen @xyfZzz Could you share the transformers version you are currently using (`pip list | grep transformers`)?

Could this be caused by the issue described here: vllm-project/vllm#7905 (comment)?

@SiyangJ commented Sep 10, 2024

+1, exact same problem.
I followed your steps exactly, @fyabc.
The problem appears when starting the vLLM OpenAI server:
root@3f75a56c8be9:/vllm-workspace# python3 -m vllm.entrypoints.openai.api_server --served-model-name Qwen2-VL-7B-Instruct --model /weights/Qwen2-VL-7B-Instruct
INFO 09-10 06:32:14 api_server.py:495] vLLM API server version 0.6.0
INFO 09-10 06:32:14 api_server.py:496] args: Namespace(host=None, port=8000, uvicorn_log_level='info', allow_credentials=False, allowed_origins=[''], allowed_methods=[''], allowed_headers=['*'], api_key=None, lora_modules=None, prompt_adapters=None, chat_template=None, response_role='assistant', ssl_keyfile=None, ssl_certfile=None, ssl_ca_certs=None, ssl_cert_reqs=0, root_path=None, middleware=[], return_tokens_as_token_ids=False, disable_frontend_multiprocessing=False, enable_auto_tool_choice=False, tool_call_parser=None, model='/weights/Qwen2-VL-7B-Instruct', tokenizer=None, skip_tokenizer_init=False, revision=None, code_revision=None, tokenizer_revision=None, tokenizer_mode='auto', trust_remote_code=False, download_dir=None, load_format='auto', dtype='auto', kv_cache_dtype='auto', quantization_param_path=None, max_model_len=None, guided_decoding_backend='outlines', distributed_executor_backend=None, worker_use_ray=False, pipeline_parallel_size=1, tensor_parallel_size=1, max_parallel_loading_workers=None, ray_workers_use_nsight=False, block_size=16, enable_prefix_caching=False, disable_sliding_window=False, use_v2_block_manager=False, num_lookahead_slots=0, seed=0, swap_space=4, cpu_offload_gb=0, gpu_memory_utilization=0.9, num_gpu_blocks_override=None, max_num_batched_tokens=None, max_num_seqs=256, max_logprobs=20, disable_log_stats=False, quantization=None, rope_scaling=None, rope_theta=None, enforce_eager=False, max_context_len_to_capture=None, max_seq_len_to_capture=8192, disable_custom_all_reduce=False, tokenizer_pool_size=0, tokenizer_pool_type='ray', tokenizer_pool_extra_config=None, limit_mm_per_prompt=None, enable_lora=False, max_loras=1, max_lora_rank=16, lora_extra_vocab_size=256, lora_dtype='auto', long_lora_scaling_factors=None, max_cpu_loras=None, fully_sharded_loras=False, enable_prompt_adapter=False, max_prompt_adapters=1, max_prompt_adapter_token=0, device='auto', num_scheduler_steps=1, scheduler_delay_factor=0.0, enable_chunked_prefill=None, speculative_model=None, speculative_model_quantization=None, num_speculative_tokens=None, speculative_draft_tensor_parallel_size=None, speculative_max_model_len=None, speculative_disable_by_batch_size=None, ngram_prompt_lookup_max=None, ngram_prompt_lookup_min=None, spec_decoding_acceptance_method='rejection_sampler', typical_acceptance_sampler_posterior_threshold=None, typical_acceptance_sampler_posterior_alpha=None, disable_logprobs_during_spec_decoding=None, model_loader_extra_config=None, ignore_patterns=[], preemption_mode=None, served_model_name=['Qwen2-VL-7B-Instruct'], qlora_adapter_name_or_path=None, otlp_traces_endpoint=None, collect_detailed_traces=None, disable_async_output_proc=False, override_neuron_config=None, engine_use_ray=False, disable_log_requests=False, max_log_len=None)
Unrecognized keys in rope_scaling for 'rope_type'='default': {'mrope_section'}
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/api_server.py", line 531, in
asyncio.run(run_server(args))
File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
return future.result()
File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/api_server.py", line 498, in run_server
async with build_async_engine_client(args) as async_engine_client:
File "/usr/lib/python3.10/contextlib.py", line 199, in aenter
return await anext(self.gen)
File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/api_server.py", line 110, in build_async_engine_client
async with build_async_engine_client_from_engine_args(
File "/usr/lib/python3.10/contextlib.py", line 199, in aenter
return await anext(self.gen)
File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/api_server.py", line 132, in build_async_engine_client_from_engine_args
if (model_is_embedding(engine_args.model, engine_args.trust_remote_code,
File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/api_server.py", line 73, in model_is_embedding
return ModelConfig(model=model_name,
File "/usr/local/lib/python3.10/dist-packages/vllm/config.py", line 224, in init
self.max_model_len = _get_and_verify_max_len(
File "/usr/local/lib/python3.10/dist-packages/vllm/config.py", line 1740, in _get_and_verify_max_len
assert "factor" in rope_scaling
AssertionError

@fyabc (Collaborator) commented Sep 10, 2024

@xyfZzz @docShen @Potato-wll Hi, this should be a bug in the latest transformers version. I have filed an issue for it. For now, please install a bug-free version as follows:

pip install git+https://github.com/huggingface/transformers@21fac7abba2a37fae86106f87fcf9974fd1e3830
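After installing that commit, a quick sanity check (a sketch; substitute your local model path) is to confirm that transformers leaves the Qwen2-VL rope_scaling untouched instead of rewriting its type to 'default':

```python
# Verify that the installed transformers parses the Qwen2-VL config without mangling rope_scaling.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained(
    "/home/wangll/llm/model_download_demo/models/Qwen/Qwen2-VL-7B-Instruct"
)
print(cfg.rope_scaling)
# A good build should still show the mrope settings from config.json; the buggy versions
# rewrite the type to 'default' and warn about unrecognized keys, as seen earlier in this thread.
```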

@lilin-git

Adding a factor field to rope_scaling in the config file, and additionally inserting a line `if rope_scaling and rope_scaling.get('type', 'default') == 'default': rope_scaling['type'] = 'mrope'` before https://github.com/fyabc/vllm/blob/6f3116c9dad0537b2858af703938aa9bf6c25bcf/vllm/model_executor/models/qwen2.py#L175, works around the problem for now.

@fyabc (Collaborator) commented Sep 10, 2024

> Adding a factor field to rope_scaling in the config file, and additionally inserting a line `if rope_scaling and rope_scaling.get('type', 'default') == 'default': rope_scaling['type'] = 'mrope'` before https://github.com/fyabc/vllm/blob/6f3116c9dad0537b2858af703938aa9bf6c25bcf/vllm/model_executor/models/qwen2.py#L175, works around the problem for now.

@lilin-git Thanks for the note. The first method mentioned here works; the second is not recommended (rope_scaling['type'] is also used outside model initialization, so changing only this spot would introduce bugs).

@xyfZzz @Potato-wll @docShen If reinstalling transformers is inconvenient, you can also use the method mentioned above.

@fyabc fyabc self-assigned this Sep 10, 2024
@xyfZzz commented Sep 10, 2024

> @lilin-git Thanks for the note. The first method mentioned here works; the second is not recommended (rope_scaling['type'] is also used outside model initialization, so changing only this spot would introduce bugs).
>
> @xyfZzz @Potato-wll @docShen If reinstalling transformers is inconvenient, you can also use the method mentioned above.

It runs fine now, thanks!

@wuzhizhige

After adding factor and modifying qwen2.py, I ran `python -m vllm.entrypoints.openai.api_server --model /data/Qwen2-VL-7B-Instruct --served-model-name Qwen2-VL-7B --port 10000` and got this error (https://github.com/fyabc/vllm/blob/6f3116c9dad0537b2858af703938aa9bf6c25bcf/vllm/model_executor/models/qwen2.py#L175):
File "/data/vllm/vllm/model_executor/models/init.py", line 170, in resolve_model_cls
raise ValueError(
ValueError: Model architectures ['Qwen2VLForConditionalGeneration'] are not supported for now. Supported architectures: ['AquilaModel', 'AquilaForCausalLM', 'BaiChuanForCausalLM', 'BaichuanForCausalLM', 'BloomForCausalLM', 'ChatGLMModel', 'ChatGLMForConditionalGeneration', 'CohereForCausalLM', 'DbrxForCausalLM', 'DeciLMForCausalLM', 'DeepseekForCausalLM', 'DeepseekV2ForCausalLM', 'ExaoneForCausalLM', 'FalconForCausalLM', 'GemmaForCausalLM', 'Gemma2ForCausalLM', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTJForCausalLM', 'GPTNeoXForCausalLM', 'InternLMForCausalLM', 'InternLM2ForCausalLM', 'JAISLMHeadModel', 'LlamaForCausalLM', 'LLaMAForCausalLM', 'MistralForCausalLM', 'MixtralForCausalLM', 'QuantMixtralForCausalLM', 'MptForCausalLM', 'MPTForCausalLM', 'MiniCPMForCausalLM', 'NemotronForCausalLM', 'OlmoForCausalLM', 'OPTForCausalLM', 'OrionForCausalLM', 'PersimmonForCausalLM', 'PhiForCausalLM', 'Phi3ForCausalLM', 'PhiMoEForCausalLM', 'Qwen2ForCausalLM', 'Qwen2MoeForCausalLM', 'RWForCausalLM', 'StableLMEpochForCausalLM', 'StableLmForCausalLM', 'Starcoder2ForCausalLM', 'ArcticForCausalLM', 'XverseForCausalLM', 'Phi3SmallForCausalLM', 'MedusaModel', 'EAGLEModel', 'MLPSpeculatorPreTrainedModel', 'JambaForCausalLM', 'GraniteForCausalLM', 'MistralModel', 'Blip2ForConditionalGeneration', 'ChameleonForConditionalGeneration', 'FuyuForCausalLM', 'InternVLChatModel', 'LlavaForConditionalGeneration', 'LlavaNextForConditionalGeneration', 'MiniCPMV', 'PaliGemmaForConditionalGeneration', 'Phi3VForCausalLM', 'UltravoxModel', 'QWenLMHeadModel', 'BartModel', 'BartForConditionalGeneration']
ERROR 09-10 19:42:19 api_server.py:188] RPCServer process died before responding to readiness probe

@fyabc (Collaborator) commented Sep 10, 2024

> After adding factor and modifying qwen2.py, I ran `python -m vllm.entrypoints.openai.api_server --model /data/Qwen2-VL-7B-Instruct --served-model-name Qwen2-VL-7B --port 10000` and got `ValueError: Model architectures ['Qwen2VLForConditionalGeneration'] are not supported for now. ...` followed by `ERROR 09-10 19:42:19 api_server.py:188] RPCServer process died before responding to readiness probe`

Hi, please check the vLLM version you are using; it does not appear to be the correct one.

@wuzhizhige

> > After adding factor and modifying qwen2.py, I ran `python -m vllm.entrypoints.openai.api_server --model /data/Qwen2-VL-7B-Instruct --served-model-name Qwen2-VL-7B --port 10000` and got `ValueError: Model architectures ['Qwen2VLForConditionalGeneration'] are not supported for now. ...`
>
> Hi, please check the vLLM version you are using; it does not appear to be the correct one.

Both 0.6.0 and 0.5.5 give this error.

@lilin-git

> Hi, please check the vLLM version you are using; it does not appear to be the correct one.
>
> Both 0.6.0 and 0.5.5 give this error.

Er, look at the link. That is not the official release; the official release does not support it yet. Use this project: https://github.com/fyabc/vllm/tree/add_qwen2_vl_new

@fyabc (Collaborator) commented Sep 11, 2024

> Both 0.6.0 and 0.5.5 give this error.

@wuzhizhige Qwen2-VL support has not been merged into official vLLM yet. Please use this version: https://github.com/fyabc/vllm/tree/add_qwen2_vl_new

@azuercici

> @xyfZzz @docShen @Potato-wll Hi, this should be a bug in the latest transformers version. I have filed an issue for it. For now, please install a bug-free version as follows:
>
> pip install git+https://github.com/huggingface/transformers@21fac7abba2a37fae86106f87fcf9974fd1e3830

Hi, I installed the transformers version you specified and still have this problem. How exactly should factor be added in config.json?
"rope_scaling": {
"mrope_section": [
16,
24,
24
],
"type": "mrope"
Where exactly should it go?

@fyabc (Collaborator) commented Sep 12, 2024

> Hi, I installed the transformers version you specified and still have this problem. How exactly should factor be added in config.json? Where exactly should it go?

@azuercici You can modify it as follows:

{
  ...
  "rope_scaling": {
    "type": "mrope",
    "factor": 1,
    "mrope_section": [
      16,
      24,
      24
    ]
  },
}
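A small helper script, as a sketch of the same config.json edit applied programmatically (the path below is a placeholder for your local model directory; keep a backup of config.json first):

```python
# Add "factor": 1 to rope_scaling in config.json if it is missing.
import json
from pathlib import Path

config_path = Path("/path/to/Qwen2-VL-7B-Instruct/config.json")  # placeholder path
config = json.loads(config_path.read_text(encoding="utf-8"))

rope_scaling = config.get("rope_scaling", {})
if "factor" not in rope_scaling:
    rope_scaling["factor"] = 1
    config["rope_scaling"] = rope_scaling
    config_path.write_text(json.dumps(config, indent=2, ensure_ascii=False), encoding="utf-8")
    print("Added factor=1 to rope_scaling")
else:
    print("factor already present")
```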

@EricHuiK

I modified both the config file and vLLM, but I still get an error: NameError: name 'rod_scaling' is not defined

@fyabc (Collaborator) commented Sep 14, 2024

> I modified both the config file and vLLM, but I still get an error: NameError: name 'rod_scaling' is not defined

Note that the name should be 'rope_scaling', not 'rod_scaling'.

@zhangfan-algo

> @xyfZzz @docShen @Potato-wll Hi, this should be a bug in the latest transformers version. I have filed an issue for it. For now, please install a bug-free version as follows:
>
> pip install git+https://github.com/huggingface/transformers@21fac7abba2a37fae86106f87fcf9974fd1e3830

With this transformers version I still get the original error.

@zhangfan-algo

(screenshot attached)

@imkero commented Sep 25, 2024

Passing a rope_scaling argument at vLLM initialization to override the original config also works as a temporary fix:

llm = LLM(
    model=model_dir,
    rope_scaling={
        "type": "mrope",
        "mrope_section": [
            16,
            24,
            24
        ],
    },
)
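For the OpenAI server entrypoint, the args dump earlier in this thread shows a corresponding `rope_scaling` option (`--rope-scaling`), which presumably accepts the same override as a JSON string, though the accepted format may vary between versions.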

@fyabc fyabc closed this as completed Oct 8, 2024