
KeyError: 'base_model.model.model.layers.0.self_attn.qkv_proj.lora_A.weight' #1625

Closed
Soumendraprasad opened this issue Nov 11, 2023 · 21 comments

@Soumendraprasad

Soumendraprasad commented Nov 11, 2023

When I try to run inference on my fine-tuned Code Llama model using vLLM, I get this error:

  File "/usr/local/lib/python3.9/dist-packages/vllm/engine/ray_utils.py", line 32, in execute_method
    return executor(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/vllm/worker/worker.py", line 70, in init_model
    self.model = get_model(self.model_config)
  File "/usr/local/lib/python3.9/dist-packages/vllm/model_executor/model_loader.py", line 103, in get_model
    model.load_weights(model_config.model, model_config.download_dir,
  File "/usr/local/lib/python3.9/dist-packages/vllm/model_executor/models/llama.py", line 367, in load_weights
    param = state_dict[name.replace(weight_name, "qkv_proj")]
KeyError: 'base_model.model.model.layers.0.self_attn.qkv_proj.lora_A.weight'

Some shapes of my model's weights:

model.layers.0.input_layernorm.weight [4096] BF16
model.layers.0.mlp.down_proj.weight [4096,11008] BF16
model.layers.0.mlp.gate_proj.weight [11008,4096] BF16
model.layers.0.mlp.up_proj.weight [11008,4096] BF16
model.layers.0.post_attention_layernorm.weight [4096] BF16
model.layers.0.self_attn.k_proj.weight [4096,4096] BF16
model.layers.0.self_attn.o_proj.weight [4096,4096] BF16
model.layers.0.self_attn.q_proj.weight [4096,4096] BF16
model.layers.0.self_attn.v_proj.weight [4096,4096] BF16

Any suggestion or help is highly appreciated.

@simon-mo
Collaborator

Currently vLLM does not support merging LoRA weights, hence the model loader is erroring. Contributions are strongly welcomed here! Ideally the LoRA weights would be applied automatically during the model loading process. I believe this PR does what you want: #289

I'm closing this PR in favor of #182

@SuperBruceJia

Also facing this problem!

  File "/usr4/ec523/brucejia/.local/lib/python3.8/site-packages/vllm/entrypoints/llm.py", line 93, in __init__
    self.llm_engine = LLMEngine.from_engine_args(engine_args)
  File "/usr4/ec523/brucejia/.local/lib/python3.8/site-packages/vllm/engine/llm_engine.py", line 231, in from_engine_args
    engine = cls(*engine_configs,
  File "/usr4/ec523/brucejia/.local/lib/python3.8/site-packages/vllm/engine/llm_engine.py", line 110, in __init__
    self._init_workers(distributed_init_method)
  File "/usr4/ec523/brucejia/.local/lib/python3.8/site-packages/vllm/engine/llm_engine.py", line 142, in _init_workers
    self._run_workers(
  File "/usr4/ec523/brucejia/.local/lib/python3.8/site-packages/vllm/engine/llm_engine.py", line 700, in _run_workers
    output = executor(*args, **kwargs)
  File "/usr4/ec523/brucejia/.local/lib/python3.8/site-packages/vllm/worker/worker.py", line 70, in init_model
    self.model = get_model(self.model_config)
  File "/usr4/ec523/brucejia/.local/lib/python3.8/site-packages/vllm/model_executor/model_loader.py", line 98, in get_model
    model.load_weights(model_config.model, model_config.download_dir,
  File "/usr4/ec523/brucejia/.local/lib/python3.8/site-packages/vllm/model_executor/models/llama.py", line 322, in load_weights
    param = params_dict[name.replace(weight_name, param_name)]
KeyError: 'model.layers.0.self_attn.qkv_proj.base_layer.weight'

@Soumendraprasad
Author

@SuperBruceJia, did you find any other methods to tackle this error?

@SuperBruceJia

@SuperBruceJia, did you find any other methods to tackle this error?

Yes!

git clone --branch support_peft https://github.com/SuperBruceJia/vllm.git
cd vllm
pip install -e . --user

Then,

import gc

import torch
from vllm import LLM, SamplingParams
from vllm.model_executor.adapters import lora
from vllm.model_executor.parallel_utils.parallel_state import destroy_model_parallel

# Placeholders: fill these in for your setup
model_name = "YOUR_BASE_MODEL"        # base model the LoRA adapter was trained on
save_dir_llm = "YOUR_DOWNLOAD_DIR"    # where vLLM downloads/caches the base model
save_dir = "YOUR_LORA_ADAPTER_PATH"   # directory containing the LoRA adapter weights
num_gpus = 1

# Load the base model, then apply the LoRA adapter to the worker's model
llm = LLM(model=model_name, download_dir=save_dir_llm, tensor_parallel_size=num_gpus, gpu_memory_utilization=0.70)
lora.LoRAModel.from_pretrained(llm.llm_engine.workers[0].model, save_dir)

# Delete the llm object and free the memory
destroy_model_parallel()
del llm
gc.collect()
torch.cuda.empty_cache()
torch.distributed.destroy_process_group()
print("Successfully deleted the llm pipeline and freed the GPU memory!")

If you use some models that need Hugging Face login:

from huggingface_hub import login

login(token="YOUR_HUGGING_FACE_TOKEN")

@SuperBruceJia

@SuperBruceJia, did you find any other methods to tackle this error?

Generally speaking, I suggest using the solution mentioned here.

Load the pre-trained model and merge the LoRA weights.
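For reference, a minimal sketch of that merge-then-serve approach with Hugging Face PEFT could look like the following (a sketch only; the model name and paths are placeholders, not taken from this thread): load the base model, attach the adapter, fold the weights in with merge_and_unload(), save a plain checkpoint, and point vLLM at the merged directory.

# Minimal sketch, assuming the peft and transformers packages; paths are placeholders
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_name = "codellama/CodeLlama-7b-hf"   # base model the adapter was trained on
adapter_path = "/path/to/lora/adapter"          # directory with adapter_config.json and adapter weights
merged_path = "/path/to/merged/model"           # output directory for the merged checkpoint

# Load the base model and attach the LoRA adapter
base_model = AutoModelForCausalLM.from_pretrained(base_model_name, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base_model, adapter_path)

# Fold the LoRA weights into the base weights and save a plain checkpoint
merged_model = model.merge_and_unload()
merged_model.save_pretrained(merged_path)
AutoTokenizer.from_pretrained(base_model_name).save_pretrained(merged_path)

# The merged directory can then be loaded by vLLM like any ordinary model:
#   from vllm import LLM
#   llm = LLM(model=merged_path)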

@Soumendraprasad
Author

Soumendraprasad commented Dec 6, 2023

While setting up the dependencies with pip install -e . --user, I am getting the error: error: subprocess-exited-with-error
× Building editable for vllm (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [125 lines of output]
/var/tmp/pip-build-env-xc4n5tmk/overlay/lib/python3.10/site-packages/torch/nn/modules/transformer.py:20: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:84.)
device: torch.device = torch.device(torch._C._get_default_device()), # torch.device('cpu'),
running editable_wheel
creating /var/tmp/pip-wheel-eybkk3v0/.tmp-9hphfyi9/vllm.egg-info
writing /var/tmp/pip-wheel-eybkk3v0/.tmp-9hphfyi9/vllm.egg-info/PKG-INFO
writing dependency_links to /var/tmp/pip-wheel-eybkk3v0/.tmp-9hphfyi9/vllm.egg-info/dependency_links.txt
writing requirements to /var/tmp/pip-wheel-eybkk3v0/.tmp-9hphfyi9/vllm.egg-info/requires.txt
writing top-level names to /var/tmp/pip-wheel-eybkk3v0/.tmp-9hphfyi9/vllm.egg-info/top_level.txt
writing manifest file '/var/tmp/pip-wheel-eybkk3v0/.tmp-9hphfyi9/vllm.egg-info/SOURCES.txt'
reading manifest file '/var/tmp/pip-wheel-eybkk3v0/.tmp-9hphfyi9/vllm.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
adding license file 'LICENSE'
writing manifest file '/var/tmp/pip-wheel-eybkk3v0/.tmp-9hphfyi9/vllm.egg-info/SOURCES.txt'
creating '/var/tmp/pip-wheel-eybkk3v0/.tmp-9hphfyi9/vllm-0.1.2.dist-info'
creating /var/tmp/pip-wheel-eybkk3v0/.tmp-9hphfyi9/vllm-0.1.2.dist-info/WHEEL
running build_py
running build_ext
Traceback (most recent call last):
File "/var/tmp/pip-build-env-xc4n5tmk/overlay/lib/python3.10/site-packages/setuptools/command/editable_wheel.py", line 156, in run
self._create_wheel_file(bdist_wheel)
File "/var/tmp/pip-build-env-xc4n5tmk/overlay/lib/python3.10/site-packages/setuptools/command/editable_wheel.py", line 345, in _create_wheel_file
files, mapping = self._run_build_commands(dist_name, unpacked, lib, tmp)
File "/var/tmp/pip-build-env-xc4n5tmk/overlay/lib/python3.10/site-packages/setuptools/command/editable_wheel.py", line 268, in _run_build_commands
self._run_build_subcommands()
File "/var/tmp/pip-build-env-xc4n5tmk/overlay/lib/python3.10/site-packages/setuptools/command/editable_wheel.py", line 295, in _run_build_subcommands
self.run_command(name)
File "/var/tmp/pip-build-env-xc4n5tmk/overlay/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
self.distribution.run_command(command)
File "/var/tmp/pip-build-env-xc4n5tmk/overlay/lib/python3.10/site-packages/setuptools/dist.py", line 963, in run_command
super().run_command(command)
File "/var/tmp/pip-build-env-xc4n5tmk/overlay/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/var/tmp/pip-build-env-xc4n5tmk/overlay/lib/python3.10/site-packages/setuptools/command/build_ext.py", line 88, in run
_build_ext.run(self)
File "/var/tmp/pip-build-env-xc4n5tmk/overlay/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 345, in run
self.build_extensions()
File "/var/tmp/pip-build-env-xc4n5tmk/overlay/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 525, in build_extensions
_check_cuda_version(compiler_name, compiler_version)
File "/var/tmp/pip-build-env-xc4n5tmk/overlay/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 413, in _check_cuda_version
raise RuntimeError(CUDA_MISMATCH_MESSAGE.format(cuda_str_version, torch.version.cuda))
RuntimeError:
The detected CUDA version (11.8) mismatches the version that was used to compile
PyTorch (12.1). Please make sure to use the same CUDA versions.

  /var/tmp/pip-build-env-xc4n5tmk/overlay/lib/python3.10/site-packages/setuptools/_distutils/dist.py:988: _DebuggingTips: Problem in editable installation.
  !!

          ********************************************************************************
          An error happened while installing `vllm` in editable mode.

          The following steps are recommended to help debug this problem:

          - Try to install the project normally, without using the editable mode.
            Does the error still persist?
            (If it does, try fixing the problem before attempting the editable mode).
          - If you are using binary extensions, make sure you have all OS-level
            dependencies installed (e.g. compilers, toolchains, binary libraries, ...).
          - Try the latest version of setuptools (maybe the error was already fixed).
          - If you (or your project dependencies) are using any setuptools extension
            or customization, make sure they support the editable mode.

          After following the steps above, if the problem still persists and
          you think this is related to how setuptools handles editable installations,
          please submit a reproducible example
          (see https://stackoverflow.com/help/minimal-reproducible-example) to:

              https://github.com/pypa/setuptools/issues

          See https://setuptools.pypa.io/en/latest/userguide/development_mode.html for details.
          ********************************************************************************

  !!
    cmd_obj.run()
  Traceback (most recent call last):
    File "/opt/conda/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
      main()
    File "/opt/conda/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
      json_out['return_val'] = hook(**hook_input['kwargs'])
    File "/opt/conda/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 273, in build_editable
      return hook(wheel_directory, config_settings, metadata_directory)
    File "/var/tmp/pip-build-env-xc4n5tmk/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 436, in build_editable
      return self._build_with_temp_dir(
    File "/var/tmp/pip-build-env-xc4n5tmk/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 389, in _build_with_temp_dir
      self.run_setup()
    File "/var/tmp/pip-build-env-xc4n5tmk/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 311, in run_setup
      exec(code, locals())
    File "<string>", line 145, in <module>
    File "/var/tmp/pip-build-env-xc4n5tmk/overlay/lib/python3.10/site-packages/setuptools/__init__.py", line 103, in setup
      return distutils.core.setup(**attrs)
    File "/var/tmp/pip-build-env-xc4n5tmk/overlay/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 185, in setup
      return run_commands(dist)
    File "/var/tmp/pip-build-env-xc4n5tmk/overlay/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
      dist.run_commands()
    File "/var/tmp/pip-build-env-xc4n5tmk/overlay/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
      self.run_command(cmd)
    File "/var/tmp/pip-build-env-xc4n5tmk/overlay/lib/python3.10/site-packages/setuptools/dist.py", line 963, in run_command
      super().run_command(command)
    File "/var/tmp/pip-build-env-xc4n5tmk/overlay/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
      cmd_obj.run()
    File "/var/tmp/pip-build-env-xc4n5tmk/overlay/lib/python3.10/site-packages/setuptools/command/editable_wheel.py", line 156, in run
      self._create_wheel_file(bdist_wheel)
    File "/var/tmp/pip-build-env-xc4n5tmk/overlay/lib/python3.10/site-packages/setuptools/command/editable_wheel.py", line 345, in _create_wheel_file
      files, mapping = self._run_build_commands(dist_name, unpacked, lib, tmp)
    File "/var/tmp/pip-build-env-xc4n5tmk/overlay/lib/python3.10/site-packages/setuptools/command/editable_wheel.py", line 268, in _run_build_commands
      self._run_build_subcommands()
    File "/var/tmp/pip-build-env-xc4n5tmk/overlay/lib/python3.10/site-packages/setuptools/command/editable_wheel.py", line 295, in _run_build_subcommands
      self.run_command(name)
    File "/var/tmp/pip-build-env-xc4n5tmk/overlay/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
      self.distribution.run_command(command)
    File "/var/tmp/pip-build-env-xc4n5tmk/overlay/lib/python3.10/site-packages/setuptools/dist.py", line 963, in run_command
      super().run_command(command)
    File "/var/tmp/pip-build-env-xc4n5tmk/overlay/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
      cmd_obj.run()
    File "/var/tmp/pip-build-env-xc4n5tmk/overlay/lib/python3.10/site-packages/setuptools/command/build_ext.py", line 88, in run
      _build_ext.run(self)
    File "/var/tmp/pip-build-env-xc4n5tmk/overlay/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 345, in run
      self.build_extensions()
    File "/var/tmp/pip-build-env-xc4n5tmk/overlay/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 525, in build_extensions
      _check_cuda_version(compiler_name, compiler_version)
    File "/var/tmp/pip-build-env-xc4n5tmk/overlay/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 413, in _check_cuda_version
      raise RuntimeError(CUDA_MISMATCH_MESSAGE.format(cuda_str_version, torch.version.cuda))
  RuntimeError:
  The detected CUDA version (11.8) mismatches the version that was used to compile
  PyTorch (12.1). Please make sure to use the same CUDA versions.

  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building editable for vllm
Failed to build vllm
ERROR: Could not build wheels for vllm, which is required to install pyproject.toml-based projects.
A similar error can be found here.

@SuperBruceJia, I want to use vLLM for inference with a fine-tuned CodeLlama base model.
Below is the code I use to run inference with the base CodeLlama model:

from transformers import AutoTokenizer
import transformers
import torch

tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLlama-7b-hf")
pipeline = transformers.pipeline(
    "text-generation",
    model="codellama/CodeLlama-7b-hf",
    torch_dtype=torch.float16,
    device_map="auto",
)

sequences = pipeline(
    'def fibonacci(',
    do_sample=True,
    temperature=0.2,
    top_p=0.9,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    max_length=100,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")

Could you tell me how to use vLLM to run inference with CodeLlama? Your proposed solution is giving me the above error. Any help is highly appreciated.
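(For reference, once the LoRA weights are merged into a full checkpoint as discussed earlier in this thread, a rough vLLM equivalent of the pipeline above looks like the following; the model name here is the base CodeLlama checkpoint and is only illustrative.)

# Rough vLLM equivalent of the transformers pipeline above (illustrative model name)
from vllm import LLM, SamplingParams

llm = LLM(model="codellama/CodeLlama-7b-hf", dtype="float16")
sampling_params = SamplingParams(temperature=0.2, top_p=0.9, max_tokens=100)

outputs = llm.generate(["def fibonacci("], sampling_params)
for output in outputs:
    # Print the prompt plus the completion, mirroring the pipeline's generated_text
    print(f"Result: {output.prompt}{output.outputs[0].text}")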

@SuperBruceJia

RuntimeError:
The detected CUDA version (11.8) mismatches the version that was used to compile
PyTorch (12.1). Please make sure to use the same CUDA versions.

Please note that the error is triggered by a mismatch between your installed CUDA version (11.8) and the CUDA version your PyTorch build was compiled with (12.1).

You may get some help from this issue and solution.

@SuperBruceJia

While setting up the dependencies with pip install -e . --user, I am getting the error: error: subprocess-exited-with-error ... RuntimeError: The detected CUDA version (11.8) mismatches the version that was used to compile PyTorch (12.1). Please make sure to use the same CUDA versions. ...

Could you tell me how to use vLLM to run inference with CodeLlama? Your proposed solution is giving me the above error. Any help is highly appreciated.

Please fix the mismatch between your CUDA version and your PyTorch build. You may need to uninstall PyTorch and reinstall a build that matches your CUDA version.

@Soumendraprasad
Author

Soumendraprasad commented Dec 8, 2023

@SuperBruceJia, I fixed the CUDA version & PyTorch version issue. The problem was that my PyTorch build was 2.1.0+cu121 while my VM has CUDA 11.8; when I changed my PyTorch version to 2.1.0+cu118, the issue was resolved. The environment is now set up. But when I run your given script, at the line
llm = LLM(model=model_name, download_dir=save_dir_llm, tensor_parallel_size=num_gpus, gpu_memory_utilization=0.70)

I get the error NameError: name is not defined. I am already logged in to my Hugging Face account. I then downloaded all the model files to my VM and tried to use the folder where all the files are present, but I still got the error. How do I fix this? Say I want to run inference with the model codellama/CodeLlama-7b-Python-hf from Hugging Face. What should I do to run it via your proposed modified vLLM?
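(For what it's worth, that NameError most likely means the placeholder names in the snippet, model_name, save_dir_llm, and num_gpus, were never defined in the script. A minimal sketch with all names defined, assuming the support_peft fork above and codellama/CodeLlama-7b-Python-hf as the base model, with placeholder paths, might look like this.)

# Sketch only: assumes the support_peft fork of vLLM installed above; paths are placeholders
from vllm import LLM
from vllm.model_executor.adapters import lora

model_name = "codellama/CodeLlama-7b-Python-hf"   # base model on the Hugging Face Hub
save_dir_llm = "/path/to/model/cache"             # where vLLM downloads/caches the base model
save_dir = "/path/to/lora/adapter"                # directory containing your LoRA adapter weights
num_gpus = 1

llm = LLM(model=model_name, download_dir=save_dir_llm,
          tensor_parallel_size=num_gpus, gpu_memory_utilization=0.70)
lora.LoRAModel.from_pretrained(llm.llm_engine.workers[0].model, save_dir)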

@aiMBF

aiMBF commented Feb 15, 2024

Hello everyone. Can I get some help? I'm running inference on a custom fine-tuned Phi-2 model using Modal and vLLM, and I'm getting: base_model.model.model.layers.0.mlp.fc2.lora_A.weight @Soumendraprasad @SuperBruceJia

@SuperBruceJia

Hello everyone. Can I get some help? I'm running inference on a custom fine-tuned Phi-2 model using Modal and vLLM, and I'm getting: base_model.model.model.layers.0.mlp.fc2.lora_A.weight @Soumendraprasad @SuperBruceJia

The supported target modules are limited to

target_modules=[
            "q_proj",
            "k_proj",
            "v_proj",
        ],
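(For illustration, restricting an adapter to those modules at fine-tuning time looks roughly like the following with PEFT; the rank, alpha, and dropout values here are placeholders.)

# Sketch: a PEFT LoRA config limited to the attention projections listed above
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,                                           # placeholder rank
    lora_alpha=32,                                  # placeholder scaling factor
    lora_dropout=0.05,                              # placeholder dropout
    target_modules=["q_proj", "k_proj", "v_proj"],  # only these projections are supported here
    task_type="CAUSAL_LM",
)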

@Iven2132

Hi, @SuperBruceJia I'm also getting this error: KeyError: 'base_model.model.lm_head.base_layer.weight' Can you please help?

Here is my notebook: https://colab.research.google.com/drive/1hYdz4JYFuqzMM3pKFvsgH2ZMMc6KSy_y?usp=sharing

and here is my model: https://huggingface.co/marksuccsmfewercoc/llava-1.5-7b-hf-ft-mix-vsft

@SuperBruceJia

It seems that the repository only contains an adapter: https://huggingface.co/marksuccsmfewercoc/llava-1.5-7b-hf-ft-mix-vsft/tree/main.

You need to load the base model first: https://huggingface.co/marksuccsmfewercoc/llava-1.5-7b-hf-ft-mix-vsft/blob/main/adapter_config.json#L7

And then load the adapter: https://huggingface.co/marksuccsmfewercoc/llava-1.5-7b-hf-ft-mix-vsft/blob/main/adapter_config.json#L21-L42

You may need the LoRA for vLLM: https://docs.vllm.ai/en/latest/models/lora.html

@Iven2132

Hi, @SuperBruceJia I'm also getting this error: KeyError: 'base_model.model.lm_head.base_layer.weight' Can you please help?
Here is my notebook: https://colab.research.google.com/drive/1hYdz4JYFuqzMM3pKFvsgH2ZMMc6KSy_y?usp=sharing
and here is my model: https://huggingface.co/marksuccsmfewercoc/llava-1.5-7b-hf-ft-mix-vsft

It seems that the repository only contains an adapter: https://huggingface.co/marksuccsmfewercoc/llava-1.5-7b-hf-ft-mix-vsft/tree/main.

You need to load the base model first: https://huggingface.co/marksuccsmfewercoc/llava-1.5-7b-hf-ft-mix-vsft/blob/main/adapter_config.json#L7

And then load the adapter: https://huggingface.co/marksuccsmfewercoc/llava-1.5-7b-hf-ft-mix-vsft/blob/main/adapter_config.json#L21-L42

You may need the LoRA for vLLM: https://docs.vllm.ai/en/latest/models/lora.html

I'm confused. How will that work? Can you give me an example?

@SuperBruceJia

https://docs.vllm.ai/en/latest/models/lora.html

Please check this simple example: https://docs.vllm.ai/en/latest/models/lora.html

(1) First, load the base model.
(2) Then, load the LoRA adapter.
(3) Finally, the base model is merged with the LoRA adapter (by vLLM's internal code).
(4) Run llm.generate.
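(For concreteness, the docs example referenced above boils down to roughly the following; the base model and adapter path are placeholders, not specific to this thread.)

# Sketch based on the vLLM LoRA docs: serve a base model and attach an adapter per request
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

base_model = "meta-llama/Llama-2-7b-hf"      # placeholder base model
adapter_path = "/path/to/lora/adapter"       # placeholder adapter directory

llm = LLM(model=base_model, enable_lora=True)
sampling_params = SamplingParams(temperature=0.2, max_tokens=128)

outputs = llm.generate(
    ["def fibonacci("],
    sampling_params,
    # LoRARequest takes an adapter name, a unique integer id, and the local adapter path
    lora_request=LoRARequest("my_adapter", 1, adapter_path),
)
print(outputs[0].outputs[0].text)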

@Iven2132

Please check this simple example: https://docs.vllm.ai/en/latest/models/lora.html
(1) First, load the base model. (2) Then, load the LoRA adapter. (3) Finally, the base model is merged with the LoRA adapter (by vLLM's internal code). (4) Run llm.generate.

OK, so first I need to download the model from the Hugging Face Hub, use the LLaVA model in the LLM class, and set enable_lora to true?

@SuperBruceJia

OK, so first I need to download the model from the Hugging Face Hub, use the LLaVA model in the LLM class, and set enable_lora to true?

There is a maximum LoRA rank in the current vLLM + LoRA implementation; the maximum supported LoRA r is 64: https://github.com/vllm-project/vllm/blob/main/vllm/config.py#L809-L815

And your adapter's rank is 64:
https://huggingface.co/marksuccsmfewercoc/llava-1.5-7b-hf-ft-mix-vsft/blob/main/adapter_config.json#L22

So, I think it should work.
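(In practice that means raising the engine's LoRA rank limit to at least the adapter's rank when constructing the LLM. A sketch follows; the base model name is an assumption taken from the adapter_config.json linked above.)

# Sketch: allow adapters up to rank 64 (the default limit is lower)
from vllm import LLM

llm = LLM(
    model="llava-hf/llava-1.5-7b-hf",   # assumed base model from the adapter config
    enable_lora=True,
    max_lora_rank=64,                   # must be >= the adapter's r
)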


@Iven2132

There is a maximum LoRA rank in the current vLLM + LoRA implementation; the maximum supported LoRA r is 64 ... And your adapter's rank is 64 ... So, I think it should work.

Hey @SuperBruceJia, I'm getting this error: AssertionError: To be tested: vision language model with LoRA settings. I think it's not supported yet.

@Iven2132

Hey @SuperBruceJia, I'm getting this error: AssertionError: To be tested: vision language model with LoRA settings. I think it's not supported yet.

Can we please have support for vision models? @ywang96

@Iven2132

There is a maximum LoRA rank in the current vLLM + LoRA implementation; the maximum supported LoRA r is 64 ... And your adapter's rank is 64 ... So, I think it should work.

So I think we can't use a fine-tuned VLM with vLLM?
