
KeyError: 'base_model.model.model.layers.0.self_attn.qkv_proj.lora_A.weight' #1625

Closed
Soumendraprasad opened this issue Nov 11, 2023 · 21 comments

@Soumendraprasad

Soumendraprasad commented Nov 11, 2023

When I try to run inference on my fine-tuned Code Llama model using vLLM, I get this error:

  File "/usr/local/lib/python3.9/dist-packages/vllm/engine/ray_utils.py", line 32, in execute_method
    return executor(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/vllm/worker/worker.py", line 70, in init_model
    self.model = get_model(self.model_config)
  File "/usr/local/lib/python3.9/dist-packages/vllm/model_executor/model_loader.py", line 103, in get_model
    model.load_weights(model_config.model, model_config.download_dir,
  File "/usr/local/lib/python3.9/dist-packages/vllm/model_executor/models/llama.py", line 367, in load_weights
    param = state_dict[name.replace(weight_name, "qkv_proj")]
KeyError: 'base_model.model.model.layers.0.self_attn.qkv_proj.lora_A.weight'

Some shapes of my model's weights:

model.layers.0.input_layernorm.weight [4096] BF16
model.layers.0.mlp.down_proj.weight [4096,11008] BF16
model.layers.0.mlp.gate_proj.weight [11008,4096] BF16
model.layers.0.mlp.up_proj.weight [11008,4096] BF16
model.layers.0.post_attention_layernorm.weight [4096] BF16
model.layers.0.self_attn.k_proj.weight [4096,4096] BF16
model.layers.0.self_attn.o_proj.weight [4096,4096] BF16
model.layers.0.self_attn.q_proj.weight [4096,4096] BF16
model.layers.0.self_attn.v_proj.weight [4096,4096] BF16

Any suggestion or help is highly appreciated.

@simon-mo
Collaborator

Currently vLLM does not support merging LoRA weights, hence the model loader is erroring. Contributions are strongly welcomed here! Ideally the LoRA weights would be applied automatically during the model loading process. I believe this PR does what you want: #289

I'm closing this PR in favor of #182

@SuperBruceJia

Also facing this problem!

  File "/usr4/ec523/brucejia/.local/lib/python3.8/site-packages/vllm/entrypoints/llm.py", line 93, in __init__
    self.llm_engine = LLMEngine.from_engine_args(engine_args)
  File "/usr4/ec523/brucejia/.local/lib/python3.8/site-packages/vllm/engine/llm_engine.py", line 231, in from_engine_args
    engine = cls(*engine_configs,
  File "/usr4/ec523/brucejia/.local/lib/python3.8/site-packages/vllm/engine/llm_engine.py", line 110, in __init__
    self._init_workers(distributed_init_method)
  File "/usr4/ec523/brucejia/.local/lib/python3.8/site-packages/vllm/engine/llm_engine.py", line 142, in _init_workers
    self._run_workers(
  File "/usr4/ec523/brucejia/.local/lib/python3.8/site-packages/vllm/engine/llm_engine.py", line 700, in _run_workers
    output = executor(*args, **kwargs)
  File "/usr4/ec523/brucejia/.local/lib/python3.8/site-packages/vllm/worker/worker.py", line 70, in init_model
    self.model = get_model(self.model_config)
  File "/usr4/ec523/brucejia/.local/lib/python3.8/site-packages/vllm/model_executor/model_loader.py", line 98, in get_model
    model.load_weights(model_config.model, model_config.download_dir,
  File "/usr4/ec523/brucejia/.local/lib/python3.8/site-packages/vllm/model_executor/models/llama.py", line 322, in load_weights
    param = params_dict[name.replace(weight_name, param_name)]
KeyError: 'model.layers.0.self_attn.qkv_proj.base_layer.weight'

@Soumendraprasad
Author

@SuperBruceJia, did you find any other methods to tackle this error?

@SuperBruceJia

@SuperBruceJia, did you find any other methods to tackle this error?

Yes!

git clone --branch support_peft https://github.com/SuperBruceJia/vllm.git
cd vllm
pip install -e . --user

Then,

import gc

import torch
from vllm import LLM, SamplingParams
from vllm.model_executor.adapters import lora
from vllm.model_executor.parallel_utils.parallel_state import destroy_model_parallel

# Placeholders: fill these in for your setup
model_name = "YOUR_BASE_MODEL"        # base model the LoRA adapter was trained on
save_dir_llm = "YOUR_DOWNLOAD_DIR"    # where vLLM downloads/caches the base model
save_dir = "YOUR_LORA_ADAPTER_PATH"   # directory containing the LoRA adapter weights
num_gpus = 1

# Load the base model, then apply the LoRA adapter to the worker's model
llm = LLM(model=model_name, download_dir=save_dir_llm, tensor_parallel_size=num_gpus, gpu_memory_utilization=0.70)
lora.LoRAModel.from_pretrained(llm.llm_engine.workers[0].model, save_dir)

# Delete the llm object and free the memory
destroy_model_parallel()
del llm
gc.collect()
torch.cuda.empty_cache()
torch.distributed.destroy_process_group()
print("Successfully deleted the llm pipeline and freed the GPU memory!")

If you use some models that need Hugging Face login:

from huggingface_hub import login

login(token="YOUR_HUGGING_FACE_TOKEN")

@SuperBruceJia

@SuperBruceJia, did you find any other methods to tackle this error?

Generally speaking, I suggest using the solution mentioned here.

Load the pre-trained model and merge the LoRA weights.
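For reference, a minimal sketch of that merge-then-serve approach with Hugging Face PEFT could look like the following (a sketch only; the model name and paths are placeholders, not taken from this thread): load the base model, attach the adapter, fold the weights in with merge_and_unload(), save a plain checkpoint, and point vLLM at the merged directory.

# Minimal sketch, assuming the peft and transformers packages; paths are placeholders
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_name = "codellama/CodeLlama-7b-hf"   # base model the adapter was trained on
adapter_path = "/path/to/lora/adapter"          # directory with adapter_config.json and adapter weights
merged_path = "/path/to/merged/model"           # output directory for the merged checkpoint

# Load the base model and attach the LoRA adapter
base_model = AutoModelForCausalLM.from_pretrained(base_model_name, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base_model, adapter_path)

# Fold the LoRA weights into the base weights and save a plain checkpoint
merged_model = model.merge_and_unload()
merged_model.save_pretrained(merged_path)
AutoTokenizer.from_pretrained(base_model_name).save_pretrained(merged_path)

# The merged directory can then be loaded by vLLM like any ordinary model:
#   from vllm import LLM
#   llm = LLM(model=merged_path)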

@Soumendraprasad
Author

Soumendraprasad commented Dec 6, 2023

While setting up the dependencies with pip install -e . --user, I am getting the error: error: subprocess-exited-with-error
× Building editable for vllm (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [125 lines of output]
/var/tmp/pip-build-env-xc4n5tmk/overlay/lib/python3.10/site-packages/torch/nn/modules/transformer.py:20: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:84.)
device: torch.device = torch.device(torch._C._get_default_device()), # torch.device('cpu'),
running editable_wheel
creating /var/tmp/pip-wheel-eybkk3v0/.tmp-9hphfyi9/vllm.egg-info
writing /var/tmp/pip-wheel-eybkk3v0/.tmp-9hphfyi9/vllm.egg-info/PKG-INFO
writing dependency_links to /var/tmp/pip-wheel-eybkk3v0/.tmp-9hphfyi9/vllm.egg-info/dependency_links.txt
writing requirements to /var/tmp/pip-wheel-eybkk3v0/.tmp-9hphfyi9/vllm.egg-info/requires.txt
writing top-level names to /var/tmp/pip-wheel-eybkk3v0/.tmp-9hphfyi9/vllm.egg-info/top_level.txt
writing manifest file '/var/tmp/pip-wheel-eybkk3v0/.tmp-9hphfyi9/vllm.egg-info/SOURCES.txt'
reading manifest file '/var/tmp/pip-wheel-eybkk3v0/.tmp-9hphfyi9/vllm.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
adding license file 'LICENSE'
writing manifest file '/var/tmp/pip-wheel-eybkk3v0/.tmp-9hphfyi9/vllm.egg-info/SOURCES.txt'
creating '/var/tmp/pip-wheel-eybkk3v0/.tmp-9hphfyi9/vllm-0.1.2.dist-info'
creating /var/tmp/pip-wheel-eybkk3v0/.tmp-9hphfyi9/vllm-0.1.2.dist-info/WHEEL
running build_py
running build_ext
Traceback (most recent call last):
File "/var/tmp/pip-build-env-xc4n5tmk/overlay/lib/python3.10/site-packages/setuptools/command/editable_wheel.py", line 156, in run
self._create_wheel_file(bdist_wheel)
File "/var/tmp/pip-build-env-xc4n5tmk/overlay/lib/python3.10/site-packages/setuptools/command/editable_wheel.py", line 345, in _create_wheel_file
files, mapping = self._run_build_commands(dist_name, unpacked, lib, tmp)
File "/var/tmp/pip-build-env-xc4n5tmk/overlay/lib/python3.10/site-packages/setuptools/command/editable_wheel.py", line 268, in _run_build_commands
self._run_build_subcommands()
File "/var/tmp/pip-build-env-xc4n5tmk/overlay/lib/python3.10/site-packages/setuptools/command/editable_wheel.py", line 295, in _run_build_subcommands
self.run_command(name)
File "/var/tmp/pip-build-env-xc4n5tmk/overlay/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
self.distribution.run_command(command)
File "/var/tmp/pip-build-env-xc4n5tmk/overlay/lib/python3.10/site-packages/setuptools/dist.py", line 963, in run_command
super().run_command(command)
File "/var/tmp/pip-build-env-xc4n5tmk/overlay/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/var/tmp/pip-build-env-xc4n5tmk/overlay/lib/python3.10/site-packages/setuptools/command/build_ext.py", line 88, in run
_build_ext.run(self)
File "/var/tmp/pip-build-env-xc4n5tmk/overlay/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 345, in run
self.build_extensions()
File "/var/tmp/pip-build-env-xc4n5tmk/overlay/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 525, in build_extensions
_check_cuda_version(compiler_name, compiler_version)
File "/var/tmp/pip-build-env-xc4n5tmk/overlay/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 413, in _check_cuda_version
raise RuntimeError(CUDA_MISMATCH_MESSAGE.format(cuda_str_version, torch.version.cuda))
RuntimeError:
The detected CUDA version (11.8) mismatches the version that was used to compile
PyTorch (12.1). Please make sure to use the same CUDA versions.

  /var/tmp/pip-build-env-xc4n5tmk/overlay/lib/python3.10/site-packages/setuptools/_distutils/dist.py:988: _DebuggingTips: Problem in editable installation.
  !!

          ********************************************************************************
          An error happened while installing `vllm` in editable mode.

          The following steps are recommended to help debug this problem:

          - Try to install the project normally, without using the editable mode.
            Does the error still persist?
            (If it does, try fixing the problem before attempting the editable mode).
          - If you are using binary extensions, make sure you have all OS-level
            dependencies installed (e.g. compilers, toolchains, binary libraries, ...).
          - Try the latest version of setuptools (maybe the error was already fixed).
          - If you (or your project dependencies) are using any setuptools extension
            or customization, make sure they support the editable mode.

          After following the steps above, if the problem still persists and
          you think this is related to how setuptools handles editable installations,
          please submit a reproducible example
          (see https://stackoverflow.com/help/minimal-reproducible-example) to:

              https://github.com/pypa/setuptools/issues

          See https://setuptools.pypa.io/en/latest/userguide/development_mode.html for details.
          ********************************************************************************

  !!
    cmd_obj.run()
  Traceback (most recent call last):
    File "/opt/conda/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
      main()
    File "/opt/conda/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
      json_out['return_val'] = hook(**hook_input['kwargs'])
    File "/opt/conda/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 273, in build_editable
      return hook(wheel_directory, config_settings, metadata_directory)
    File "/var/tmp/pip-build-env-xc4n5tmk/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 436, in build_editable
      return self._build_with_temp_dir(
    File "/var/tmp/pip-build-env-xc4n5tmk/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 389, in _build_with_temp_dir
      self.run_setup()
    File "/var/tmp/pip-build-env-xc4n5tmk/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 311, in run_setup
      exec(code, locals())
    File "<string>", line 145, in <module>
    File "/var/tmp/pip-build-env-xc4n5tmk/overlay/lib/python3.10/site-packages/setuptools/__init__.py", line 103, in setup
      return distutils.core.setup(**attrs)
    File "/var/tmp/pip-build-env-xc4n5tmk/overlay/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 185, in setup
      return run_commands(dist)
    File "/var/tmp/pip-build-env-xc4n5tmk/overlay/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
      dist.run_commands()
    File "/var/tmp/pip-build-env-xc4n5tmk/overlay/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
      self.run_command(cmd)
    File "/var/tmp/pip-build-env-xc4n5tmk/overlay/lib/python3.10/site-packages/setuptools/dist.py", line 963, in run_command
      super().run_command(command)
    File "/var/tmp/pip-build-env-xc4n5tmk/overlay/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
      cmd_obj.run()
    File "/var/tmp/pip-build-env-xc4n5tmk/overlay/lib/python3.10/site-packages/setuptools/command/editable_wheel.py", line 156, in run
      self._create_wheel_file(bdist_wheel)
    File "/var/tmp/pip-build-env-xc4n5tmk/overlay/lib/python3.10/site-packages/setuptools/command/editable_wheel.py", line 345, in _create_wheel_file
      files, mapping = self._run_build_commands(dist_name, unpacked, lib, tmp)
    File "/var/tmp/pip-build-env-xc4n5tmk/overlay/lib/python3.10/site-packages/setuptools/command/editable_wheel.py", line 268, in _run_build_commands
      self._run_build_subcommands()
    File "/var/tmp/pip-build-env-xc4n5tmk/overlay/lib/python3.10/site-packages/setuptools/command/editable_wheel.py", line 295, in _run_build_subcommands
      self.run_command(name)
    File "/var/tmp/pip-build-env-xc4n5tmk/overlay/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
      self.distribution.run_command(command)
    File "/var/tmp/pip-build-env-xc4n5tmk/overlay/lib/python3.10/site-packages/setuptools/dist.py", line 963, in run_command
      super().run_command(command)
    File "/var/tmp/pip-build-env-xc4n5tmk/overlay/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
      cmd_obj.run()
    File "/var/tmp/pip-build-env-xc4n5tmk/overlay/lib/python3.10/site-packages/setuptools/command/build_ext.py", line 88, in run
      _build_ext.run(self)
    File "/var/tmp/pip-build-env-xc4n5tmk/overlay/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 345, in run
      self.build_extensions()
    File "/var/tmp/pip-build-env-xc4n5tmk/overlay/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 525, in build_extensions
      _check_cuda_version(compiler_name, compiler_version)
    File "/var/tmp/pip-build-env-xc4n5tmk/overlay/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 413, in _check_cuda_version
      raise RuntimeError(CUDA_MISMATCH_MESSAGE.format(cuda_str_version, torch.version.cuda))
  RuntimeError:
  The detected CUDA version (11.8) mismatches the version that was used to compile
  PyTorch (12.1). Please make sure to use the same CUDA versions.

  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building editable for vllm
Failed to build vllm
ERROR: Could not build wheels for vllm, which is required to install pyproject.toml-based projects.
A similar error can be found here.

@SuperBruceJia, I want to use vLLM for inference with a fine-tuned CodeLlama base model.
Below is the code I use to run inference with the base CodeLlama model:

from transformers import AutoTokenizer
import transformers
import torch

tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLlama-7b-hf")
pipeline = transformers.pipeline(
    "text-generation",
    model="codellama/CodeLlama-7b-hf",
    torch_dtype=torch.float16,
    device_map="auto",
)

sequences = pipeline(
    'def fibonacci(',
    do_sample=True,
    temperature=0.2,
    top_p=0.9,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    max_length=100,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")

Could you tell me how to use vLLM to run inference with CodeLlama? Your proposed solution is giving me the above error. Any help is highly appreciated.
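(For reference, once the LoRA weights are merged into a full checkpoint as discussed earlier in this thread, a rough vLLM equivalent of the pipeline above looks like the following; the model name here is the base CodeLlama checkpoint and is only illustrative.)

# Rough vLLM equivalent of the transformers pipeline above (illustrative model name)
from vllm import LLM, SamplingParams

llm = LLM(model="codellama/CodeLlama-7b-hf", dtype="float16")
sampling_params = SamplingParams(temperature=0.2, top_p=0.9, max_tokens=100)

outputs = llm.generate(["def fibonacci("], sampling_params)
for output in outputs:
    # Print the prompt plus the completion, mirroring the pipeline's generated_text
    print(f"Result: {output.prompt}{output.outputs[0].text}")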

@SuperBruceJia

RuntimeError:
The detected CUDA version (11.8) mismatches the version that was used to compile
PyTorch (12.1). Please make sure to use the same CUDA versions.

Please note that the error is triggered by a mismatch between your installed CUDA version (11.8) and the CUDA version your PyTorch build was compiled with (12.1).

You may get some help from this issue and solution.

@SuperBruceJia

While setting up the dependencies with pip install -e . --user, I am getting the error: error: subprocess-exited-with-error ... RuntimeError: The detected CUDA version (11.8) mismatches the version that was used to compile PyTorch (12.1). Please make sure to use the same CUDA versions. ...

Could you tell me how to use vLLM to run inference with CodeLlama? Your proposed solution is giving me the above error. Any help is highly appreciated.

Please fix the mismatch between your CUDA version and your PyTorch build. You may need to uninstall PyTorch and reinstall a build that matches your CUDA version.

@Soumendraprasad
Author

Soumendraprasad commented Dec 8, 2023

@SuperBruceJia, I fixed the CUDA version & PyTorch version issue. The problem was that my PyTorch build was 2.1.0+cu121 while my VM has CUDA 11.8; when I changed my PyTorch version to 2.1.0+cu118, the issue was resolved. The environment is now set up. But when I run your given script, at the line
llm = LLM(model=model_name, download_dir=save_dir_llm, tensor_parallel_size=num_gpus, gpu_memory_utilization=0.70)

I get the error NameError: name is not defined. I am already logged in to my Hugging Face account. I then downloaded all the model files to my VM and tried to use the folder where all the files are present, but I still got the error. How do I fix this? Say I want to run inference with the model codellama/CodeLlama-7b-Python-hf from Hugging Face. What should I do to run it via your proposed modified vLLM?
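(For what it's worth, that NameError most likely means the placeholder names in the snippet, model_name, save_dir_llm, and num_gpus, were never defined in the script. A minimal sketch with all names defined, assuming the support_peft fork above and codellama/CodeLlama-7b-Python-hf as the base model, with placeholder paths, might look like this.)

# Sketch only: assumes the support_peft fork of vLLM installed above; paths are placeholders
from vllm import LLM
from vllm.model_executor.adapters import lora

model_name = "codellama/CodeLlama-7b-Python-hf"   # base model on the Hugging Face Hub
save_dir_llm = "/path/to/model/cache"             # where vLLM downloads/caches the base model
save_dir = "/path/to/lora/adapter"                # directory containing your LoRA adapter weights
num_gpus = 1

llm = LLM(model=model_name, download_dir=save_dir_llm,
          tensor_parallel_size=num_gpus, gpu_memory_utilization=0.70)
lora.LoRAModel.from_pretrained(llm.llm_engine.workers[0].model, save_dir)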

@aiMBF

aiMBF commented Feb 15, 2024

Hello everyone. Can I get some help? I'm running inference on a custom fine-tuned Phi-2 model using Modal and vLLM, and I'm getting: base_model.model.model.layers.0.mlp.fc2.lora_A.weight @Soumendraprasad @SuperBruceJia

@SuperBruceJia

Hello everyone. Can I get some help? I'm running inference on a custom fine-tuned Phi-2 model using Modal and vLLM, and I'm getting: base_model.model.model.layers.0.mlp.fc2.lora_A.weight @Soumendraprasad @SuperBruceJia

The supported target modules are limited to

target_modules=[
            "q_proj",
            "k_proj",
            "v_proj",
        ],
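(For illustration, restricting an adapter to those modules at fine-tuning time looks roughly like the following with PEFT; the rank, alpha, and dropout values here are placeholders.)

# Sketch: a PEFT LoRA config limited to the attention projections listed above
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,                                           # placeholder rank
    lora_alpha=32,                                  # placeholder scaling factor
    lora_dropout=0.05,                              # placeholder dropout
    target_modules=["q_proj", "k_proj", "v_proj"],  # only these projections are supported here
    task_type="CAUSAL_LM",
)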

@Iven2132

Hi, @SuperBruceJia I'm also getting this error: KeyError: 'base_model.model.lm_head.base_layer.weight' Can you please help?

Here is my notebook: https://colab.research.google.com/drive/1hYdz4JYFuqzMM3pKFvsgH2ZMMc6KSy_y?usp=sharing

and here is my model: https://huggingface.co/marksuccsmfewercoc/llava-1.5-7b-hf-ft-mix-vsft

@SuperBruceJia

It seems that the repository only contains an adapter: https://huggingface.co/marksuccsmfewercoc/llava-1.5-7b-hf-ft-mix-vsft/tree/main.

You need to load the base model first: https://huggingface.co/marksuccsmfewercoc/llava-1.5-7b-hf-ft-mix-vsft/blob/main/adapter_config.json#L7

And then load the adapter: https://huggingface.co/marksuccsmfewercoc/llava-1.5-7b-hf-ft-mix-vsft/blob/main/adapter_config.json#L21-L42

You may need the LoRA for vLLM: https://docs.vllm.ai/en/latest/models/lora.html

@Iven2132

Hi, @SuperBruceJia I'm also getting this error: KeyError: 'base_model.model.lm_head.base_layer.weight' Can you please help?
Here is my notebook: https://colab.research.google.com/drive/1hYdz4JYFuqzMM3pKFvsgH2ZMMc6KSy_y?usp=sharing
and here is my model: https://huggingface.co/marksuccsmfewercoc/llava-1.5-7b-hf-ft-mix-vsft

It seems that the repository only contains an adapter: https://huggingface.co/marksuccsmfewercoc/llava-1.5-7b-hf-ft-mix-vsft/tree/main.

You need to load the base model first: https://huggingface.co/marksuccsmfewercoc/llava-1.5-7b-hf-ft-mix-vsft/blob/main/adapter_config.json#L7

And then load the adapter: https://huggingface.co/marksuccsmfewercoc/llava-1.5-7b-hf-ft-mix-vsft/blob/main/adapter_config.json#L21-L42

You may need the LoRA for vLLM: https://docs.vllm.ai/en/latest/models/lora.html

I'm confused. How will that work? Can you give me an example?

@SuperBruceJia

https://docs.vllm.ai/en/latest/models/lora.html

Please check this simple example: https://docs.vllm.ai/en/latest/models/lora.html

(1) First, load the base model.
(2) Then, load the LoRA adapter.
(3) Finally, the base model is merged with the LoRA adapter (by vLLM's internal code).
(4) Run llm.generate.
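(For concreteness, the docs example referenced above boils down to roughly the following; the base model and adapter path are placeholders, not specific to this thread.)

# Sketch based on the vLLM LoRA docs: serve a base model and attach an adapter per request
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

base_model = "meta-llama/Llama-2-7b-hf"      # placeholder base model
adapter_path = "/path/to/lora/adapter"       # placeholder adapter directory

llm = LLM(model=base_model, enable_lora=True)
sampling_params = SamplingParams(temperature=0.2, max_tokens=128)

outputs = llm.generate(
    ["def fibonacci("],
    sampling_params,
    # LoRARequest takes an adapter name, a unique integer id, and the local adapter path
    lora_request=LoRARequest("my_adapter", 1, adapter_path),
)
print(outputs[0].outputs[0].text)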

@Iven2132

Please check this simple example: https://docs.vllm.ai/en/latest/models/lora.html
(1) First, load the base model. (2) Then, load the LoRA adapter. (3) Finally, the base model is merged with the LoRA adapter (by vLLM's internal code). (4) Run llm.generate.

OK, so first I need to download the model from the Hugging Face Hub, use the LLaVA model in the LLM class, and set enable_lora to true?

@SuperBruceJia

OK, so first I need to download the model from the Hugging Face Hub, use the LLaVA model in the LLM class, and set enable_lora to true?

There is a maximum LoRA rank in the current vLLM + LoRA implementation; the maximum supported LoRA r is 64: https://github.com/vllm-project/vllm/blob/main/vllm/config.py#L809-L815

And your adapter's rank is 64:
https://huggingface.co/marksuccsmfewercoc/llava-1.5-7b-hf-ft-mix-vsft/blob/main/adapter_config.json#L22

So, I think it should work.
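(In practice that means raising the engine's LoRA rank limit to at least the adapter's rank when constructing the LLM. A sketch follows; the base model name is an assumption taken from the adapter_config.json linked above.)

# Sketch: allow adapters up to rank 64 (the default limit is lower)
from vllm import LLM

llm = LLM(
    model="llava-hf/llava-1.5-7b-hf",   # assumed base model from the adapter config
    enable_lora=True,
    max_lora_rank=64,                   # must be >= the adapter's r
)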


@Iven2132

There is a maximum LoRA rank in the current vLLM + LoRA implementation; the maximum supported LoRA r is 64 ... And your adapter's rank is 64 ... So, I think it should work.

Hey @SuperBruceJia, I'm getting this error: AssertionError: To be tested: vision language model with LoRA settings. I think it's not supported yet.

@Iven2132

Hey @SuperBruceJia, I'm getting this error: AssertionError: To be tested: vision language model with LoRA settings. I think it's not supported yet.

Can we please have support for vision models? @ywang96

@Iven2132

There is a maximum LoRA rank in the current vLLM + LoRA implementation; the maximum supported LoRA r is 64 ... And your adapter's rank is 64 ... So, I think it should work.

So I think we can't use a fine-tuned VLM with vLLM?
