The commands I've executed:
$ python3 -m venv vicuna_venv
$ source ./vicuna_venv/bin/activate
(vicuna_venv)$ pip3 install "fschat[model_worker,webui]"
(vicuna_venv)$ python3 -m fastchat.serve.cli --model-path lmsys/vicuna-7b-v1.3 --device mps --load-8bit
/home/nico/prog/vicuna_venv/lib/python3.11/site-packages/torch/cuda/__init__.py:546: UserWarning: Can't initialize NVML
warnings.warn("Can't initialize NVML")
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. If you see this, DO NOT PANIC! This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
0%| | 0/2 [00:32<?, ?it/s]
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/home/nico/prog/vicuna_venv/lib/python3.11/site-packages/fastchat/serve/cli.py", line 283, in <module>
main(args)
File "/home/nico/prog/vicuna_venv/lib/python3.11/site-packages/fastchat/serve/cli.py", line 208, in main
chat_loop(
File "/home/nico/prog/vicuna_venv/lib/python3.11/site-packages/fastchat/serve/inference.py", line 311, in chat_loop
model, tokenizer = load_model(
^^^^^^^^^^^
File "/home/nico/prog/vicuna_venv/lib/python3.11/site-packages/fastchat/model/model_adapter.py", line 236, in load_model
model, tokenizer = adapter.load_compress_model(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/nico/prog/vicuna_venv/lib/python3.11/site-packages/fastchat/model/model_adapter.py", line 82, in load_compress_model
return load_compress_model(
^^^^^^^^^^^^^^^^^^^^
File "/home/nico/prog/vicuna_venv/lib/python3.11/site-packages/fastchat/model/compression.py", line 187, in load_compress_model
compressed_state_dict[name] = tmp_state_dict[name].to(
^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: PyTorch is not linked with support for mps devices
I've also tried deleting ~/.cache/pip beforehand, but I get the same error.
Starting with --device cpu (python3 -m fastchat.serve.cli --model-path lmsys/vicuna-7b-v1.3 --device cpu --load-8bit) works.
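For context, the Linux builds of PyTorch do not include the MPS (Apple Metal) backend at all, which is what the "not linked with support for mps devices" error means. A quick way to check which backends the installed build supports (a minimal sketch, assuming the venv above is still active):

(vicuna_venv)$ python3 -c "import torch; print('mps built:', torch.backends.mps.is_built()); print('cuda/rocm available:', torch.cuda.is_available())"

On a Linux wheel the first value will be False, so --device mps cannot work regardless of the installed hardware.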
I didn't see the More Platforms and Quantization section.
I got it running by following the instructions from #104 (comment): installing the ROCm version of PyTorch and running sudo pacman -S rocm-opencl-runtime.
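For anyone else landing here with an AMD GPU on Linux, a rough sketch of the approach (the exact ROCm index URL and version below are assumptions, take them from the install selector on pytorch.org; see #104 for the full details):

(vicuna_venv)$ pip3 uninstall -y torch
(vicuna_venv)$ pip3 install torch --index-url https://download.pytorch.org/whl/rocm5.6
# ROCm builds expose the GPU through the regular torch.cuda API, so this should print True
(vicuna_venv)$ python3 -c "import torch; print(torch.cuda.is_available())"
# then launch with --device cuda instead of --device mps
(vicuna_venv)$ python3 -m fastchat.serve.cli --model-path lmsys/vicuna-7b-v1.3 --device cuda --load-8bit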