ValueError
AutoAWQ states that in order to use AWQ, you need a GPU with:
Compute Capability 7.5 (sm75). Turing and later architectures are supported.
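For reference, a quick way to confirm what a card actually reports (a minimal sketch, assuming a CUDA-enabled PyTorch install; a Turing card such as a T4 or RTX 20xx should print 7.5):

```python
import torch

# Query the compute capability of the first visible GPU.
# Turing cards report (7, 5), i.e. sm75.
major, minor = torch.cuda.get_device_capability(0)
print(f"Compute capability: {major}.{minor} (sm{major}{minor})")
```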
But when I try to use vLLM to serve my AWQ LLM:
+ python app.py --host 0.0.0.0 --port 5085 --model wasertech/assistant-llama2-7b-chat-awq --tokenizer hf-internal-testing/llama-tokenizer --dtype half --tensor-parallel-size 1 --gpu-memory-utilization 0.65 --quantization awq
Downloading (…)lve/main/config.json: 100%|███████| 677/677 [00:00<00:00, 118kB/s]
INFO 10-07 06:41:25 llm_engine.py:72] Initializing an LLM engine with config: model='wasertech/assistant-llama2-7b-chat-awq', tokenizer='hf-internal-testing/llama-tokenizer', tokenizer_mode=auto, revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=4096, download_dir=None, load_format=auto, tensor_parallel_size=1, quantization=awq, seed=0)
Downloading (…)cial_tokens_map.json: 100%|████| 72.0/72.0 [00:00<00:00, 14.2kB/s]
Downloading (…)e6/added_tokens.json: 100%|████| 42.0/42.0 [00:00<00:00, 8.29kB/s]
Downloading (…)okenizer_config.json: 100%|██████| 825/825 [00:00<00:00, 82.4kB/s]
Downloading (…)e6/quant_config.json: 100%|████| 90.0/90.0 [00:00<00:00, 15.4kB/s]
Downloading (…)neration_config.json: 100%|██████| 132/132 [00:00<00:00, 22.2kB/s]
Downloading (…)44be6/tokenizer.json: 100%|██| 1.84M/1.84M [00:00<00:00, 4.09MB/s]
Traceback (most recent call last):
  File "app.py", line 86, in <module>
    engine = AsyncLLMEngine.from_engine_args(engine_args)
  File "/usr/local/lib/python3.8/dist-packages/vllm/engine/async_llm_engine.py", line 486, in from_engine_args
    engine = cls(engine_args.worker_use_ray,
  File "/usr/local/lib/python3.8/dist-packages/vllm/engine/async_llm_engine.py", line 270, in __init__
    self.engine = self._init_engine(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/vllm/engine/async_llm_engine.py", line 306, in _init_engine
    return engine_class(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/vllm/engine/llm_engine.py", line 108, in __init__
    self._init_workers(distributed_init_method)
  File "/usr/local/lib/python3.8/dist-packages/vllm/engine/llm_engine.py", line 140, in _init_workers
    self._run_workers(
  File "/usr/local/lib/python3.8/dist-packages/vllm/engine/llm_engine.py", line 692, in _run_workers
    output = executor(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/vllm/worker/worker.py", line 68, in init_model
    self.model = get_model(self.model_config)
  File "/usr/local/lib/python3.8/dist-packages/vllm/model_executor/model_loader.py", line 75, in get_model
    raise ValueError(
ValueError: The quantization method awq is not supported for the current GPU. Minimum capability: 80. Current capability: 75.
Please lower the requirements accordingly.
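For context, the traceback shows the error is raised by a capability gate in vllm/model_executor/model_loader.py. The sketch below only mirrors the behaviour visible in the log; the function name and structure are hypothetical, not vLLM's actual code:

```python
import torch

def check_quant_capability(quant_method: str, min_capability: int, device: int = 0) -> None:
    # Pack (major, minor) into a single integer, e.g. (7, 5) -> 75.
    major, minor = torch.cuda.get_device_capability(device)
    capability = major * 10 + minor
    if capability < min_capability:
        raise ValueError(
            f"The quantization method {quant_method} is not supported for the "
            f"current GPU. Minimum capability: {min_capability}. "
            f"Current capability: {capability}.")

# With the AWQ minimum set to 80, a Turing (sm75) card trips this check,
# even though AutoAWQ itself only requires sm75.
check_quant_capability("awq", 80)
```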
#1252 needs to be merged to resolve this. I added support separately, based on that PR.
This issue was fixed by #1252