--kv-cache-dtype fp8_e5m2 requires official docker image to have nvcc #3028

Closed
ita9naiwa opened this issue Feb 25, 2024 · 2 comments

@ita9naiwa
Contributor

CUDA_VISIBLE_DEVICES=0 python3 -m vllm.entrypoints.api_server \
                               --model=my_model \
                               --tensor-parallel-size 1 \
                               --dtype float16 \
                               --kv-cache-dtype fp8_e5m2 \
                               --swap-space 32 \
                               --gpu-memory-utilization 0.95

yields

INFO 02-25 05:44:42 utils.py:188] CUDA_HOME is not found in the environment. Using /usr/local/cuda as CUDA_HOME.
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/api_server.py", line 90, in <module>
    engine = AsyncLLMEngine.from_engine_args(engine_args)
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 617, in from_engine_args
    engine_configs = engine_args.create_engine_configs()
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/arg_utils.py", line 279, in create_engine_configs
    cache_config = CacheConfig(self.block_size,
  File "/usr/local/lib/python3.10/dist-packages/vllm/config.py", line 296, in __init__
    self._verify_cache_dtype()
  File "/usr/local/lib/python3.10/dist-packages/vllm/config.py", line 312, in _verify_cache_dtype
    nvcc_cuda_version = get_nvcc_cuda_version()
  File "/usr/local/lib/python3.10/dist-packages/vllm/utils.py", line 191, in get_nvcc_cuda_version
    nvcc_output = subprocess.check_output([cuda_home + "/bin/nvcc", "-V"],
  File "/usr/lib/python3.10/subprocess.py", line 421, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/usr/lib/python3.10/subprocess.py", line 503, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/usr/lib/python3.10/subprocess.py", line 971, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/lib/python3.10/subprocess.py", line 1863, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/cuda/bin/nvcc'

This error occurs with vllm==0.3.0.
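
For context, the crash comes from `get_nvcc_cuda_version()` in `vllm/utils.py` shelling out to `$CUDA_HOME/bin/nvcc`, which the official runtime image does not ship. As an illustration only (not the actual upstream change), a more defensive probe could fall back to the CUDA version PyTorch was built with when nvcc is absent; the function name and fallback below are my own sketch:

```python
import os
import shutil
import subprocess

import torch
from packaging.version import Version, parse


def get_cuda_version() -> Version:
    """Best-effort CUDA version detection that tolerates a missing nvcc binary (illustrative sketch)."""
    cuda_home = os.environ.get("CUDA_HOME", "/usr/local/cuda")
    nvcc = os.path.join(cuda_home, "bin", "nvcc")
    nvcc_bin = nvcc if os.path.exists(nvcc) else shutil.which("nvcc")
    if nvcc_bin is not None:
        output = subprocess.check_output([nvcc_bin, "-V"], text=True)
        # nvcc prints a line like: "Cuda compilation tools, release 12.1, V12.1.105"
        release_line = next(line for line in output.splitlines() if "release" in line)
        return parse(release_line.split("release ")[1].split(",")[0])
    # No nvcc in the container: fall back to the CUDA version torch was built against.
    if torch.version.cuda is None:
        raise RuntimeError("Cannot determine CUDA version: nvcc not found and torch reports no CUDA.")
    return parse(torch.version.cuda)
```

Installing the CUDA toolkit (which provides nvcc) inside the container would also sidestep the check as a short-term workaround on 0.3.0.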

@zhaoyang-star
Contributor

This was fixed in #2781. Please update your codebase to the latest main branch.

@ita9naiwa
Contributor Author

thanks!
