--kv-cache-dtype fp8_e5m2 requires official docker image to have nvcc #3028

Closed
ita9naiwa opened this issue Feb 25, 2024 · 2 comments

@ita9naiwa
Contributor

CUDA_VISIBLE_DEVICES=0 python3 -m vllm.entrypoints.api_server \
                               --model=my_model \
                               --tensor-parallel-size 1 \
                               --dtype float16 \
                               --kv-cache-dtype fp8_e5m2 \
                               --swap-space 32 \
                               --gpu-memory-utilization 0.95

yields

INFO 02-25 05:44:42 utils.py:188] CUDA_HOME is not found in the environment. Using /usr/local/cuda as CUDA_HOME.
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/api_server.py", line 90, in <module>
    engine = AsyncLLMEngine.from_engine_args(engine_args)
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 617, in from_engine_args
    engine_configs = engine_args.create_engine_configs()
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/arg_utils.py", line 279, in create_engine_configs
    cache_config = CacheConfig(self.block_size,
  File "/usr/local/lib/python3.10/dist-packages/vllm/config.py", line 296, in __init__
    self._verify_cache_dtype()
  File "/usr/local/lib/python3.10/dist-packages/vllm/config.py", line 312, in _verify_cache_dtype
    nvcc_cuda_version = get_nvcc_cuda_version()
  File "/usr/local/lib/python3.10/dist-packages/vllm/utils.py", line 191, in get_nvcc_cuda_version
    nvcc_output = subprocess.check_output([cuda_home + "/bin/nvcc", "-V"],
  File "/usr/lib/python3.10/subprocess.py", line 421, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/usr/lib/python3.10/subprocess.py", line 503, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/usr/lib/python3.10/subprocess.py", line 971, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/lib/python3.10/subprocess.py", line 1863, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/cuda/bin/nvcc'

This error occurs with vllm==0.3.0.
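
For context, the crash comes from `get_nvcc_cuda_version()` in `vllm/utils.py` shelling out to `$CUDA_HOME/bin/nvcc`, which the official runtime image does not ship. As an illustration only (not the actual upstream change), a more defensive probe could fall back to the CUDA version PyTorch was built with when nvcc is absent; the function name and fallback below are my own sketch:

```python
import os
import shutil
import subprocess

import torch
from packaging.version import Version, parse


def get_cuda_version() -> Version:
    """Best-effort CUDA version detection that tolerates a missing nvcc binary (illustrative sketch)."""
    cuda_home = os.environ.get("CUDA_HOME", "/usr/local/cuda")
    nvcc = os.path.join(cuda_home, "bin", "nvcc")
    nvcc_bin = nvcc if os.path.exists(nvcc) else shutil.which("nvcc")
    if nvcc_bin is not None:
        output = subprocess.check_output([nvcc_bin, "-V"], text=True)
        # nvcc prints a line like: "Cuda compilation tools, release 12.1, V12.1.105"
        release_line = next(line for line in output.splitlines() if "release" in line)
        return parse(release_line.split("release ")[1].split(",")[0])
    # No nvcc in the container: fall back to the CUDA version torch was built against.
    if torch.version.cuda is None:
        raise RuntimeError("Cannot determine CUDA version: nvcc not found and torch reports no CUDA.")
    return parse(torch.version.cuda)
```

Installing the CUDA toolkit (which provides nvcc) inside the container would also sidestep the check as a short-term workaround on 0.3.0.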

@zhaoyang-star
Contributor

This was fixed in #2781. Please update your codebase to the latest main branch.

@ita9naiwa
Contributor Author

thanks!
