[CI/Build] build on empty device for better dev experience #4773

Merged · 13 commits · Aug 11, 2024

4 changes: 2 additions & 2 deletions requirements-cuda.txt
@@ -7,5 +7,5 @@ nvidia-ml-py # for pynvml package
torch == 2.4.0
# These must be updated alongside torch
torchvision == 0.19 # Required for phi3v processor. See https://github.com/pytorch/vision?tab=readme-ov-file#installation for corresponding version
-xformers == 0.0.27.post2 # Requires PyTorch 2.4.0
-vllm-flash-attn == 2.6.1 # Requires PyTorch 2.4.0
+xformers == 0.0.27.post2; platform_system == 'Linux' and platform_machine == 'x86_64' # Requires PyTorch 2.4.0
Contributor Author:
This change is needed because we reuse the CUDA requirements for the "empty"-device wheel, and xformers and vllm-flash-attn are only available on Linux.

Member:
platform_system == 'Linux' makes sense to me.

Is platform_machine == 'x86_64' necessary?

Contributor Author:
vllm-flash-attn has published wheels only for x86_64, and no tar.gz is published - https://pypi.org/project/vllm-flash-attn/#files
xformers also has wheels only for 64-bit machines. It does have a tar.gz, but from what I found online it can't be installed on 32-bit - https://pypi.org/project/xformers/#files

So I'm pretty sure it's needed for both.

+vllm-flash-attn == 2.6.1; platform_system == 'Linux' and platform_machine == 'x86_64' # Requires PyTorch 2.4.0
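
The markers added here are standard PEP 508 environment markers: pip evaluates them against the installing machine, so on non-Linux or non-x86_64 hosts these two requirements are simply skipped instead of breaking the "empty"-device install. A minimal sketch of how such a marker evaluates, assuming the packaging library (which pip uses for this):

from packaging.markers import Marker

# The marker string added to xformers / vllm-flash-attn above.
marker = Marker("platform_system == 'Linux' and platform_machine == 'x86_64'")

# True on a Linux x86_64 host (pip installs the requirement); False elsewhere
# (the requirement is skipped), e.g. on macOS or an aarch64 Linux box.
print(marker.evaluate())
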
24 changes: 19 additions & 5 deletions setup.py
@@ -61,9 +61,12 @@ def embed_commit_hash():

VLLM_TARGET_DEVICE = envs.VLLM_TARGET_DEVICE

-# vLLM only supports Linux platform
-assert sys.platform.startswith(
-    "linux"), "vLLM only supports Linux platform (including WSL)."
+if not sys.platform.startswith("linux"):
Contributor Author:
This change makes it possible to install the published tar.gz on macOS by setting VLLM_TARGET_DEVICE to "empty". It also logs a warning that vLLM won't actually be able to run.

+    logger.warning(
+        "vLLM only supports Linux platform (including WSL). "
+        "Building on %s, "
+        "so vLLM may not be able to run correctly", sys.platform)
+    VLLM_TARGET_DEVICE = "empty"

MAIN_CUDA_VERSION = "12.1"
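
The fallback above keys off sys.platform. A quick way to see which branch a given machine takes (standard library only; "cuda" stands in here for the default requested target):

import sys

# "linux" covers native Linux and WSL; macOS reports "darwin", Windows "win32".
# Anything that is not Linux now falls back to the "empty" target instead of
# tripping the old assert.
target = "cuda" if sys.platform.startswith("linux") else "empty"
print(sys.platform, "->", target)
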

@@ -231,6 +234,10 @@ def build_extensions(self) -> None:
subprocess.check_call(["cmake", *build_args], cwd=self.build_temp)


+def _no_device() -> bool:
+    return VLLM_TARGET_DEVICE == "empty"


def _is_cuda() -> bool:
has_cuda = torch.version.cuda is not None
return (VLLM_TARGET_DEVICE == "cuda" and has_cuda
@@ -350,7 +357,9 @@ def find_version(filepath: str) -> str:
def get_vllm_version() -> str:
version = find_version(get_path("vllm", "version.py"))

-    if _is_cuda():
+    if _no_device():
+        version += "+empty"
Contributor Author:
I was actually not sure whether it's better to add "+empty" to the version or not. WDYT?

Member:
adding "+empty" looks good to me.
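
For reference, "+empty" is parsed as a PEP 440 local version label, so it rides along with the package version without changing the public part. A tiny illustrative check, assuming the packaging library and a hypothetical base version:

from packaging.version import Version

# Hypothetical base version; the real one comes from vllm/version.py.
version = Version("0.5.4" + "+empty")
print(version.public)  # 0.5.4
print(version.local)   # empty
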

+    elif _is_cuda():
cuda_version = str(get_nvcc_cuda_version())
if cuda_version != MAIN_CUDA_VERSION:
cuda_version_str = cuda_version.replace(".", "")[:3]
@@ -404,7 +413,9 @@ def _read_requirements(filename: str) -> List[str]:
resolved_requirements.append(line)
return resolved_requirements

-    if _is_cuda():
+    if _no_device():
+        requirements = _read_requirements("requirements-cuda.txt")
+    elif _is_cuda():
requirements = _read_requirements("requirements-cuda.txt")
cuda_major, cuda_minor = torch.version.cuda.split(".")
modified_requirements = []
@@ -453,6 +464,9 @@ def _read_requirements(filename: str) -> List[str]:
ext_modules = []
package_data["vllm"].append("*.so")

+if _no_device():
+    ext_modules = []

setup(
name="vllm",
version=get_vllm_version(),
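
Taken together, the setup.py changes gate the version suffix, the requirements file, and the native extensions on the new target. A self-contained sketch of that flow (stub values stand in for what the real setup.py computes; only the control flow mirrors the diff above):

import sys

# Requested target; vLLM reads this from the VLLM_TARGET_DEVICE env var.
VLLM_TARGET_DEVICE = "cuda"

if not sys.platform.startswith("linux"):
    # New behaviour from this PR: warn and fall back instead of asserting.
    VLLM_TARGET_DEVICE = "empty"


def _no_device() -> bool:
    return VLLM_TARGET_DEVICE == "empty"


version = "0.5.4"  # hypothetical base version; the real one comes from vllm/version.py
if _no_device():
    version += "+empty"  # mark the wheel as a no-device build

# The "empty" build reuses requirements-cuda.txt; the platform markers added in
# this PR keep xformers / vllm-flash-attn out of non-Linux installs.
requirements_file = "requirements-cuda.txt"

ext_modules = ["placeholder native extension"]
if _no_device():
    ext_modules = []  # nothing to compile for a no-device install

print(VLLM_TARGET_DEVICE, version, requirements_file, len(ext_modules))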