[CI/Build] build on empty device for better dev experience #4773

Merged
youkaichao merged 13 commits into vllm-project:main from the platform-agnostic-wheel branch on Aug 11, 2024

Conversation

tomeras91
Contributor

This PR enables building a platform-agnostic wheel that can also be installed on macOS. The idea is to improve the dev experience for projects that import and use vLLM.
Important: This wheel does not make it possible to run vLLM on Mac, but it does allow importing it.

The PR doesn't entirely fix issues #212, #695, #1397, #1921, but it's a step forward.

Contributor Author

@tomeras91 tomeras91 left a comment

Curious to hear your thoughts. Currently the PR only enables building the platform-agnostic wheel. What do you think about adding this build to publish.yml so it is built on every release?

@@ -2,5 +2,4 @@
 -r requirements-common.txt

 # Dependencies for x86_64 CPUs
 torch == 2.3.0+cpu
-triton >= 2.2.0 # FIXME(woosuk): This is a hack to avoid import error.
Contributor Author

Dealt with triton import errors in code
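
For readers wondering what handling an optional dependency "in code" can look like, here is a minimal sketch of one common pattern: probe for the package before importing it, and fail only when a Triton-backed code path is actually used. This is an assumption about the general approach, not the code from this PR; `HAS_TRITON` and `require_triton` are hypothetical names used for illustration.

```python
import importlib.util

# True only if the optional package can be located; nothing is imported here.
# (HAS_TRITON is a hypothetical flag name for this sketch.)
HAS_TRITON = importlib.util.find_spec("triton") is not None


def require_triton() -> None:
    """Raise a clear error at call time instead of failing at import time."""
    if not HAS_TRITON:
        raise RuntimeError(
            "triton is not installed; Triton-based kernels are unavailable "
            "on this platform.")
```

With a guard like this, the top-level package stays importable on machines without Triton wheels, and only the code paths that need Triton raise.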

@simon-mo
Collaborator

simon-mo commented May 23, 2024

Interestingly Triton recently supported macOS with Apple Silicon: triton-lang/triton#3443

@tomeras91
Contributor Author

Interestingly Triton recently supported macOS with Apple Silicon: triton-lang/triton#3443

Yep. But AFAIU they still release wheels only for Linux so it can't be pip installed directly from pypi. I guess building from source is the only option to get triton installed on mac.
https://pypi.org/project/triton/2.3.0/#files

@hugolytics

This would be great!
Maybe a warning at import would be a good idea. Our team develops on Mac, and we run the LLM workloads on a server.
It would be great to be able to install the project locally for debugging some of the helper code without having to maintain separate deps.

@cdpierse

cdpierse commented Aug 9, 2024

➕ 1 for this feature. It would be really nice to enable local development for code completion, and documentation references.

@youkaichao
Member

I think this feature is useful, but can you use the existing infra without introducing another env var?

e.g. VLLM_TARGET_DEVICE="empty" python setup.py develop

@@ -7,5 +7,5 @@ nvidia-ml-py # for pynvml package
 torch == 2.4.0
 # These must be updated alongside torch
 torchvision == 0.19 # Required for phi3v processor. See https://github.com/pytorch/vision?tab=readme-ov-file#installation for corresponding version
-xformers == 0.0.27.post2 # Requires PyTorch 2.4.0
-vllm-flash-attn == 2.6.1 # Requires PyTorch 2.4.0
+xformers == 0.0.27.post2; platform_system == 'Linux' and platform_machine == 'x86_64' # Requires PyTorch 2.4.0
Contributor Author

This change is because we take the CUDA requirements for the "empty" device wheel, and xformers and vllm-flash-attn are available only on Linux

Member

platform_system == 'Linux' makes sense to me.

is platform_machine == 'x86_64' necessary?

Contributor Author

vllm-flash-attn has published wheels only for x86_64, and no published tar.gz - https://pypi.org/project/vllm-flash-attn/#files
xformers also has wheels only for 64-bit machines. It does have a tar.gz, but from what I found online it can't be installed on 32-bit - https://pypi.org/project/xformers/#files

So I'm pretty sure it's needed for both
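
To make the marker discussion concrete, here is a small sketch of how the condition `platform_system == 'Linux' and platform_machine == 'x86_64'` resolves on different machines. It relies on the fact that PEP 508 defines `platform_system` as `platform.system()` and `platform_machine` as `platform.machine()`; the helper name `marker_applies` is hypothetical, and real installers evaluate the marker themselves rather than calling anything like this.

```python
import platform
from typing import Optional


def marker_applies(system: Optional[str] = None,
                   machine: Optional[str] = None) -> bool:
    """Mimic: platform_system == 'Linux' and platform_machine == 'x86_64'."""
    # Default to the values of the machine running this script.
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    return system == "Linux" and machine == "x86_64"


print(marker_applies("Linux", "x86_64"))   # True: dependency is installed
print(marker_applies("Darwin", "arm64"))   # False: skipped on Apple Silicon
print(marker_applies("Linux", "aarch64"))  # False: skipped on Linux ARM
```

The third case shows what the `platform_machine` clause adds: without it, a Linux ARM machine would still try to install packages that publish wheels only for x86_64.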

@tomeras91
Contributor Author

I think this feature is useful, but can you use the existing infra without introducing another env var?

e.g. VLLM_TARGET_DEVICE="empty" python setup.py develop

Thanks @youkaichao. Done now.
Also, the diff between this and main is very minimal thanks to the great work in #6786, which solved the import issues with triton.

# vLLM only supports Linux platform
-assert sys.platform.startswith(
-    "linux"), "vLLM only supports Linux platform (including WSL)."
+if not sys.platform.startswith("linux"):
Contributor Author

This change actually makes it possible to install the published tar.gz on mac, by setting VLLM_TARGET_DEVICE to "empty". Also logging a warning about vLLM not actually being able to run.
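
The behavior described here, replacing a hard assert with a warning plus a fallback to the "empty" target, could be sketched roughly as follows. This is an illustration under assumptions, not the actual setup.py code: the function name `resolve_target_device` and the exact warning text are hypothetical.

```python
import logging
import sys

logger = logging.getLogger(__name__)


def resolve_target_device(requested: str,
                          platform_name: str = sys.platform) -> str:
    """Return the build target, falling back to "empty" off Linux."""
    if not platform_name.startswith("linux"):
        # Warn instead of asserting, so installation can still proceed.
        logger.warning(
            "vLLM only supports Linux platform (including WSL). "
            "Building on %s, so vLLM may not be able to run correctly.",
            platform_name)
        return "empty"
    return requested
```

For example, `resolve_target_device("cuda", "darwin")` returns `"empty"`, while `resolve_target_device("cuda", "linux")` returns `"cuda"` unchanged.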

@@ -350,7 +356,9 @@ def find_version(filepath: str) -> str:
 def get_vllm_version() -> str:
     version = find_version(get_path("vllm", "version.py"))

-    if _is_cuda():
+    if _no_device():
+        version += "+empty"
Contributor Author

I was actually not sure if it is better to add "+empty" to the version or not.. WDYT?

Member

adding "+empty" looks good to me.
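
As a small illustration of the suffix being discussed: in PEP 440 terms, "+empty" is a local version label appended to the public version. The sketch below shows the idea in isolation; the helper name `with_device_suffix` and the version number in the usage line are hypothetical and not taken from the PR.

```python
def with_device_suffix(base_version: str, target_device: str) -> str:
    """Append a PEP 440 local version label for the "empty" build."""
    if target_device == "empty":
        return f"{base_version}+empty"
    return base_version


# "1.2.3" is a made-up version used only for this example.
print(with_device_suffix("1.2.3", "empty"))  # 1.2.3+empty
print(with_device_suffix("1.2.3", "cuda"))   # 1.2.3
```

The label makes it visible from the installed version alone that a given environment holds the import-only build.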

@tomeras91
Contributor Author

@youkaichao - now this PR actually does 2 things:

  1. It enables a local platform agnostic build of vLLM with VLLM_TARGET_DEVICE="empty" python setup.py
  2. It allows building the published tar.gz on Macs. A developer working on a Mac can now run pip install vllm and have vllm installed and importable (though not runnable)

IMO the second point is the nicer feature here. Really makes development using vLLM much easier on macs. I'm not sure though if you'd want to include it.. so if you think it's not a good idea, we can discard it and just leave the VLLM_TARGET_DEVICE="empty" option

@youkaichao
Member

@tomeras91 this feature is only for dev and debugging. I don't think it makes any sense to publish to pypi. Making VLLM_TARGET_DEVICE="empty" python setup.py work is enough.

@tomeras91
Contributor Author

@tomeras91 this feature is only for dev and debugging. I don't think it makes any sense to publish to pypi. Making VLLM_TARGET_DEVICE="empty" python setup.py work is enough.

I understand and agree.
I wasn't suggesting to publish the "empty" wheel to pypi. The change here just makes it possible to install the already published tar.gz on mac

Member

@youkaichao youkaichao left a comment

thanks for the great work! I manually verified that VLLM_TARGET_DEVICE="empty" pip install -vvv -e . works for macos now🎉
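
After an install like the one verified here, a quick sanity check from Python might look like the sketch below. It only confirms that the package can be located and reports its installed version; it does not import vLLM or check that vLLM can run. The helper name `installed_version` is hypothetical, and the sketch assumes the import name and the distribution name are both `vllm`.

```python
import importlib.metadata
import importlib.util
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed distribution version, or None if not found."""
    # find_spec locates the package without importing it.
    if importlib.util.find_spec(package) is None:
        return None
    try:
        return importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        return None


print(installed_version("vllm"))
```

On a machine where vLLM is not installed this prints `None`; the exact version string on an "empty" build depends on the checkout.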

@youkaichao youkaichao changed the title from "[CI/Build] Platform agnostic wheel" to "[CI/Build] build on empty device for better dev experience" on Aug 11, 2024
@youkaichao youkaichao merged commit 3860879 into vllm-project:main Aug 11, 2024
27 checks passed
@youkaichao
Member

@tomeras91 can you add a followup pr for https://docs.vllm.ai/en/latest/getting_started/installation.html , to tell users how to use this feature?

@tomeras91
Contributor Author

@tomeras91 can you add a followup pr for https://docs.vllm.ai/en/latest/getting_started/installation.html , to tell users how to use this feature?

Sure - here: #7403

@tomeras91 tomeras91 deleted the platform-agnostic-wheel branch August 12, 2024 15:00
sfc-gh-mkeralapura pushed a commit to sfc-gh-mkeralapura/vllm that referenced this pull request Aug 12, 2024
kylesayrs pushed a commit to neuralmagic/vllm that referenced this pull request Aug 17, 2024
fialhocoelho pushed a commit to opendatahub-io/vllm that referenced this pull request Aug 22, 2024
Alvant pushed a commit to compressa-ai/vllm that referenced this pull request Oct 26, 2024
5 participants