
[Hardware][Metal] Apple Metal support #12640

Draft: skyzh wants to merge 2 commits into main

Conversation

@skyzh commented Feb 1, 2025

fix #2081

This patch makes some parts of vLLM run on Apple Metal by using the PyTorch MPS fallback mode (see build_and_run.sh): PyTorch operators that are supported on MPS run natively there, while unsupported operations fall back to the CPU. Although the framework runs end-to-end and produces text, it is not producing a sensible result, as the completion request below shows.
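For reference, enabling the fallback and starting the server looks roughly like this (a minimal sketch of what build_and_run.sh does; the actual script in the patch may differ):

# Enable the PyTorch MPS fallback so that operators without an MPS
# implementation transparently run on the CPU instead of erroring out.
export PYTORCH_ENABLE_MPS_FALLBACK=1
vllm serve Qwen/Qwen2.5-0.5B-Instruct --port 8000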

curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Qwen/Qwen2.5-0.5B-Instruct",
        "prompt": "San Francisco is a",
        "max_tokens": 7,
        "temperature": 0
    }'
{"id":"cmpl-aba9771f627e400fab7be25d8b310fd5","object":"text_completion","created":1738386309,"model":"Qwen/Qwen2.5-0.5B-Instruct","choices":[{"index":0,"text":"!!!!!!!","logprobs":null,"finish_reason":"length","stop_reason":null,"prompt_logprobs":null}],"usage":{"prompt_tokens":4,"total_tokens":11,"completion_tokens":7,"prompt_tokens_details":null}}%

...which needs further debugging; comments are welcome -- I have no idea what's going on there.
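One way to narrow this down (a debugging sketch, not code from the patch; the helper name is hypothetical) is to diff individual operators between the CPU and MPS backends and look for the first large divergence:

import torch

# Hypothetical helper: run the same op on CPU and MPS and report the max
# absolute difference, to spot operators that misbehave under the fallback.
def compare_op(fn, *shapes):
    inputs = [torch.randn(*s) for s in shapes]
    cpu_out = fn(*inputs)
    mps_out = fn(*(t.to("mps") for t in inputs)).cpu()
    return (cpu_out - mps_out).abs().max().item()

print(compare_op(lambda x: torch.softmax(x, dim=-1), (4, 8)))
print(compare_op(torch.matmul, (4, 8), (8, 4)))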

For full Metal support, we would have to reimplement all of the current CUDA kernels in Metal, which will be a lot of work. So this patch is only the very first step toward full Metal support.

In general, the patch treats the Metal platform as a CPU-based platform (i.e., one using CPU workers) with the PyTorch MPS backend, roughly as sketched below. This can be improved in the future, for example by using the GPU scheduler for Metal.
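Concretely, the platform assumption amounts to something like the following (an illustrative sketch, not the patch's actual code; the function name is hypothetical):

import torch

# Illustrative only: the Metal platform stays a CPU-based platform
# (CPU workers for scheduling and execution), but tensors are placed
# on the MPS device whenever it is available.
def get_metal_device() -> torch.device:
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")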

skyzh added 2 commits January 31, 2025 19:04
Signed-off-by: Alex Chi <iskyzh@gmail.com>
Signed-off-by: Alex Chi <iskyzh@gmail.com>

github-actions bot commented Feb 1, 2025

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add the ready label to the PR
  • Enable auto-merge.

🚀

Successfully merging this pull request may close the linked issue: Inquiry Regarding vLLM Support for Mac Metal API (#2081)