[Hardware][Metal] Apple Metal support #12640
Draft
+186
−36
fix #2081
This patch makes some parts of vllm run on Apple Metal by using the PyTorch MPS fallback mode (see build_and_run.sh): operators that PyTorch supports on MPS run natively on MPS, while the remaining operations fall back to the CPU. The framework runs end-to-end and produces text, but it does not yet produce a sensible result:...which needs further debugging. Comments are welcome; I have no idea what's going on there.
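For readers unfamiliar with the fallback mode, here is a minimal sketch of how it is typically enabled from Python. It assumes the standard PyTorch environment variable `PYTORCH_ENABLE_MPS_FALLBACK`; the contents of build_and_run.sh are not reproduced here, so this is not necessarily how the script does it. `pick_device` is a hypothetical helper written for this illustration, not part of vllm or this patch.

```python
import os

# Assumption: the fallback is switched on through PyTorch's standard
# environment variable. It has to be set before torch is imported.
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"


def pick_device() -> str:
    """Hypothetical helper: return "mps" if the MPS backend is usable, else "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # torch is not installed, so only the CPU path is possible
    mps = getattr(torch.backends, "mps", None)  # absent in old torch releases
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


device = pick_device()
print(f"selected device: {device}")
```

With the variable set, operators that have no MPS implementation are executed on the CPU instead of raising an error, at the cost of extra host/device copies.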
For full Metal support, we would have to reimplement all current CUDA kernels in Metal, which will be a lot of work. This patch is therefore only a first step toward full Metal support.
In general, the patch treats the Metal platform as a CPU-based platform (i.e., it uses CPU workers) with the PyTorch MPS backend. This can be improved in the future, for example by using the GPU scheduler for Metal.