MPS backend (Metal kernels) support (Apple, M1, M2) #212

Closed

dosier opened this issue Jun 22, 2023 · 9 comments

Comments

dosier commented Jun 22, 2023

Do you know if support for this is planned? I may be interested in writing the custom metal kernels.
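For background on the question above: PyTorch itself already exposes Apple-silicon GPUs through the `mps` device, so a minimal, vLLM-independent sketch of detecting and using it could look like the following. This uses only the standard `torch.backends.mps` API and is not vLLM code.

```python
import torch

# PyTorch exposes Apple-silicon GPUs (Metal Performance Shaders) as the "mps" device.
# is_available() checks that a compatible GPU and macOS version are present at runtime.
if torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

# Tensors can be placed on the selected device directly.
x = torch.randn(4, 4, device=device)
print(f"running on {device}: {x.sum().item():.4f}")
```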

@WoosukKwon
Collaborator

Unfortunately, we are currently focusing on CUDA GPUs. I don't think we have enough resources and manpower to develop and maintain an MPS backend.

@louis030195

also interested

ssrisunt commented Oct 3, 2023

also interested

@signalprime

me three

@CheshireAI

please

Maverobot commented Dec 11, 2023

Since this issue is not really solved, could you please reopen it?

@xunfeng1980

+1

drpicox commented Jan 16, 2024

#2244

BodhiHu commented Mar 23, 2024

mlc-llm/TVM is perhaps better suited for this. Tested on M1 with really good performance.

jikunshang pushed a commit to jikunshang/vllm that referenced this issue Sep 24, 2024
…oject#245)


For models (like chatglm2/3-6b) whose `rotary_dim` is not equal to `head_size`, the current code crashes due to the dimension mismatch.
vllm-project#212 has a fix that is not robust enough: the chatglm series can run, but chatglm2-6b results are not correct.
This fix follows vLLM's native PyTorch rotary embedding implementation, verified on
chatglm2-6b and chatglm3-6b.
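As a rough illustration of the partial-rotary case described above (`rotary_dim` < `head_size`), a NeoX-style sketch in plain PyTorch might look like the following. The function name and tensor shapes are assumptions for illustration, not the actual vLLM or ChatGLM code.

```python
import torch

def apply_partial_rotary(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor,
                         rotary_dim: int) -> torch.Tensor:
    """Rotate only the first `rotary_dim` dims of each head; pass the rest through.

    x:        [num_tokens, num_heads, head_size]
    cos, sin: [num_tokens, 1, rotary_dim // 2]
    """
    x_rot, x_pass = x[..., :rotary_dim], x[..., rotary_dim:]
    x1, x2 = x_rot[..., : rotary_dim // 2], x_rot[..., rotary_dim // 2:]
    # NeoX-style (non-interleaved) rotation applied to the rotary slice only.
    x_rot = torch.cat((x1 * cos - x2 * sin, x2 * cos + x1 * sin), dim=-1)
    # Keeping the tail untouched avoids the shape mismatch when
    # rotary_dim != head_size (e.g. chatglm2/3-6b).
    return torch.cat((x_rot, x_pass), dim=-1)
```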

prarit pushed a commit to prarit/vllm that referenced this issue Oct 21, 2024
Co-authored-by: Charlie Fu <Charlie.Fu@amd.com>