[torch.compile] limit inductor threads and lazy import quant #10482

youkaichao · 2024-11-20T07:21:16Z

Signed-off-by: youkaichao <youkaichao@gmail.com>

github-actions · 2024-11-20T07:21:30Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

- Add the ready label to the PR
- Enable auto-merge.

🚀

youkaichao · 2024-11-20T08:24:03Z

The root cause is that the quantization configs are imported too early. Importing them also imports ParallelLMHead, which triggers @torch.compile for get_masked_input_and_mask.

Every process that calls torch.compile spawns many worker processes (one per CPU core), and having that many processes causes VSCode to hang.
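The two ideas in the title can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: it assumes your PyTorch version honors the `TORCHINDUCTOR_COMPILE_THREADS` environment variable, and `lazy_import` and `describe_color` are hypothetical helpers, with the standard-library module `colorsys` standing in for an expensive import.

```python
import importlib
import os
from types import ModuleType

# Assumption: Inductor consults this variable when deciding how many compile
# workers to start, so it has to be set before torch is first imported.
os.environ["TORCHINDUCTOR_COMPILE_THREADS"] = "1"


def lazy_import(module_name: str) -> ModuleType:
    """Import a module on first use rather than at program start."""
    return importlib.import_module(module_name)


def describe_color(r: float, g: float, b: float) -> tuple:
    # "colorsys" is a stand-in for an expensive module (hypothetical example);
    # nothing is imported until this function is actually called.
    colorsys = lazy_import("colorsys")
    return colorsys.rgb_to_hsv(r, g, b)
```

Deferring the import keeps module-level side effects, such as a decorator that starts compilation workers, out of processes that never call the function.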

DarkLight1337

Thanks for fixing! Can you add a test to ensure that each config in the list can be successfully resolved (in case someone forgets to update the list after adding a config)?
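A test of this kind could look roughly like the sketch below. It is an illustration under stated assumptions, not the test added in this PR: `_LAZY_CONFIGS`, `METHOD_NAMES`, `get_config_class`, and `check_all_resolvable` are hypothetical names, and standard-library classes stand in for real quantization config classes.

```python
import importlib
from typing import Dict, List, Tuple, Type

# Hypothetical registry: method name -> (module path, class name).
# Standard-library classes stand in for real quantization config classes.
_LAZY_CONFIGS: Dict[str, Tuple[str, str]] = {
    "decimal_like": ("decimal", "Decimal"),
    "fraction_like": ("fractions", "Fraction"),
}

# Cheap to import: only strings live here, no heavy modules are loaded.
METHOD_NAMES: List[str] = list(_LAZY_CONFIGS)


def get_config_class(name: str) -> Type:
    """Resolve a method name to its class, importing the module on demand."""
    if name not in _LAZY_CONFIGS:
        raise ValueError(f"Invalid method: {name}")
    module_path, class_name = _LAZY_CONFIGS[name]
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


def check_all_resolvable() -> None:
    """Fail if any listed name cannot be resolved to a class."""
    for name in METHOD_NAMES:
        cls = get_config_class(name)
        assert isinstance(cls, type), name
```

In this sketch the name list is derived from the registry; if the two were maintained separately, iterating over the list in the test is what would catch an entry that someone forgot to register.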

vllm/plugins/__init__.py

vllm/config.py

DarkLight1337

Thanks for fixing!

…oject#10482) Signed-off-by: youkaichao <youkaichao@gmail.com> Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

…oject#10482) Signed-off-by: youkaichao <youkaichao@gmail.com> Signed-off-by: Maxime Fournioux <55544262+mfournioux@users.noreply.github.com>

…oject#10482) Signed-off-by: youkaichao <youkaichao@gmail.com>

fix compile

661072c

Signed-off-by: youkaichao <youkaichao@gmail.com>

youkaichao requested review from zhuohan123, alexm-neuralmagic, comaniac and njhill as code owners November 20, 2024 07:21

youkaichao mentioned this pull request Nov 20, 2024

[Usage]: VSCode debugger is hanging #10480

Closed

1 task

youkaichao added 3 commits November 19, 2024 23:25

fix

d46c60d

Signed-off-by: youkaichao <youkaichao@gmail.com>

fix

3449e0a

Signed-off-by: youkaichao <youkaichao@gmail.com>

fix lazy import quantization

cba7b7a

Signed-off-by: youkaichao <youkaichao@gmail.com>

youkaichao requested review from mgoin and robertgshaw2-neuralmagic as code owners November 20, 2024 07:54

youkaichao changed the title ~~[torch.compile] limit threads for compilation~~ [torch.compile] limit inductor threads and lazy import quant Nov 20, 2024

youkaichao requested a review from DarkLight1337 November 20, 2024 08:28

DarkLight1337 reviewed Nov 20, 2024

jeejeelee reviewed Nov 20, 2024

vllm/plugins/__init__.py (resolved)

youkaichao added 4 commits November 20, 2024 10:55

add tests

cfbc70a

Signed-off-by: youkaichao <youkaichao@gmail.com>

fix

1535490

Signed-off-by: youkaichao <youkaichao@gmail.com>

polish message

405fada

Signed-off-by: youkaichao <youkaichao@gmail.com>

add tests

fda0e16

Signed-off-by: youkaichao <youkaichao@gmail.com>

mergify bot added the ci/build label Nov 20, 2024

youkaichao added 3 commits November 20, 2024 11:11

fix ops

585dff9

Signed-off-by: youkaichao <youkaichao@gmail.com>

fix import

83d6c1f

Signed-off-by: youkaichao <youkaichao@gmail.com>

Merge branch 'main' into fix_compile

2cbfb83

youkaichao commented Nov 20, 2024

vllm/config.py (resolved)

fix command

4228ac3

Signed-off-by: youkaichao <youkaichao@gmail.com>

youkaichao enabled auto-merge (squash) November 20, 2024 21:10

github-actions bot added the ready label (ONLY add when PR is ready to merge/full CI is needed) Nov 20, 2024

fix

5fd5164

Signed-off-by: youkaichao <youkaichao@gmail.com>

fix

e991365

Signed-off-by: youkaichao <youkaichao@gmail.com>

youkaichao requested a review from DarkLight1337 November 20, 2024 23:48

DarkLight1337 approved these changes Nov 21, 2024

youkaichao disabled auto-merge November 21, 2024 02:36

youkaichao merged commit 388ee3d into vllm-project:main Nov 21, 2024
69 of 71 checks passed

youkaichao deleted the fix_compile branch November 21, 2024 02:36

jeejeelee mentioned this pull request Nov 25, 2024

[Usage]: torch.compile still generates multiple subprocesses #10619

Closed

1 task

sleepwalker2017 pushed a commit to sleepwalker2017/vllm that referenced this pull request Dec 13, 2024

[torch.compile] limit inductor threads and lazy import quant (vllm-pr…

659bd7d

…oject#10482) Signed-off-by: youkaichao <youkaichao@gmail.com>
