
Fix a bug in 1D input shape #5

Merged: 4 commits merged from bugfix into main on Mar 6, 2023
Conversation

WoosukKwon (Collaborator)

This PR fixes a miscalculation of the input shape when iteration-level scheduling is used.
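To make the bug concrete: under iteration-level scheduling (continuous batching), each step serves a mix of in-flight sequences, so the model input is a flat 1D tensor of token IDs whose length is the total token count for the step, not a padded 2D `[num_seqs, max_len]` tensor. The helper below is a minimal, hypothetical sketch of that shape calculation, not vLLM's actual code.

```python
def flatten_inputs(seq_token_ids):
    """Flatten per-sequence token ID lists into one 1D input.

    seq_token_ids: token IDs scheduled this iteration, one list per sequence.
    Returns the flat token list and each sequence's start offset in it.
    """
    flat, offsets = [], []
    for tokens in seq_token_ids:
        offsets.append(len(flat))
        flat.extend(tokens)
    return flat, offsets


# Three sequences with 3, 1, and 2 scheduled tokens: the 1D input shape
# is 6 (the sum), not 3 * 3 (num_seqs * max_len).
flat, offsets = flatten_inputs([[11, 12, 13], [21], [31, 32]])
```

Miscounting here (e.g. using the padded 2D size) produces a tensor whose shape disagrees with the positions the scheduler hands to the model, which is the kind of mismatch this PR corrects.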

WoosukKwon merged commit 04e5acc into main Mar 6, 2023
WoosukKwon deleted the bugfix branch March 6, 2023 18:05
v1nc3nt27 pushed a commit to v1nc3nt27/vllm that referenced this pull request Sep 12, 2023
xiangyuT added a commit to xiangyuT/vllm that referenced this pull request Oct 24, 2023
* finish changing scheduler

* finish merge

* fix model

* Fix (vllm-project#5)

* fix problems

* fix

* delete unused params

* remove redundant comments

---------

Co-authored-by: Xiangyu Tian <109123695+xiangyuT@users.noreply.github.com>
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024
luo-cheng2021 pushed a commit to luo-cheng2021/vllm that referenced this pull request Mar 14, 2024
Align optimum-intel based model signature with vLLM signature
luo-cheng2021 pushed a commit to luo-cheng2021/vllm that referenced this pull request Mar 25, 2024
…imum

Install optimum-intel from latest main
mzusman added a commit to mzusman/vllm that referenced this pull request Apr 16, 2024
* Drop indecies when finish

* min 1 attention layer

* CG is working on forward pass passing

* Remove comments

* cosmetics - rename indecies -> indices, organize some whitespaces

* Add some TODOs

* Adding mamba cache for cg

* Remove useless vars from input_metadata

* Remove unused import

* Set the seqlen offset to boolean

* Return only hidden state

* Return only hidden states

* Add padding to match forward pass bs

* Is prompt instead of seqlen offset

* Remove mamba cache class (not used)

* Another remove

* Remove

* Use mamba4gc

* Fix mamba forward, run update only on non prompt

* Use 1 index after the maximal index

* Remove import

* Remove import

* typo

* typo

* place holder

* Padding and empty token takes it from the first empty place

* reformat

* Apply suggestions from code review

Whitespaces

---------

Co-authored-by: Mor Zusman <morz@ai21.com>
Co-authored-by: Tomer Asida <tomera@ai21.com>
Co-authored-by: tomeras91 <57313761+tomeras91@users.noreply.github.com>
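The commit list above includes "Adding mamba cache for cg" and "Add padding to match forward pass bs": CUDA graphs are captured for a fixed set of batch sizes, so a smaller runtime batch must be padded up to the nearest captured size before replaying a graph. The sketch below illustrates that padding step under assumed names (`CAPTURED_BATCH_SIZES`, `pad_to_captured_size`); it is not vLLM's actual API.

```python
# Batch sizes for which CUDA graphs were captured (illustrative values).
CAPTURED_BATCH_SIZES = [1, 2, 4, 8]


def pad_to_captured_size(batch, pad_value=0):
    """Pad a batch up to the smallest captured size that fits it.

    Falls back to the unpadded length if the batch exceeds every
    captured size (a real runner would then skip graph replay).
    """
    target = next((s for s in CAPTURED_BATCH_SIZES if s >= len(batch)),
                  len(batch))
    return batch + [pad_value] * (target - len(batch))


# A batch of 3 sequences is padded to the captured size 4; the padded
# slots are dummy entries whose outputs are discarded after replay.
padded = pad_to_captured_size([5, 6, 7])
```

Padding trades a little wasted compute for the ability to reuse a pre-captured graph, which avoids per-step kernel-launch overhead.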
linxihui added a commit to linxihui/vllm that referenced this pull request May 14, 2024
…3small

[Model][Kernels] Support Phi3small architecture, blocksparse attention prefilling kernel, CUDA+Triton paged attn kernels
Starmys pushed a commit to Starmys/vllm that referenced this pull request May 20, 2024
Faster v2 hopper fused moe kernel configs
alixiaodi mentioned this pull request Aug 2, 2024
zeroorhero pushed a commit to zeroorhero/vllm that referenced this pull request Sep 23, 2024