
[Bug] baichuan-13b-chat Service exception after long run #677

Closed
Tomorrowxxy opened this issue Aug 5, 2023 · 5 comments
Labels
bug Something isn't working

Comments

@Tomorrowxxy

Start command

python -m vllm.entrypoints.openai.api_server --model baichuan-inc/Baichuan-13B-Chat --host 0.0.0.0 --port 8777 --trust-remote-code --dtype half

After about 12 hours of operation, the inference service stopped working

GPU: V100
CUDA: 11.4

Screenshot of the problem: Xnip2023-08-05_12-03-19
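For what it's worth, a small polling loop like the sketch below (a diagnostic idea, not part of vLLM; the nvidia-smi query flags are the standard ones, and the log path is just an example) could record GPU memory over time and show what the card looks like right before the service stalls:

```python
# Diagnostic sketch (not part of vLLM): poll nvidia-smi once a minute and
# append GPU memory/utilization readings to a log file, so the state right
# before the service stalls is captured.
import subprocess
import time
from datetime import datetime

LOG_PATH = "gpu_mem.log"  # hypothetical log location

def gpu_memory_snapshot() -> str:
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=memory.used,memory.total,utilization.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()

if __name__ == "__main__":
    while True:
        with open(LOG_PATH, "a") as f:
            f.write(f"{datetime.now().isoformat()} {gpu_memory_snapshot()}\n")
        time.sleep(60)
```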

@zhuohan123 added the bug (Something isn't working) label on Aug 8, 2023
@zhuohan123 (Member)

Can you describe in more detail what exactly happened? For example, do all future requests fail, or does just one specific request fail?

From the screenshot, it looks like the client may have disconnected, and the server therefore stopped the running request.
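If that theory holds, one way to rule it out is to make sure the calling side keeps the connection open for the whole generation. A minimal client sketch against the OpenAI-compatible server started above (port 8777 taken from the start command; the prompt and max_tokens values are arbitrary):

```python
# Minimal client sketch: call the OpenAI-compatible completions endpoint
# with a generous read timeout, so a slow generation is not cut off
# client-side (which the server would otherwise log as an aborted request).
import requests

resp = requests.post(
    "http://localhost:8777/v1/completions",
    json={
        "model": "baichuan-inc/Baichuan-13B-Chat",
        "prompt": "Hello, how are you?",
        "max_tokens": 128,
    },
    timeout=(5, 600),  # 5 s to connect, up to 10 min to read the response
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```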

@Tomorrowxxy (Author) commented Aug 8, 2023

@zhuohan123

  • All future requests fail: after a "Received request" line there is no "Avg prompt throughput: xxxx" or other inference-related log output, and the request is immediately logged as "Aborted request".
  • In other words, the machine is no longer processing any inference requests; I have to manually kill the process and restart the service.
  • This happened after running for about 6 hours (screenshots attached).

@Tomorrowxxy (Author)


I think this is caused by insufficient CUDA memory. As shown in the screenshot, GPU memory occupancy reached 95%, after which no further inference was performed.
I tried to start baichuan-13b with gpu_memory_utilization = 0.8 on the V100, but unfortunately it failed to start; it only starts with 0.9.
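For reference, the same knob is also exposed on vLLM's offline engine, so the effect of a lower value can be checked without the API server. A minimal sketch (the 0.85 value is just an example between the 0.8 that failed and the 0.9 that works):

```python
# Sketch: load the model through vLLM's offline LLM class with an explicit
# gpu_memory_utilization, to see how much headroom a given value leaves.
from vllm import LLM, SamplingParams

llm = LLM(
    model="baichuan-inc/Baichuan-13B-Chat",
    trust_remote_code=True,
    dtype="half",
    gpu_memory_utilization=0.85,  # example value between 0.8 (failed) and 0.9
)

outputs = llm.generate(["Hello!"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```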

@Tomorrowxxy (Author)

Screenshot: WX20230818-153722

After running for a while, inference stops, although the vLLM service process is still alive.
This happens frequently; how should it be solved? @zhuohan123
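Until the root cause is found, a simple external health check could at least flag when the server stops responding, so it can be restarted promptly rather than being discovered hours later. A sketch (the /v1/models endpoint is served by the OpenAI-compatible server; the restart itself is left to whatever supervisor is in use):

```python
# Watchdog sketch: probe the OpenAI-compatible server periodically and
# print a warning when it stops answering, so the process can be restarted.
import time
import requests

URL = "http://localhost:8777/v1/models"  # port from the start command

while True:
    try:
        requests.get(URL, timeout=10).raise_for_status()
    except Exception as exc:
        print(f"vLLM server appears unhealthy: {exc} -- consider restarting it")
    time.sleep(300)  # check every 5 minutes
```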

@Tomorrowxxy mentioned this issue on Aug 18, 2023
@xiaocode337317439

+1

@hmellor closed this as not planned (won't fix, can't repro, duplicate, or stale) on Mar 25, 2024