How do I use vLLM to speed up or batch inference with the 34B model? #452
Unanswered
lalalabobobo
asked this question in
Q&A
Replies: 0 comments
With vLLM, I always get a CUDA out of memory error, even though the A800 has enough GPU memory.