
[Usage]: Out of Memory w/ multiple models #4678

Closed · yudataguy opened this issue May 8, 2024 · 4 comments
Labels: usage (How to use vllm)

Comments

@yudataguy
Your current environment

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 224.00 MiB. GPU 

How would you like to use vllm

I'm running an eval framework that evaluates multiple models. vLLM doesn't seem to free the GPU memory after initializing the 2nd model (with the same variable name). How do I free GPU memory on each new engine instantiation, i.e. llm = LLM(new_model)? A minimal sketch of the loop is below.
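A minimal sketch of the pattern described above (the model names and the evaluation call are placeholders, not from the original report):

```python
from vllm import LLM

# Hypothetical model names, for illustration only.
models = ["model-a", "model-b"]

for name in models:
    # On the 2nd iteration this can OOM: the 1st engine's GPU memory is still held.
    llm = LLM(model=name)
    # ... run the evaluation with llm.generate(...) ...
```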

@yudataguy added the usage (How to use vllm) label May 8, 2024
@yudataguy (Author)

Tried the methods from #1908, with no success.

@russellb (Collaborator)

The LLM engine internal to the LLM class should get destroyed when your LLM instance is garbage collected. You could try forcing that with del(llm).
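A hedged sketch of that cleanup applied to the loop above (gc.collect and torch.cuda.empty_cache are standard Python/PyTorch calls; whether this alone fully releases vLLM's GPU memory can depend on the vLLM version):

```python
import gc
import torch
from vllm import LLM

def evaluate_all(model_names):
    for name in model_names:
        llm = LLM(model=name)
        # ... run the evaluation with llm.generate(...) ...
        del llm                   # drop the last reference so the engine can be garbage collected
        gc.collect()              # collect the engine and its CUDA tensors promptly
        torch.cuda.empty_cache()  # return PyTorch's cached GPU blocks to the driver
```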

@russellb (Collaborator)

There's more detailed input on this in #3281.

@russellb (Collaborator)

Going to close this since it's a duplicate of #3281.
