
unload the model #3281

Open
osafaimal opened this issue Mar 8, 2024 · 9 comments

Comments

@osafaimal

Hi,
I'm sorry, I can't find out how to unload a model. I load a model, delete the object, and call the garbage collector, but it does nothing.
How are we supposed to unload a model?
I want to load a model, run a batch, then load another and run a batch, and so on for several models so I can compare them. But for now I have to restart Python each time.

@hmellor
Collaborator

hmellor commented Mar 9, 2024

Try calling torch.cuda.empty_cache() after you delete the LLM object.

@chenxu2048
Contributor

You can also call gc.collect() to reclaim *garbage* objects immediately after you delete them.
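
Putting the two suggestions together, a minimal sketch (the model name is a placeholder):

import gc

import torch
from vllm import LLM

llm = LLM(model="facebook/opt-125m")  # placeholder model

# ... run inference ...

# Drop the last reference, collect immediately,
# then release cached CUDA blocks back to the driver.
del llm
gc.collect()
torch.cuda.empty_cache()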

@osafaimal
Author

[screenshot: notebook cell running the suggested calls]
Neither works.

@chenxu2048
Contributor

You should also clear the notebook output: https://stackoverflow.com/questions/24816237/ipython-notebook-clear-cell-output-in-code
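
If you prefer to do this from code rather than the GUI, IPython exposes a helper for it; a minimal sketch:

from IPython.display import clear_output

clear_output(wait=False)  # discard the cell's displayed output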

@osafaimal
Author

osafaimal commented Mar 11, 2024

I always do (in the GUI, not in my cells).

@mnoukhov

This seems mostly solved by #1908 with:

import gc

import torch
from vllm import LLM
from vllm.model_executor.parallel_utils.parallel_state import destroy_model_parallel

# Placeholder values; substitute your own model and hardware settings.
model_name = "facebook/opt-125m"
saver_dir = "./models"
num_gpus = 1

# Load the model via vLLM
llm = LLM(model=model_name, download_dir=saver_dir,
          tensor_parallel_size=num_gpus, gpu_memory_utilization=0.70)

# Tear down vLLM's parallel state, drop every reference to the engine,
# then release cached GPU memory and the distributed process group.
destroy_model_parallel()
del llm.llm_engine.driver_worker
del llm
gc.collect()
torch.cuda.empty_cache()
torch.distributed.destroy_process_group()
print("Successfully deleted the llm pipeline and freed the GPU memory!")
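
(Note: the destroy_model_parallel import path above matches vLLM releases of that era; in newer releases the parallel-state helpers appear to have moved to vllm.distributed.parallel_state, so adjust the import to your installed version.)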

@osafaimal
Author

osafaimal commented Apr 2, 2024

> This seems mostly solved by #1908 with: […]

I had already read that. My problem remains unsolved when I use Vllm from LlamaIndex; otherwise it almost works. A little memory stays in use (~1 GB), but at least I can load and unload models. The problem is that I can't find how to access the llm_engine member of vllm.LLM through the wrapper.
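
A possible workaround, sketched under assumptions: both the import path and the private attribute name _client are guesses about the LlamaIndex wrapper's internals, and must be verified against your installed version (e.g. with vars(llm)).

import gc

import torch
# ASSUMPTION: import path for the wrapper; older llama-index releases
# used `from llama_index.llms import Vllm` instead.
from llama_index.llms.vllm import Vllm

llm = Vllm(model="facebook/opt-125m")  # placeholder model

# ASSUMPTION: the wrapper keeps the underlying vllm.LLM in `_client`;
# inspect the object if this attribute does not exist.
inner = getattr(llm, "_client", None)
if inner is not None:
    del inner.llm_engine.driver_worker
del llm, inner
gc.collect()
torch.cuda.empty_cache()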

@vvolhejn

vvolhejn commented Oct 1, 2024

@chenxu2048 the notebook output is just computed data shown to the user; the Python kernel computes it, but the communication is one-way: the output doesn't affect the kernel at all. Clearing the output therefore has no effect on GPU memory or any other kernel state.

@david-koleckar

No definitive answer given. Can a model be unloaded from GPU RAM with vLLM? Yes or no?
