
unload the model #3281

Open
osafaimal opened this issue Mar 8, 2024 · 9 comments

Comments

@osafaimal

Hi,
I'm sorry, I can't find out how to unload a model. I load a model, delete the object, and call the garbage collector, but it does nothing.
How are we supposed to unload a model?
I want to load a model, run a batch, then load another and run a batch, and so on for several models so I can compare them. But for now I have to restart Python each time.

@hmellor
Collaborator

hmellor commented Mar 9, 2024

Try calling torch.cuda.empty_cache() after you delete the LLM object.

@chenxu2048
Contributor

You can also call gc.collect() to reclaim *garbage* objects immediately after you delete them.
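
Putting the two suggestions together, a minimal sketch (the model name is a placeholder):

import gc

import torch
from vllm import LLM

llm = LLM(model="facebook/opt-125m")  # placeholder model

# ... run inference ...

# Drop the last reference, collect immediately,
# then release cached CUDA blocks back to the driver.
del llm
gc.collect()
torch.cuda.empty_cache()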

@osafaimal
Author

[screenshot: notebook cell running the suggested calls]
Neither works.

@chenxu2048
Contributor

You should also clear the notebook output: https://stackoverflow.com/questions/24816237/ipython-notebook-clear-cell-output-in-code
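
If you prefer to do this from code rather than the GUI, IPython exposes a helper for it; a minimal sketch:

from IPython.display import clear_output

clear_output(wait=False)  # discard the cell's displayed output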

@osafaimal
Author

osafaimal commented Mar 11, 2024

I always do (in the GUI, not in my cells).

@mnoukhov

This seems mostly solved by #1908 with:

import gc

import torch
from vllm import LLM
from vllm.model_executor.parallel_utils.parallel_state import destroy_model_parallel

# Placeholder values; substitute your own model and hardware settings.
model_name = "facebook/opt-125m"
saver_dir = "./models"
num_gpus = 1

# Load the model via vLLM
llm = LLM(model=model_name, download_dir=saver_dir,
          tensor_parallel_size=num_gpus, gpu_memory_utilization=0.70)

# Tear down vLLM's parallel state, drop every reference to the engine,
# then release cached GPU memory and the distributed process group.
destroy_model_parallel()
del llm.llm_engine.driver_worker
del llm
gc.collect()
torch.cuda.empty_cache()
torch.distributed.destroy_process_group()
print("Successfully deleted the llm pipeline and freed the GPU memory!")
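
(Note: the destroy_model_parallel import path above matches vLLM releases of that era; in newer releases the parallel-state helpers appear to have moved to vllm.distributed.parallel_state, so adjust the import to your installed version.)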

@osafaimal
Author

osafaimal commented Apr 2, 2024

> This seems mostly solved by #1908 with: […]

I had already read that. My problem remains unsolved when I use Vllm from LlamaIndex; otherwise it almost works. A little memory stays in use (~1 GB), but at least I can load and unload models. The problem is that I can't find how to access the llm_engine member of vllm.LLM through the wrapper.
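
A possible workaround, sketched under assumptions: both the import path and the private attribute name _client are guesses about the LlamaIndex wrapper's internals, and must be verified against your installed version (e.g. with vars(llm)).

import gc

import torch
# ASSUMPTION: import path for the wrapper; older llama-index releases
# used `from llama_index.llms import Vllm` instead.
from llama_index.llms.vllm import Vllm

llm = Vllm(model="facebook/opt-125m")  # placeholder model

# ASSUMPTION: the wrapper keeps the underlying vllm.LLM in `_client`;
# inspect the object if this attribute does not exist.
inner = getattr(llm, "_client", None)
if inner is not None:
    del inner.llm_engine.driver_worker
del llm, inner
gc.collect()
torch.cuda.empty_cache()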

@vvolhejn

vvolhejn commented Oct 1, 2024

@chenxu2048 the notebook output is just computed data shown to the user; the Python kernel computes it, but the communication is one-way: the output doesn't affect the kernel at all. Clearing the output therefore has no effect on GPU memory or any other kernel state.

@david-koleckar

No definitive answer given. Can a model be unloaded from GPU RAM with vLLM? Yes or no?
