Potential Memory Leak Error #284

Closed
pandeydeep9 opened this issue Nov 17, 2021 · 17 comments · Fixed by #307

@pandeydeep9

I installed learn2learn using "pip install learn2learn". When I run maml_miniimagenet.py (from learn2learn/examples/vision/maml_miniimagenet.py) with a batch size of 2 and shot = 1, I get the following error after 63 iterations. When I change to shot = 5, I get the same error after 3 iterations.

Iteration 63
Meta Train Error 2.0417345762252808
Meta Train Accuracy 0.20000000298023224
Meta Valid Error 1.8002310991287231
Meta Valid Accuracy 0.20000000298023224
Traceback (most recent call last):
  File "/home/deep/Desktop/IMPLEMENTATION/MyTry/MetaSGD/mini_Temp_Test.py", line 156, in <module>
    main()
  File "/home/deep/Desktop/IMPLEMENTATION/MyTry/MetaSGD/mini_Temp_Test.py", line 106, in main
    evaluation_error.backward()
  File "/home/deep/.local/lib/python3.8/site-packages/torch/_tensor.py", line 307, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/deep/.local/lib/python3.8/site-packages/torch/autograd/__init__.py", line 154, in backward
    Variable._execution_engine.run_backward(
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 5.79 GiB total capacity; 3.60 GiB already allocated; 77.56 MiB free; 3.62 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

When I look at nvidia-smi, the memory usage gradually increases with each iteration.
However, if I comment out the meta-validation loss part (line 114-112 in this script), I don't get the memory leak. I think the issue is similar to Potential Memory Leak #278. Why does this happen, and how can it be solved?
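nvidia-smi reports everything the process holds, including memory that PyTorch's caching allocator has reserved but not handed out, so it can overstate growth. A steadier signal is the memory held by live tensors, logged once per iteration. The sketch below assumes nothing about the MAML script; `allocated_mib` and `looks_like_leak` are hypothetical helpers, and the "never decreases" rule is only a rough heuristic.

```python
import torch


def allocated_mib(device=None):
    """Memory currently occupied by live tensors on a CUDA device, in MiB.

    Unlike nvidia-smi, this excludes memory PyTorch has cached but not
    allocated, so steady growth here means tensors (or autograd graphs) are
    being kept alive. Returns 0.0 when CUDA is unavailable.
    """
    if not torch.cuda.is_available():
        return 0.0
    return torch.cuda.memory_allocated(device) / 2**20


def looks_like_leak(history, min_steps=5):
    """Hypothetical heuristic: over the last `min_steps` readings, allocation
    never decreased and ended higher than it started."""
    if len(history) < min_steps:
        return False
    window = history[-min_steps:]
    non_decreasing = all(b >= a for a, b in zip(window, window[1:]))
    return non_decreasing and window[-1] > window[0]


# Usage inside a training loop (names are illustrative):
# history = []
# for iteration in range(num_iterations):
#     ...meta-train and meta-validation steps...
#     history.append(allocated_mib())
#     if looks_like_leak(history):
#         print(f"Iteration {iteration}: allocated memory keeps growing")
```

Comparing the readings with and without the meta-validation block is a quick way to confirm which part of the loop holds on to memory.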

@Phoveran

I was occupied with another project, so I closed the issue without having solved it.
The problem may well still exist.

@seba-1511
Member

Thanks for raising the issue @pandeydeep9 and @Phoveran,

These leaks are worrisome. Could you share more about your setup? Which GPU and CPU, and which versions of Python, PyTorch, and learn2learn? It seems to be hardware-dependent, since @nightlessbaron wasn’t able to reproduce the bug on Colab. Also, are you running the mini-imagenet script as-is?

@pandeydeep9
Author

pandeydeep9 commented Nov 21, 2021

I reduced the meta_batch_size parameter to 2 and shots to 1. That is the only change I made in the example mini-imagenet script.
My CPU is Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz. My GPU is TU106M [GeForce RTX 2060 Mobile] , version: a1, clock 33MHz.
I am using Python 3.8.5, learn2learn 0.1.5, and torch '1.10.0+cu102'.

@tranquangchung

I have the same issue, even when I run "maml_miniimagenet.py" on an A5000 with 24GB.
After a few iterations, it gives the error message "CUDA out of memory".
I tried several versions: 0.1.3, 0.1.4, 0.1.5, and 0.1.6.
I worry that if I use this library for my project, I will run into a lot of trouble later.
Can you give me some advice on how to fix it?
Many thanks

@seba-1511
Member

Thanks for the additional feedback. Are you also using PyTorch v1.10? And does commenting out the validation step also fix the memory leak?

@tranquangchung

Yes, I use PyTorch 1.10.0, CUDA 11.2, and Python 3.8.12.

@Phoveran

My setting:

python: 3.9.7
learn2learn: 0.1.6 (using pip install learn2learn)
PyTorch: 1.10
CUDA: 11.4
GPU: RTX 2080Ti

@seba-1511
Member

Thanks for the extra info, I wonder if the issue cropped up with PyTorch 1.10 on CUDA 11+. As a temporary fix, does changing learner = maml.clone() to learner = maml.clone(first_order=True) on l. 112 solve the leak for you?

@pandeydeep9
Author

Yes, adding first_order=True on l. 112 solves the leak. I also think this should give the expected results: during the validation/test phases we can use first-order MAML and get the same results, since there is no need to track gradients through the adaptation step there.
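A plain-PyTorch sketch of what a first-order flag typically toggles may help here. It is not learn2learn's code: `inner_step` and the toy linear model are hypothetical. The only difference between the two calls is whether `torch.autograd.grad` is asked for `create_graph=True`. The adapted values come out identical, consistent with the point above that first-order adaptation gives the same validation numbers; what changes is how much autograd graph stays attached to the adapted parameters.

```python
import torch


def inner_step(params, loss_fn, lr, first_order):
    """One MAML-style gradient step on a list of tensors (illustrative only).

    first_order=False builds the gradients with create_graph=True, so they
    (and the adapted parameters) carry a second-order graph back to `params`.
    first_order=True returns plain gradient tensors with no graph attached.
    """
    loss = loss_fn(params)
    grads = torch.autograd.grad(loss, params, create_graph=not first_order)
    adapted = [p - lr * g for p, g in zip(params, grads)]
    return adapted, grads


torch.manual_seed(0)
x = torch.randn(8, 3)
y = torch.randn(8)
w = torch.randn(3, requires_grad=True)


def mse(params):
    # Squared error of a linear model; its gradient still depends on w.
    return ((x @ params[0] - y) ** 2).mean()


adapted_2nd, grads_2nd = inner_step([w], mse, lr=0.1, first_order=False)
adapted_1st, grads_1st = inner_step([w], mse, lr=0.1, first_order=True)

print(grads_2nd[0].requires_grad, grads_1st[0].requires_grad)  # True False
print(torch.allclose(adapted_2nd[0], adapted_1st[0]))  # True
```

The first-order adapted parameters still require grad (they depend on `w` through `p - lr * g`), so the forward pass on the query set is still recorded; only the second-order part is dropped.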

Thanks

@ligeng0197

We ran into the same problem with PyTorch 1.10.0, Python 3.9.5, CUDA 11.5, and a Tesla M40 (24 GB). We are glad to see this issue reported, since we had debugged our code repeatedly without finding what was causing CUDA memory usage to grow over validation and test iterations.

@seba-1511 reopened this Dec 2, 2021
@tobiasvanderwerff

I was facing the same issue, but managed to solve it by downgrading PyTorch from version 1.10 to 1.9. I was using the following setup:

learn2learn 0.1.6
Python 3.8.6
PyTorch 1.10
GPU: Nvidia V100 (32 GB)
Cuda 10.2

Using this setup, memory usage kept increasing over epochs until an out-of-memory error occurred. With PyTorch 1.9, however, memory usage stabilizes.

@sjtugzx

sjtugzx commented Dec 29, 2021

> Thanks for the extra info, I wonder if the issue cropped up with PyTorch 1.10 on CUDA 11+. As a temporary fix, does changing learner = maml.clone() to learner = maml.clone(first_order=True) on l. 112 solve the leak for you?

Honestly, I tried using MAML to fine-tune a T5 transformer. Before adding first_order=True, I could only run 2 tps; after adding it, I could run 4 tps, but the memory still leaked, so this did not fix my problem. I guess there are still some problems that are exposed by large networks such as transformers.

learn2learn 0.1.6
Python 3.9
Pytorch 1.10
GPU: 3080 (24GB)
Cuda 10.2

@seba-1511
Member

seba-1511 commented Dec 29, 2021

The memory leak seems to have been introduced in PyTorch 1.10. @sjtugzx do you also see leaks with T5 on PyTorch 1.9?

I haven't had time to investigate it yet, so help is welcome.

@kzhang2
Contributor

kzhang2 commented Jan 12, 2022

I have a suggestion for a potential fix, though it is a little hacky. From what I have observed, the key problem behind the memory leak seems to be that the compute graph for the gradient update is created even when first_order=True. During training, I think memory doesn't accumulate because the graph is freed when you call loss.backward(). At evaluation time, however, you never call loss.backward(), so memory usage can keep growing.

In my code, to get rid of this unneeded memory usage at evaluation time, I added an eval flag to the adapt function in MAML and MetaSGD that wraps the gradient update in a no_grad context, so

# Update the module
self.module = maml_update(self.module, self.lr, gradients)

becomes

# Update the module
if eval:
    with torch.no_grad():
        self.module = maml_update(self.module, self.lr, gradients)
    for p in self.module.parameters():
        p.requires_grad = True
else:
    self.module = maml_update(self.module, self.lr, gradients)

I haven't investigated this in detail so I'm not sure if this is the best way to proceed, but let me know if this seems promising and if I should investigate further, and maybe even make a pull request.
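The same idea can be sketched on plain tensors to show what the no_grad branch buys. `eval_update` is a hypothetical helper, not part of learn2learn, and it works on a list of tensors rather than on an nn.Module. Inside `torch.no_grad()` the update is not recorded, so the new parameters are fresh leaf tensors with no history; turning `requires_grad` back on lets a further inner-loop step differentiate with respect to them.

```python
import torch


def eval_update(params, grads, lr):
    """Apply p - lr * g without recording the update in the autograd graph.

    Hypothetical helper mirroring the no_grad branch above on a list of
    tensors; no gradient can flow back to the original `params` afterwards.
    """
    with torch.no_grad():
        new_params = [p - lr * g for p, g in zip(params, grads)]
    # The results are fresh leaf tensors; re-enable tracking so that a further
    # inner-loop step can still take gradients with respect to them.
    for p in new_params:
        p.requires_grad_(True)
    return new_params


torch.manual_seed(0)
x = torch.randn(8, 3)
y = torch.randn(8)
w = torch.randn(3, requires_grad=True)

loss = ((x @ w - y) ** 2).mean()
(g,) = torch.autograd.grad(loss, [w])
(w_new,) = eval_update([w], [g], lr=0.1)

print(w_new.grad_fn is None, w_new.is_leaf, w_new.requires_grad)  # True True True

# A second inner step still works because w_new requires grad.
loss2 = ((x @ w_new - y) ** 2).mean()
(g2,) = torch.autograd.grad(loss2, [w_new])
```

Cutting the link to the original parameters is harmless at evaluation time but would break meta-training, which is why the suggestion gates it behind a flag.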

@seba-1511
Member

For people following, @kzhang2 and I have been discussing on slack and we came up with a fix. Expect a PR + release in the next 2 weeks. Meanwhile, the fix is to update the update_module function in learn2learn/utils/__init__.py as follows:

def update_module(module, updates=None, memo=None):
    r"""
    [[Source]](https://github.com/learnables/learn2learn/blob/master/learn2learn/utils.py)

    **Description**

    Updates the parameters of a module in-place, in a way that preserves differentiability.

    The parameters of the module are swapped with their update values, according to:
    \[
    p \gets p + u,
    \]
    where \(p\) is the parameter, and \(u\) is its corresponding update.


    **Arguments**

    * **module** (Module) - The module to update.
    * **updates** (list, *optional*, default=None) - A list of gradients for each parameter
        of the model. If None, will use the tensors in .update attributes.

    **Example**
    ~~~python
    error = loss(model(X), y)
    grads = torch.autograd.grad(
        error,
        model.parameters(),
        create_graph=True,
    )
    updates = [-lr * g for g in grads]
    l2l.update_module(model, updates=updates)
    ~~~
    """
    if memo is None:
        memo = {}
    if updates is not None:
        params = list(module.parameters())
        if not len(updates) == len(list(params)):
            msg = 'WARNING:update_module(): Parameters and updates have different length. ('
            msg += str(len(params)) + ' vs ' + str(len(updates)) + ')'
            print(msg)
        for p, g in zip(params, updates):
            p.update = g

    # Update the params
    for param_key in module._parameters:
        p = module._parameters[param_key]
        if p is not None and hasattr(p, 'update') and p.update is not None:
            if p in memo:
                module._parameters[param_key] = memo[p]
            else:
                updated = p + p.update
                p.update = None
                memo[p] = updated
                module._parameters[param_key] = updated

    # Second, handle the buffers if necessary
    for buffer_key in module._buffers:
        buff = module._buffers[buffer_key]
        if buff is not None and hasattr(buff, 'update') and buff.update is not None:
            if buff in memo:
                module._buffers[buffer_key] = memo[buff]
            else:
                updated = buff + buff.update
                buff.update = None
                memo[buff] = updated
                module._buffers[buffer_key] = updated

    # Then, recurse for each submodule
    for module_key in module._modules:
        module._modules[module_key] = update_module(
            module._modules[module_key],
            updates=None,
            memo=memo,
        )

    # Finally, rebuild the flattened parameters for RNNs
    # See this issue for more details:
    # https://github.com/learnables/learn2learn/issues/139
    if hasattr(module, 'flatten_parameters'):
        module._apply(lambda x: x)
    return module

@kzhang2 mentioned this issue Feb 10, 2022
@seba-1511
Member

Quick update: this is fixed, tested, and available in the new v0.1.7 release.
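To check whether an environment already has the release mentioned above, one can compare the installed learn2learn version against 0.1.7. This is a sketch: `parse_version` and `has_update_module_fix` are hypothetical helpers, and the parser only keeps leading numeric components, so it ignores pre-release tags.

```python
from importlib.metadata import PackageNotFoundError, version

# Release in which the maintainers report the update_module fix is available.
FIXED_IN = (0, 1, 7)


def parse_version(text):
    """Turn "0.1.7" or "1.10.0+cu102" into its leading integer components,
    e.g. (0, 1, 7) or (1, 10, 0)."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) != len(piece):
            # Stop at suffixes such as "0rc1" or "0+cu102".
            break
    return tuple(parts)


def has_update_module_fix(text):
    return parse_version(text) >= FIXED_IN


def installed_learn2learn_version():
    try:
        return version("learn2learn")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    v = installed_learn2learn_version()
    if v is None:
        print("learn2learn is not installed")
    else:
        print(v, "includes the fix" if has_update_module_fix(v) else "predates v0.1.7")
```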

@aritroCoder

Hi, I am using learn2learn and getting a memory leak error. This is the code I am using:

#Load model weights
model.load_state_dict(torch.load('mnist_model_weights_450.pth', map_location={'cuda:2' : 'cuda:0'}))

# run the test data
meta_test_loss = 0.0
for idx, (context_x, context_y, target_x, target_y) in enumerate(test_loader):
    context_x, context_y, target_x, target_y = context_x.to(device), context_y.to(device), target_x.to(device), target_y.to(device)
    effective_batch_size = context_x.size(0)
    for i in range(effective_batch_size):
        learner = maml.clone(first_order=True)
        x_support, y_support = context_x[i], context_y[i]
        x_query, y_query = target_x[i], target_y[i]
        y_support = y_support.view(-1)
        y_query = y_query.view(-1)
        for _ in range(num_epochs):
            wts, predictions = learner(x_support)
            loss = custom_loss_function(predictions, y_support, wts)
            learner.adapt(loss)
        wts, predictions = learner(x_query)
        loss = custom_loss_function(predictions, y_query, wts)
        meta_test_loss += loss
    meta_test_loss /= effective_batch_size
    if idx % 10 == 0:
        print(f"Iteration: {idx+1}, Meta test loss: {meta_test_loss}")
    
print(f"Final Meta test loss: {meta_test_loss}")

I am getting this error:


OutOfMemoryError: CUDA out of memory. Tried to allocate 2.00 MiB. GPU 0 has a total capacity of 14.75 GiB of which 3.06 MiB is free. Process 54265 has 14.74 GiB memory in use. Of the allocated memory 14.51 GiB is allocated by PyTorch, and 102.21 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  

learn2learn 0.2.0
Python 3.9
Pytorch 2.4.0+cu121 (using google colab)
GPU: T4 (15GB)
Cuda 12.2

Can anyone tell me how to fix it? I wrote the training loop similarly, and it runs without problems.
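One thing worth checking in the loop above (an observation about this snippet, not a confirmed diagnosis): `meta_test_loss += loss` adds a tensor that still has a `grad_fn`, because the cloned learner's parameters require grad even with `first_order=True`. The running sum would then hold a reference to every task's forward graph for the whole test run, which would be consistent with memory climbing until the OOM. Wrapping the loop in `torch.no_grad()` is not an option, since `adapt` needs gradients, but accumulating `loss.item()` (a Python float) lets each task's graph be freed. Separately, `meta_test_loss /= effective_batch_size` divides the cumulative total on every batch, so the printed value is not a per-batch mean. The hypothetical `running_total` helper below isolates the first effect.

```python
import torch


def running_total(losses, as_float):
    """Sum per-task losses either as tensors or as Python floats."""
    total = 0.0
    for loss in losses:
        # loss.item() copies the value out, so the task's graph can be freed;
        # adding the tensor itself keeps a grad_fn chain to every task.
        total += loss.item() if as_float else loss
    return total


torch.manual_seed(0)
w = torch.randn(3, requires_grad=True)  # stands in for adapted parameters
task_losses = [((w * torch.randn(3)).sum()) ** 2 for _ in range(4)]

kept = running_total(task_losses, as_float=False)
freed = running_total(task_losses, as_float=True)
print(type(kept).__name__, kept.requires_grad)  # Tensor True
print(type(freed).__name__)  # float
```

In the loop above this would mean `meta_test_loss += loss.item()`, with the per-batch average computed from a separate per-batch accumulator.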
