I've recently tracked a CPU (possibly GPU also) memory leak to list[Tensor] states. For example:
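import torch
from torchmetrics import Metric

class DummyListMetric(Metric):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # A list state: update() appends one Tensor per call
        self.add_state("x", default=[])

    def update(self, x=None):
        x = torch.tensor(1) if x is None else x
        self.x.append(x)

    def compute(self):
        # Metric declares compute() as abstract; minimal body so the class can be instantiated
        return self.x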
When (the parent) Metric.reset() is called, self.x is simply overwritten with an empty list [1]. Unfortunately, this only rebinds the attribute: it doesn't guarantee that the contents of the old list are deleted, meaning its Tensor elements are not always correctly freed.
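To illustrate the difference between rebinding and clearing, here is a minimal pure-Python sketch. It assumes something else still holds a reference to the old list (I haven't pinned down where that reference comes from in my case). `Payload`, `Holder` and `payload_survives` are hypothetical stand-ins, not torchmetrics code:

```python
import gc
import weakref


class Payload:
    """Hypothetical stand-in for a Tensor; plain objects support weak references."""


class Holder:
    """Hypothetical stand-in for a metric with a list state."""

    def __init__(self):
        self.x = []

    def reset_by_rebinding(self):
        # Rebinds the attribute; the old list lives on if anything else references it
        self.x = []

    def reset_by_clearing(self):
        # Empties the existing list in place, dropping its references to the elements
        self.x.clear()


def payload_survives(reset_name):
    """Return True if the appended element is still alive after the given reset."""
    holder = Holder()
    alias = holder.x  # assumed second reference to the same list object
    item = Payload()
    probe = weakref.ref(item)  # lets us observe whether the element was freed
    holder.x.append(item)
    del item  # the list now holds the only strong reference to the element
    getattr(holder, reset_name)()
    gc.collect()
    return probe() is not None


print(payload_survives("reset_by_rebinding"))
print(payload_survives("reset_by_clearing"))
```

With the extra reference in place, rebinding leaves the element alive while clearing releases it. The trade-off is that clearing in place also empties the list for anyone who deliberately kept a reference to the state.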
After some investigation, I found my custom metrics (subclasses of Metric) didn't cause my (work) system to run out of memory if I added the following override:
def reset(self):
    for attr, default in self._defaults.items():
        if isinstance(default, list):
            getattr(self, attr).clear()
    return super().reset()
The same fix can be applied directly inside torchmetrics with a single line change, modifying [2] to:
getattr(self, attr).clear()
Looking at other open issues, this issue might be related to #2481 (which also references list states). I'd be very happy to open a PR if the above sounds reasonable. Many thanks!