[3.12] getattr_static can result in reference leaks #118013
Comments
I'm not sure I see a reference leak here, but certainly a longer lifetime than before. The object will still be deleted eventually. I do see how this behavior might be unexpected or undesirable, however, particularly for large objects. Pinging @AlexWaygood, as the author of #104267.
Thanks for the ping!
Agreed on all counts. I deliberately made sure to use a bounded cache in #104267, so that the size of the cache (and with it, memory consumption) wouldn't grow indefinitely. And even with a bounded cache, the only reason why I thought this would probably be safe was that classes are generally defined once in the global scope, and persist for the entire program, meaning that a strong reference to a class in a cache won't generally extend the lifetime of that class longer than it would have been otherwise. What problem is this causing for your program exactly, @williamwen42? Is it causing a significant increase in memory consumption? Or is it just that a test somewhere is reporting that a class has more references to it at the end of a program than at the start of the program? I'd prefer to keep the caching if possible, as it significantly speeds up |
Thanks for the quick response! The longer lifetime is the "reference leak" issue that I'm referring to. In my case, the leaked variable may be holding on to additional resources, e.g. GPU memory. So we could imagine a case where a user might do something like:

```python
my_model1 = torch.compile(MyModel())
my_model1(input1)
del my_model1
# e.g. 2GB of GPU memory could still be allocated
my_model2 = torch.compile(MyModel2())
# now my_model2 cannot run due to CUDA OOM
# unclear to user how to release the remaining GPU memory
my_model2(input2)
```

I do see the point that classes defined inside a function are not common. And I don't see this issue happening if the leaking variable and class definition are instead in the global scope, although it does occur if the variable is a class member. But I think it's still a bad user experience for resources to be mysteriously held on to (under an unclear set of circumstances), even though the user intends to release them. In our case, we don't anticipate calling …

Some ideas that were floating in my head:
Global class member example for reference:

```python
import gc
from inspect import getattr_static
import sys
import weakref

class C1:
    pass

obj = C1()
weakref.finalize(obj, lambda: print("obj deleted!"))

class C2:
    member = obj

    def __init__(self):
        pass

c2 = C2()
getattr_static(c2, "member", None)
print("done main!")

del obj
del c2
del C2
gc.collect()
print("done!")
```

Output:
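To make the mechanism concrete, here is a standalone sketch (a hypothetical `cached` function, not inspect's actual `_shadowed_dict_from_mro_tuple`) showing that an `lru_cache` entry keeps a strong reference to the class passed to it:

```python
import functools
import gc
import weakref

# Hypothetical cache illustrating the mechanism: an lru_cache keeps
# strong references to its argument tuples (and to its results).
@functools.lru_cache()
def cached(mro_tuple):
    return mro_tuple

class C:
    pass

class_ref = weakref.ref(C)
cached((C, object))  # the cache now holds a strong reference to C
del C
gc.collect()
print(class_ref() is None)  # False: the cache entry keeps C alive

cached.cache_clear()
gc.collect()
print(class_ref() is None)  # True: C is collected once the cache is cleared
```

Anything reachable from the cached class (such as a `member` class attribute) stays alive for exactly as long as the cache entry does.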
Just so I'm clear here: in this example, … Is that correct? It surprises me that retaining a strong reference to a class could lead to a 2GB memory leak. But if that's the case, then I'm definitely happy to look at alternatives to the current caching strategy (the weakref option is probably best here, I think).
Actually … Technically, the 2GB+ leak from a single variable can happen since the single reference could be referring to a large tensor/nd-array or even an entire model. I commented out the `getattr_static` line.
But how can it? The cache only stores a reference to the class, not the instance:

```python
>>> import inspect, weakref
>>> class Foo: ...
...
>>> f = Foo()
>>> weakref_to_class = weakref.ref(Foo)
>>> weakref_to_instance = weakref.ref(f)
>>> inspect.getattr_static(f, 'foo', 'bar')
'bar'
>>> del f
>>> weakref_to_instance()
>>> del Foo
>>> weakref_to_class()
<class '__main__.Foo'>
```

(Well, strictly speaking it stores a reference to the mro tuple of the class of the object that `getattr_static` was called on.)
Ah, I'm not being precise enough here. Yes, calling …
Okay, got it — thanks! I'll work on a fix (unless someone else beats me to it :-)
I'm assigning this to myself so I don't forget to look at it, but that doesn't mean somebody else shouldn't take a look if they fancy it.
Hmm, well, I experimented with using weakrefs for the cache, i.e. a patch like this:

```diff
--- a/Lib/inspect.py
+++ b/Lib/inspect.py
@@ -157,6 +157,7 @@
 import types
 import functools
 import builtins
+import weakref
 from keyword import iskeyword
 from operator import attrgetter
 from collections import namedtuple, OrderedDict
@@ -1815,6 +1816,7 @@ def trace(context=1):
 _sentinel = object()
 _static_getmro = type.__dict__['__mro__'].__get__
 _get_dunder_dict_of_class = type.__dict__["__dict__"].__get__
+_SHADOWED_DICT_CACHE = {}


 def _check_instance(obj, attr):
@@ -1832,7 +1834,7 @@ def _check_class(klass, attr):
             return entry.__dict__[attr]
     return _sentinel

-@functools.lru_cache()
+
 def _shadowed_dict_from_mro_tuple(mro):
     for entry in mro:
         dunder_dict = _get_dunder_dict_of_class(entry)
@@ -1844,8 +1846,21 @@ def _shadowed_dict_from_mro_tuple(mro):
                 return class_dict
     return _sentinel

+
 def _shadowed_dict(klass):
-    return _shadowed_dict_from_mro_tuple(_static_getmro(klass))
+    mro = _static_getmro(klass)
+
+    try:
+        weakref_mro = tuple(map(weakref.ref, mro))
+    except TypeError:
+        # is it possible to have a class that can't be weakref'd?
+        # not sure -- but better to be safe than sorry
+        return _shadowed_dict_from_mro_tuple(mro)
+
+    if weakref_mro not in _SHADOWED_DICT_CACHE:
+        _SHADOWED_DICT_CACHE[weakref_mro] = _shadowed_dict_from_mro_tuple(mro)
+    return _SHADOWED_DICT_CACHE[weakref_mro]
+

 def getattr_static(obj, attr, default=_sentinel):
     """Retrieve attributes without triggering dynamic lookup via the
diff --git a/Lib/test/libregrtest/utils.py b/Lib/test/libregrtest/utils.py
index 791f996127..e0bcd18100 100644
--- a/Lib/test/libregrtest/utils.py
+++ b/Lib/test/libregrtest/utils.py
@@ -275,7 +275,6 @@ def clear_caches():
     except KeyError:
         pass
     else:
-        inspect._shadowed_dict_from_mro_tuple.cache_clear()
         inspect._filesbymodname.clear()
         inspect.modulesbyfile.clear()
```

Unfortunately, it seems like the cost of constructing the weakrefs on every call wipes out any performance gain from having the cache (the function actually becomes slower than it would be if we simply reverted 1b19bd1). @carljm, I don't suppose you see any way of avoiding doing a plain revert, do you? It leads to a pretty big slowdown :(
A few ideas, but all have major problems: …

Listing these in case any of them prompt any other brilliant ideas, but I don't think any of these as described are a workable option.
After a bit of experimentation, I've managed to come up with #118202, which is still a fair bit slower than …
Thanks @williamwen42 for reporting the issue and for patiently explaining it to me! This should be fixed on …
Bug report

Bug description:

I discovered this bug while working on pytorch/pytorch#124302.

It looks like calling `getattr_static` is causing an object to be reference leaked. Output:

If I comment out the `getattr_static` line, the output is as expected.

It looks like this PR #104267 indirectly cached calls to `getattr_static`, which is resulting in reference leaks. Perhaps this cache needs to use weak references?

Original PyTorch code for reference (`torch.compile` calls `getattr_static` on `mod` at some point):

CPython versions tested on: 3.12

Operating systems tested on: Linux

Linked PRs

- `inspect._shadowed_dict` #118202
- `inspect._shadowed_dict` (GH-118202) #118232