Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Weakref finalizers prohibit GC #7639

Open
fjetter opened this issue Mar 10, 2023 · 8 comments
Open

Weakref finalizers prohibit GC #7639

fjetter opened this issue Mar 10, 2023 · 8 comments

Comments

@fjetter
Copy link
Member

fjetter commented Mar 10, 2023

In a couple of places in our code base we're using weakref.finalizers in a way that ensures that an object is actually never GCed.

The notes specifically state

Note It is important to ensure that func, args and kwargs do not own any references to obj, either directly or indirectly, since otherwise obj will never be garbage collected. In particular, func should not be a bound method of obj.

see here https://docs.python.org/3/library/weakref.html#weakref.finalize

See a dicussion here #7593 (comment)

Almost every usage of weakref.finalize in our codebase is subject to this fallacy. Here a couple of example but this is non exhaustive.

We should remove these broken finalizers since they obviously do not help a lot other than keeping objects alive. Some might have important functionality which is no longer triggered because of the reference.

@fjetter
Copy link
Member Author

fjetter commented Mar 10, 2023

I believe some of the above examples are actually valid, e.g. when passing self._watch_q but other like _get_finalizer are very sneaky because the reference comes from the default argument to the stream argument.

@jacobtomlinson
Copy link
Member

In particular, func should not be a bound method of obj.

Would wrapping the func in a lambda avoid this reference?

@fjetter
Copy link
Member Author

fjetter commented Mar 10, 2023

Would wrapping the func in a lambda avoid this reference?

No but it's still easy to avoid by using a global function instead of a method. For example

# Instead of

class C:
    def __init__(self):
        weakref.finalize(self, self.shutdown, foo='bar')
    def shutdown(self, foo):
        ...
# ---
# do
def shutdown_C(obj, foo):
    ...

class C:
    def __init__(self):
        weakref.finalize(self, shutdown_C, foo='bar')

@wence-
Copy link
Contributor

wence- commented Mar 23, 2023

def shutdown_C(obj, foo):
    ...

class C:
    def __init__(self):
        weakref.finalize(self, shutdown_C, foo='bar')

Just to note, this patten unfortunately doesn't work if shutdown_C needs to refer to self. In finalization, the finalizer is called with args and kwargs passed to the finalize call, but not the object the finalizer is attached to:

n [8]: def shutdown_C(obj, foo):
   ...:     ...
   ...: 
   ...: class C:
   ...:     def __init__(self):
   ...:         weakref.finalize(self, shutdown_C, foo='bar')
   ...: 

In [9]: x = C()

In [10]: del x
Exception ignored in: <finalize object at 0x7fb923623f60; dead>
Traceback (most recent call last):
  File ".../weakref.py", line 591, in __call__
    return info.func(*info.args, **(info.kwargs or {}))
TypeError: shutdown_C() missing 1 required positional argument: 'obj'

@quasiben
Copy link
Member

@pentschev and I were diagnosing a recent failure in dask-cuda while using UCX . After bisecting we found the issue was tied to a recent PR removing the weakref for Offload Executor -- it seems like we need an explicit finalizer otherwise we run into shutdown issues at least when also trying to use UCX

@pentschev
Copy link
Member

We believe the issue we're experiencing in rapidsai/dask-cuda#1148 stems from Distributed failing to shutdown resources on the expected order now. For example, UCX-Py requires to be explicitly terminated (via ucp.reset() or properly destroying references to all its objects) to destroy all its async tasks, if Distributed closes the event loop before UCX-Py can stop, there’s no way UCX-Py can terminate because it holds a reference to that asynchronous task that cannot be cancelled and destroyed. Because UCX-Py doesn’t own the event loop, it seems there is no way we can actually destroy those tasks correctly, so we must rely on the caller who owns the event loop to destroy things in order.

We are not 100% certain what is happening in the issue is exactly what is described above, but it is one of the shortcomings that exist with the way Python/async allows us to work with. Therefore, we must always rely on Distributed finalizing resources appropriately, and it seems that not shutting down the offload executor with the weakref finalizer prevents Distributed from fulfilling what we need.

To quote similar developments from the past, we had to replace exit handlers by weakref finalizers in #4184 , and it seems #7644 did the exact opposite (although without the use of exit handlers).

@fjetter
Copy link
Member Author

fjetter commented Mar 30, 2023

After bisecting we found the issue was tied to a recent PR removing the weakref for #7644 --

Everything in the docs for weakref.finalize suggests that this is never executed. I applied a couple of code modifications and can confirm that this is not true. The execution order is the following

The CPython concurrent.futures._python_exit is called before the finalizer is called and it is effectively closing all threadpools. All threads are already joined, queues are empty, etc. The subsequent shutdown calls are not doing anything any longer. So even if the finalizer is executed, it is not actually doing anything.

It would be very helpful to have a proper bug report. What is in rapidsai/dask-cuda#1148 is not sufficient for me to understand what the problem is.

The CPython documentation strongly suggests that weakref.finalize will not be called under these circumstances so I do not believe that this behavior can be relied upon. It looks like we're lucky this is called and I wouldn't be surprised for this behavior to change in a future python version (or in a patch even). Besides this I can't put together what this finalizer is even supposed to be doing and am concerned that what you're seeing is a red herring. A proper reproducer or an explanation would go a long way.


To quote similar developments from the past, we had to replace exit handlers by weakref finalizers in #4184 ,

I do not believe this issue applies to the current case for a variety of reasons

  1. The finalizer used in Replace AsyncProcess exit handler by weakref.finalize #4184 is doing everything by the books. It is finalizing an instance of AsyncProcess and the finalizer is passed a weakref to a multiprocess.Process to ensure the latter is properly shut down. Therefore, neither func nor args or kwargs of the finalizer are owning any references to the finalized object, see (Notes box) https://docs.python.org/3/library/weakref.html#weakref.finalize.atexit This is not true for the threadpool executor

It is important to ensure that func, args and kwargs do not own any references to obj, either directly or indirectly, since otherwise obj will never be garbage collected. In particular, func should not be a bound method of obj.

the AsyncProcess instance is also not a global (https://docs.python.org/3/library/weakref.html#weakref.finalize)

A finalizer will never invoke its callback during the later part of the interpreter shutdown when module globals are liable to have been replaced by None.

  1. Apart from moving from atexit to finalize in Replace AsyncProcess exit handler by weakref.finalize #4184, the arguably even more important change is that the finalize acts on individual objects and ensures that the collection order is properly respected and objects are cleaned up as soon as they are collected and not only on interpreter shutdown.

@quasiben
Copy link
Member

@fjetter filed issue #7726 and removed the dask-cuda dependency

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants