Replace AsyncProcess exit handler by weakref.finalize #4184

Merged · 5 commits merged into dask:master on Oct 30, 2020

Conversation

@pentschev (Member)

This is an attempt at solving #4181. I believe this is a more reliable approach, given we can ensure the process will not get destroyed before it's garbage collected.
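
For illustration, here is a minimal sketch of the per-object weakref.finalize pattern the PR moves toward. The class and method names (AsyncProcessLike, _terminate, _finalizer) are hypothetical and not the actual distributed.process.AsyncProcess implementation; the point is only that each wrapper registers its own finalizer, so cleanup is tied to the object's lifetime rather than to a module-level exit handler.

import weakref
from multiprocessing import Process


class AsyncProcessLike:
    """Illustrative wrapper, not the real distributed.process.AsyncProcess."""

    def __init__(self, target):
        self._process = Process(target=target)
        # The callback must not hold a reference to `self`; otherwise the
        # finalizer would keep the wrapper alive forever. Passing the raw
        # process object as an argument avoids that.
        self._finalizer = weakref.finalize(self, self._terminate, self._process)

    @staticmethod
    def _terminate(process):
        # Called when the wrapper is garbage collected, or at interpreter
        # exit if the wrapper is still alive at that point.
        if process.is_alive():
            process.terminate()
            process.join(timeout=3)

    def start(self):
        self._process.start()

    def close(self):
        # Explicit close still works; invoking the finalizer marks it dead,
        # so the cleanup never runs twice.
        self._finalizer()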

@quasiben (Member)

This does resolve a long-standing cleanup issue for UCX, and we think this is a cleaner approach to halting workers. @jcrist, if you have time, can you look this over?

@quasiben (Member)

@fjetter your thoughts here would also be helpful

@jcrist (Member) left a comment

A few quick comments. No thoughts on whether this is the right approach over other methods - generally I'd hope we could explicitly manage closing these processes rather than requiring a finalizer to implicitly handle them.

Inline review comments on distributed/process.py (outdated, resolved)

@pentschev (Member, Author)

> A few quick comments. No thoughts on whether this is the right approach over other methods - generally I'd hope we could explicitly manage closing these processes rather than requiring a finalizer to implicitly handle them.

I think the main problem we have today is that using exit handlers as globals, as is the case in Distributed, seems incorrect, as the order will not be respected. I haven't looked at all exit handlers, but the few I looked at all have the same problem. A couple of examples:

def _close_global_client():
    """
    Force close of global client. This cleans up when a client
    wasn't close explicitly, e.g. interactive sessions.
    """
    c = _get_global_client()
    if c is not None:
        c._should_close_loop = False
        with suppress(TimeoutError, RuntimeError):
            c.close(timeout=3)


atexit.register(_close_global_client)

@atexit.register
def close_clusters():
    for cluster in list(SpecCluster._instances):
        with suppress(gen.TimeoutError, TimeoutError):
            if cluster.status != Status.closed:
                cluster.close(timeout=10)

I don't know whether this is the right approach either; I was hoping to suggest it and begin a discussion on potential pitfalls and alternatives. At the moment I don't see a much better approach.
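
To make the ordering concern concrete, a small standalone example (not distributed code) showing that atexit handlers run in last-in, first-out order of registration, which is fixed at import time and independent of when the objects they clean up were created:

import atexit

# Both handlers are registered at "import time", before any objects exist.
atexit.register(lambda: print("close global client"))  # registered first, runs last
atexit.register(lambda: print("close clusters"))       # registered second, runs first

# Output at interpreter exit:
#   close clusters
#   close global client
#
# The order is decided entirely by registration (module import) order, not by
# the lifetime or creation order of the clients and clusters being cleaned up.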

@jcrist (Member) left a comment

Provided tests pass, this looks good to me.

@fjetter (Member) left a comment

I agree with @jcrist and would prefer a world where we would not need to clean up with finalizers but instead would be able to close everything intentionally.

However, as far as finalizers go, I think this approach is slightly better than the one before, since it deals with each object/process individually instead of in bulk. In particular, as already mentioned, it respects the order in which the objects were created; before, the order was determined at module import time. Not sure if this makes any difference, but this approach also seems simpler to reason about than the module-level finalizer.
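
For comparison, a toy example (again, not distributed code) of the per-object behaviour described above: finalizers created with weakref.finalize that are still alive at interpreter exit are invoked in reverse order of creation, so cleanup mirrors the order in which the objects were set up.

import weakref


class Resource:
    def __init__(self, name):
        self.name = name
        # `print` holds no reference to `self`, so the finalizer does not
        # keep the Resource alive.
        weakref.finalize(self, print, f"cleaning up {name}")


a = Resource("created first")
b = Resource("created second")

# If both objects survive until interpreter exit, the output is:
#   cleaning up created second
#   cleaning up created first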

@fjetter (Member) commented on Oct 26, 2020

There are a lot of failures on Travis/py3.6. Most seem unrelated, but I'm not entirely sure.

@pentschev (Member, Author)

I'm also not sure whether the 3.6 failures are related to this; it seems they aren't, but I'm happy to look into it more if there's any suspicion this PR is the cause.

@quasiben (Member)

I restarted the 3.6 CI job

@pentschev (Member, Author)

Seems like there's still one test failing with a timeout: test_broken_worker_during_computation. Looking at builds from other PRs (https://travis-ci.org/github/dask/distributed/jobs/738345888 and https://travis-ci.org/github/dask/distributed/jobs/738412961), the same test is failing, so I believe it's not a side effect of the changes here.

@fjetter (Member) commented on Oct 29, 2020

The test_broken_worker_during_computation failure is already reported in #4173

@pentschev (Member, Author)

Thanks @fjetter, I hadn't seen that. In that case, I think this PR is safe.

@jrbourbeau (Member) left a comment

Thanks @pentschev! (and @jcrist @fjetter for reviewing)

@jrbourbeau merged commit 8612473 into dask:master on Oct 30, 2020
@jrbourbeau pushed a commit that referenced this pull request on Oct 30, 2020

@pentschev (Member, Author)

Thanks everyone for reviews and merging! :)

@gforsyth pushed a commit to gforsyth/distributed that referenced this pull request on Oct 31, 2020
@pentschev deleted the weakref-finalizer-asyncprocess branch on November 11, 2020.