stale reference needs a gc.collect to be collected in cloudpickle_fast #327

Closed
pierreglaser opened this issue Jan 21, 2020 · 3 comments · Fixed by #334
Comments

pierreglaser commented Jan 21, 2020
We noticed test failures in joblib/loky#229 using Python 3.8. After some investigation, the bug can be reproduced with cloudpickle alone: apparently, with Python 3.8 and the new reducer_override mechanism, something holds a stale reference until a gc.collect call is performed:

import threading
import weakref
import gc
import cloudpickle


class MyObject(object):
    def __init__(self, value=0):
        self.value = value

    def __repr__(self):
        return "MyObject({})".format(self.value)

    def my_method(self):
        pass


my_object = MyObject()
collect = threading.Event()
# Set the event as soon as my_object is finalized.
_ = weakref.ref(my_object, lambda obj: collect.set())  # noqa
cloudpickle.dumps(my_object)

del my_object

# Uncomment this line and the reference will be collected
# gc.collect()

collected = False
for i in range(5):
    collected = collect.wait(timeout=1.0)
    if collected:
        break
assert collected, "Stale reference not collected within timeout."
pierreglaser changed the title from "stale reference need a gc.collect to be collected in cloudpickle_fast" to "stale reference needs a gc.collect to be collected in cloudpickle_fast" on Jan 21, 2020

pierreglaser commented Jan 23, 2020

I tracked down the issue. To understand its origin, first note that this:

class MyObject:
    def my_method(self):
        pass


my_object = MyObject()
my_object.looped_method = my_object.my_method

creates a reference cycle. Accessing my_object.my_method creates, on the fly, a new bound method object that did not exist before, built from the my_method function stored in MyObject's __dict__ (that is how methods work in Python).

So my_object.my_method is a (new) object holding a reference to my_object, and the last line stores it inside my_object's __dict__ (through the implicit setattr call), creating a cycle.
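To make the cycle concrete, here is a minimal standalone sketch (redefining MyObject for self-containedness) showing that only the cyclic garbage collector can reclaim such a pair:

import gc
import weakref


class MyObject:
    def my_method(self):
        pass


my_object = MyObject()
my_object.looped_method = my_object.my_method

# The bound method stored in my_object.__dict__ points back to my_object
# through its __self__ attribute, which closes the cycle.
assert my_object.looped_method.__self__ is my_object

ref = weakref.ref(my_object)
del my_object

# Plain reference counting cannot free my_object because of the cycle
# (assuming no automatic collection happened in between)...
assert ref() is not None
# ...but a cyclic garbage collection pass can.
gc.collect()
assert ref() is None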

Now, for performance reasons, this exact pattern shows up in Python 3.8's _pickle.c module with the new reducer_override method. As a result, a CloudPickler object is only collected after a gc.collect pass. The problem is that a CloudPickler object holds a memo attribute, which itself holds a reference to every object serialized by that CloudPickler (including, in the reproducer, my_object). So my_object is not freed as long as the underlying CloudPickler that serialized it stays alive, which, right now, does not happen until the gc performs a full collection. Hence this bug.
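To illustrate what the memo holds, here is a small sketch using the pure-Python pickle._Pickler, whose memo is a plain dict mapping id(x) to (memo_index, x); the C pickler used by cloudpickle_fast only exposes its memo through a proxy, but the retention behavior is the same:

import io
import pickle


class MyObject:
    pass


obj = MyObject()
pickler = pickle._Pickler(io.BytesIO())  # pure-Python pickler: memo is a plain dict
pickler.dump(obj)

# Every object written so far is memoized, so a live pickler keeps a strong
# reference to everything it has serialized, including obj.
assert any(memoized is obj for _, memoized in pickler.memo.values())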

cc @ogrisel @tomMoral (because it originally happened in loky)


pierreglaser commented Jan 29, 2020

As mentioned by @ogrisel, we can try to break the reference cycle by removing the reducer_override method from the CloudPickler, since that attribute is the origin of the cycle.
The problem is that the cycle is only reachable at the C level: reducer_override is cached in the Pickler's C struct, not stored as a dynamic attribute inside the Pickler's __dict__.

I did not find any clean way to update reducer_override. The least ugly way I found is this CloudPickler._cleanup method:

    def _cleanup(self):
        # Break the reference cycle caused by the current pickler directly
        # holding, at the C level, a reference to its own bound
        # reducer_override method. This is done by re-binding reducer_override
        # to a placeholder function that does not reference the current
        # pickler, then triggering the C-level update with a smoke dump call
        # that is made to error out without writing anything into the previous
        # file object.
        def _reducer_override_placeholder(obj):
            raise RuntimeError

        def _f():
            pass

        self.reducer_override = _reducer_override_placeholder
        try:
            Pickler.dump(self, _f)
        except RuntimeError:
            pass

which should be invoked after each dump call. With this patch, the test suite seems to pass, as does the reproducer above.
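For reference, here is a hedged sketch of how such a cleanup could be wired into a dumps-style helper (the dumps_with_cleanup name and wiring are hypothetical, not cloudpickle's actual API, and _cleanup only exists if the patch above is applied):

import io

from cloudpickle import CloudPickler


def dumps_with_cleanup(obj, protocol=None):
    # Hypothetical wrapper: run the workaround after each dump so the pickler
    # (and its memo) becomes collectable without waiting for a full gc pass.
    with io.BytesIO() as file:
        cp = CloudPickler(file, protocol=protocol)
        try:
            cp.dump(obj)
            return file.getvalue()
        finally:
            # _cleanup is the workaround method proposed above.
            cp._cleanup()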

However, the fix for this problem in CPython is trivial and could easily land in Python 3.8.2 (or even Python 3.8.1). I wonder if it is worth adding such a convoluted method, relying on undocumented optimization details, only to cover a bugfix release.


pierreglaser commented Jan 29, 2020

I created an issue on bugs.python.org and submitted a patch (python/cpython#18266).

EDIT: This issue is now fixed upstream in Python, and the fix will ship in the next Python 3.8.x release.

pierreglaser added commits to pierreglaser/cloudpickle that referenced this issue Feb 4, 2020
ogrisel pushed a commit that referenced this issue Feb 5, 2020