dynamic classes not created in __main__ fail to pickle #56

Open
mattja opened this issue Jul 17, 2014 · 21 comments
@mattja

mattja commented Jul 17, 2014

I would like to generate new classes programmatically at runtime using type(), then serialize these (for use on other computers on a cluster). Is there a way to get dill to pickle these?

Strangely, the classes seem to pickle OK if this is done in the __main__ module, but not in any other module. Minimal test:

classmaker.py:

import dill

def f():
    cls = type('NewCls', (object,), dict())
    print(dill.pickles(cls))

if __name__ == "__main__":
    f()

consumer.py:

import classmaker
classmaker.f()

running these:

$ python classmaker.py
True
$ python consumer.py
False

In the second case the pickling exception is: Can't pickle <class 'classmaker.NewCls'>: it's not found as classmaker.NewCls

@matsjoyce
Contributor

Yeah, it's one of the problems with the pickle approach. When it finds a class that's in a module, it assumes that it can be reloaded "on the other side". Dill adjusts this to completely pickle all "things" in __main__, but leaves pickle to do its stuff everywhere else. As a workaround you could do:

def f():
    global NewCls
    cls = type('NewCls', (object,), dict())
    NewCls = cls
    print(dill.pickles(cls))

or change cls.__module__ to __main__.
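The by-reference lookup described above can be reproduced with the standard library alone. The following is a minimal sketch, not dill's own code: the module name classmaker_demo and the helper pickles_by_reference are hypothetical stand-ins, and plain pickle is used in place of dill.

```python
import pickle
import sys
import types

# Hypothetical stand-in for a non-__main__ module such as classmaker.
demo_module = types.ModuleType("classmaker_demo")
sys.modules["classmaker_demo"] = demo_module

# A class built at runtime that claims to live in that module,
# but is not bound to any name there.
NewCls = type("NewCls", (object,), {"__module__": "classmaker_demo"})


def pickles_by_reference(obj):
    """Return True if the standard pickler can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        return False


found_before = pickles_by_reference(NewCls)

# Equivalent in spirit to the `global NewCls` workaround: bind the class
# under its own name in the module that cls.__module__ points to.
setattr(demo_module, NewCls.__name__, NewCls)
found_after = pickles_by_reference(NewCls)

# Unpickling only looks the name up again; no class body is transferred.
same_object = pickle.loads(pickle.dumps(NewCls)) is NewCls

print(found_before, found_after, same_object)
```

The first check is expected to fail and the second to succeed. Because only the name is stored, a remote host that never created the class would still be unable to unpickle it.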

@mmckerns We could adjust dill.dill.save_type to first try to find the type, then if it can be found, pass it on to pickle, else pickle it as if it were in __main__?
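A rough illustration of the "try to find it first, otherwise pickle in full" idea, using only the standard library (Python 3.8+ for reducer_override). This is a sketch under assumptions and not how dill.dill.save_type is implemented: FallbackPickler, is_importable and dumps are hypothetical names, and the by-value path only handles classes whose namespace contents are themselves picklable.

```python
import io
import pickle
import sys


def is_importable(cls):
    """True if cls can be found under its own name in its module."""
    module = sys.modules.get(cls.__module__)
    return getattr(module, cls.__name__, None) is cls


class FallbackPickler(pickle.Pickler):
    """Hypothetical pickler: by reference when possible, else by value."""

    def reducer_override(self, obj):
        if isinstance(obj, type) and not is_importable(obj):
            # Rebuild the class with type(name, bases, namespace) on load.
            # The __dict__/__weakref__ descriptors are recreated by type().
            namespace = {
                key: value
                for key, value in vars(obj).items()
                if key not in ("__dict__", "__weakref__")
            }
            return type, (obj.__name__, obj.__bases__, namespace)
        return NotImplemented  # defer to the normal pickle machinery


def dumps(obj):
    buffer = io.BytesIO()
    FallbackPickler(buffer).dump(obj)
    return buffer.getvalue()


# "not_a_real_module" is deliberately absent from sys.modules.
Dynamic = type(
    "Dynamic", (object,), {"__module__": "not_a_real_module", "answer": 42}
)
restored = pickle.loads(dumps(Dynamic))
print(restored.__name__, restored.answer, restored is Dynamic)
```

Here the restored class is a new object rebuilt from its name, bases and namespace, while importable classes such as object still go through the ordinary by-reference path.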

@matsjoyce
Contributor

In a way, this issue is similar to #52 and therefore #1, in that the obvious fix is to treat modules like __main__.

@mmckerns
Member

These should be no different from named tuples, for which "everyone knows" the instance needs to be named appropriately.

The other part of this issue is having the instance creation done inside a class... and python misidentifies the namespace as being inside the file object -- this is one of the outstanding problems remaining for serialization, and solving it should fix pickling for a number of objects, I believe.

I think it's safe to ignore the first part (naming) for now, as it is convention.

@mattja
Author

mattja commented Jul 17, 2014

Thanks. Trying out these workarounds, inserting the class in globals() allowed dill to find
the class. But as it is pickled by ref, the unpickling fails on the remote host.

The second workaround, setting cls.__module__ = '__main__' solved the problem for the simplified test case.
When I apply this workaround in my real program, I get RuntimeError: Maximum recursion depth exceeded thrown during pickling (everything?) in __main__. I'll take a closer look tomorrow to see what's going on.

@mmckerns
Member

Feel free to post a minimal version of your program that reproduces the traceback you are seeing. For certain objects, it is currently possible to instruct dill to not pickle by reference. dill currently does this for user-defined classes (built in the traditional sense using a class def), and this could probably be extended to turning off pickling by reference for classes built as above.

@mmckerns changed the title from "dynamically created classes in non-main module" to "dynamic classes not created in __main__ fail to pickle" on Jul 17, 2014
@matsjoyce
Contributor

dill.detect.trace may be useful for that.

@mmckerns
Member

Good point. @mattja: there are a few functions in dill.detect that can be used to investigate where and how things fail to pickle.

@mattja
Author

mattja commented Jul 22, 2014

Faking cls.__module__ = '__main__' seems to be a good temporary workaround for this issue.
It is reliably forcing the classes to be pickled/unpickled in full.

The further errors which resulted seem to be a separate issue so I've filed the details in #58.

@jakirkham

@mmckerns, in this case, qualname appears to fail on Python 2. However, the failure is a little surprising as it seems to expect __qualname__ is defined. Apparently, there is some code path where it doesn't actually set the value for this attribute.

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-7-03af44776c0f> in <module>()
----> 1 qualname.qualname(c)

/opt/conda/lib/python2.7/site-packages/qualname.pyc in qualname(obj)
     53             _, lineno = inspect.getsourcelines(obj)
     54         except (OSError, IOError):
---> 55             return obj.__qualname__  # raises a sensible error
     56     elif inspect.isfunction(obj) or inspect.ismethod(obj):
     57         if hasattr(obj, 'im_func'):

AttributeError: type object 'NewCls' has no attribute '__qualname__'

I have raised an issue containing more detail about this problem upstream. ( wbolster/qualname#2 )

@mmckerns
Member

mmckerns commented Nov 9, 2015

@jakirkham: No, as I mentioned in the other thread, qualname can't handle a lot of cases… what you are seeing is the author's choice, when inspect fails, to fall through to a line of code that will throw an AttributeError with reference to __qualname__.

@mmckerns
Member

mmckerns commented Nov 9, 2015

I think the behavior of qualname can be significantly enhanced by the code already in dill.source, and as I mentioned in the other thread, at a quick glance it looks like dill.source handles more cases, including the capacity to work in __main__. I think the easier route is to leverage the visitor pattern object used in qualname, and then replace most of the rest of the code with what's already in dill.source.

@jakirkham

That's fine. I just wanted to see if qualname would already work for this case. The answer is no. Whether one considers this a bugfix or feature request for qualname is an orthogonal issue. If you have the framework to get this working @mmckerns, I am still in favor of your original proposal.

@mmckerns
Member

mmckerns commented Nov 9, 2015

FYI, the above is the same as in the discussion from the other thread -- and, yes, I think the easier fix is to do so within dill.source.

@wbolster

wbolster commented Dec 4, 2015

qualname maintainer here. Fixes and improvements to https://github.com/wbolster/qualname are welcome as long as they're generic enough, since qualname is basically just a backport of python 3 functionality.

@mmckerns
Member

mmckerns commented Dec 7, 2015

Hi @wbolster, as you can see from this thread, this is an issue that has been around for a few years. Actually, it's been known (to at least me) for almost the entire life of dill (10 years).

I've known the "backport" solution (basically, what is done in qualname and thus what is done in python 3) since __qualname__ appeared in python 3 -- however, I wasn't able to resolve the issues that qualname avoids… so I don't have code in dill to handle it at all yet.

dill.source enables source introspection (like the inspect module), with support for objects defined in __main__ as well as extracting source (or imports) for dependencies so an object can be rebuilt from the source. A blend of some of the functions available in dill.source that can dig into an object's source code with the root code from qualname, I believe, would cover many of the cases that qualname currently fails on. At this point, I'm not suggesting augmenting qualname, but instead extending dill using dill.source and some of the code inside qualname… however, if I find that it'd be better to go the inverse route (extending qualname not dill.source) I will submit a patch when I get to this issue.
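For reference, the Python 3 attribute that qualname backports can be inspected directly. A small sketch (the function names are hypothetical) comparing a class defined with a class statement inside a function against one built with type(), as in the original report:

```python
def make_with_statement():
    # A class statement inside a function records the enclosing scope.
    class Local(object):
        pass

    return Local


def make_with_type():
    # type() is only given a bare name, so no enclosing scope is recorded
    # unless "__qualname__" is passed explicitly in the namespace dict.
    return type("NewCls", (object,), {})


stmt_cls = make_with_statement()
type_cls = make_with_type()

print(stmt_cls.__qualname__)
print(type_cls.__qualname__)
```

When run at module level, the first name is expected to contain the "<locals>" marker while the second is just the bare name passed to type(), so __qualname__ on its own does not say where a type()-built class was created.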

@jakirkham

FWIW, it appears that cloudpickle solves these problems. So, it may be worth looking at.

@mmckerns
Member

mmckerns commented Feb 2, 2016

@jakirkham: Thanks. I am very familiar with cloudpickle, and am aware of its abilities and limitations in this vein.

@mmckerns
Member

See interesting mapping done in: python/typeshed#24

@Peque
Contributor

Peque commented Jul 14, 2017

@mmckerns Is it easier to fix this for Python 3 only? And if so, would it make sense at least for now?

Maybe, given the wider Python 3 adoption and the fact that Python 2 will be discontinued "soon" (in less than 3 years), concentrating efforts on Python 3 is not a bad idea, especially for the cases where Python 2 makes things more complicated (?).

@mmckerns
Member

@Peque: yes, I believe the fix for 3.x is much easier. There have been a lot of changes since I dug into this last, and several older versions (<= 2.6, and <= 3.4) are basically unsupported by the python community... so it may be a much easier task than previously.

william-silversmith added a commit to seung-lab/python-task-queue that referenced this issue Feb 23, 2019
william-silversmith added a commit to seung-lab/python-task-queue that referenced this issue Feb 24, 2019
@mmckerns
Member

mmckerns commented Jul 7, 2022

This should be revisited, especially since #413 happened. Note similarity of #507, and community workarounds such as this one.
