dump_session fails with from numpy import * #79

Closed
mmckerns opened this issue Jan 26, 2015 · 11 comments · Fixed by #101

Comments

@mmckerns
Member

Importing numpy in this common way leads to pickle failures.

from numpy import *
import dill
dill.dump_session()

Specifically:

pickle.PicklingError: Can't pickle <type 'numpy.uint64'>: it's not the same object as numpy.uint64

and, very probably, several others.

Adding support for all of the numpy top-level objects is a big request, but this is a very common case.

@mmckerns
Member Author

Splintered off from #78. Adding @charris.

@mmckerns
Member Author

I'd suggest using import numpy rather than from numpy import * when you plan to pickle everything; however, the latter is a really common usage pattern.

@mmckerns
Member Author

(@nikohansen) note that this works:

>>> import dill
>>> import numpy
>>> dill.loads(dill.dumps(numpy.uint64))
<type 'numpy.uint64'>

and this works:

>>> from numpy import *
>>> dill.loads(dill.dumps(uint64))
<type 'numpy.uint64'>

However, dump_session fails.

@nikohansen

Just to emphasize my user's perspective: my main concern is not that numpy objects fail to pickle. I can simply run from numpy import * again, no problem. My main concern is that the exception prevents pickling what is actually relevant to me in the workspace.
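A minimal sketch of that "save what you can" behavior, using only the standard library's pickle: each workspace entry gets a trial pickle, entries that raise are skipped and reported, and the rest are written in one dump (so references shared between saved entries are preserved). The helper names dump_picklable and load_picklable are hypothetical, not part of dill; dill.dumps/dill.dump should be usable in place of the pickle calls, since they follow the same calling convention.

```python
import pickle


def dump_picklable(namespace, path):
    """Pickle every entry of `namespace` that can be pickled on its own.

    Hypothetical helper (not dill API). Entries whose trial pickling raises are
    skipped; their names are returned so the caller can see what was left out.
    """
    saved, skipped = {}, []
    for name, value in list(namespace.items()):
        if name.startswith("__"):
            continue  # skip dunder entries such as __builtins__ and __name__
        try:
            pickle.dumps(value)  # trial run; raises if the value can't be pickled
        except Exception:
            skipped.append(name)
        else:
            saved[name] = value
    # One final dump keeps shared references between saved entries intact.
    with open(path, "wb") as f:
        pickle.dump(saved, f)
    return skipped


def load_picklable(namespace, path):
    """Restore the entries written by dump_picklable into `namespace`."""
    with open(path, "rb") as f:
        namespace.update(pickle.load(f))
```

In a session this would be called as dump_picklable(globals(), "session.pkl"), followed after loading by rerunning from numpy import * to bring back whatever was skipped.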

@mmckerns
Member Author

With trace turned on, we see some odd behavior:

>>> from numpy import *
>>> dill.detect.trace(True)
>>> dill.dump_session()
M1: <module '__main__' (built-in)>
F2: <function _import_module at 0x1099b9c80>
D2: <dict object at 0x1099e0398>
F2: <function disp at 0x1097165f0>
F2: <function union1d at 0x109757b18>
F2: <function all at 0x10936fcf8>
F2: <function issubsctype at 0x109283758>
F2: <function savez at 0x10979f2a8>
F2: <function atleast_2d at 0x1093b9140>
B2: <built-in function restoredot>
F2: <function ptp at 0x10936fe60>
T4: <type 'numpy.unicode_'>
F2: <function resize at 0x10936f668>
F2: <function blackman at 0x109716938>
T4: <class 'numpy.core.getlimits.iinfo'>
T4: <type 'numpy.busdaycalendar'>
F2: <function pkgload at 0x108f3e230>
T4: <type 'numpy.void'>
M2: <module 'numpy.core.records' from '/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/records.pyc'>
F2: <function tri at 0x1096fe848>
F2: <function prod at 0x1093700c8>
F2: <function array_equal at 0x109372938>
B2: <built-in function scalar>
T4: <type 'numpy.dtype'>
F2: <function indices at 0x109372488>
B2: <built-in function loads>
B2: <built-in function set_numeric_ops>
F2: <function pmt at 0x10979fd70>
F2: <function nanstd at 0x109727ed8>
Nu: <ufunc 'cosh'>
T4: <type 'numpy.object_'>
F2: <function argpartition at 0x10936f398>
T4: <class 'numpy.lib.index_tricks.IndexExpression'>
D2: <dict object at 0x1097265c8>
F2: <function append at 0x10971c398>
B2: <built-in function seterrobj>
F2: <function nanargmax at 0x1097279b0>
Nu: <ufunc 'power'>
T4: <class 'numpy.core.numerictypes._typedict'>
T4: <type 'numpy.uint64'>
F1: <function <lambda> at 0x1092839b0>
F2: <function _create_function at 0x1099b9410>
Co: <code object <lambda> at 0x10926b430, file "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/numerictypes.py", line 867>
F2: <function _unmarshal at 0x1099b92a8>
D4: <dict object at 0x109263910>
D2: <dict object at 0x1099e1168>
T4: <type 'numpy.int64'>
F1: <function <lambda> at 0x109283a28>
D4: <dict object at 0x109263910>
D2: <dict object at 0x1099ed398>
F1: <function <lambda> at 0x109283aa0>
D4: <dict object at 0x109263910>
D2: <dict object at 0x108bbb4b0>
T4: <type 'numpy.datetime64'>
F1: <function <lambda> at 0x109283b18>
D4: <dict object at 0x109263910>
D2: <dict object at 0x1099e0050>
T4: <type 'numpy.uint64'>
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mmckerns/lib/python2.7/site-packages/dill-0.2.3.dev0-py2.7.egg/dill/dill.py", line 233, in dump_session
    pickler.dump(main_module)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 224, in dump
    self.save(obj)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/Users/mmckerns/lib/python2.7/site-packages/dill-0.2.3.dev0-py2.7.egg/dill/dill.py", line 956, in save_module
    state=_main_dict)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 419, in save_reduce
    save(state)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/Users/mmckerns/lib/python2.7/site-packages/dill-0.2.3.dev0-py2.7.egg/dill/dill.py", line 664, in save_module_dict
    StockPickler.save_dict(pickler, obj)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 649, in save_dict
    self._batch_setitems(obj.iteritems())
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 681, in _batch_setitems
    save(v)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 331, in save
    self.save_reduce(obj=obj, *rv)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 416, in save_reduce
    self._batch_setitems(dictitems)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 680, in _batch_setitems
    save(k)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/Users/mmckerns/lib/python2.7/site-packages/dill-0.2.3.dev0-py2.7.egg/dill/dill.py", line 1000, in save_type
    StockPickler.save_global(pickler, obj)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 753, in save_global
    (obj, module, name))
pickle.PicklingError: Can't pickle <type 'numpy.uint64'>: it's not the same object as numpy.uint64

numpy.uint64 seems to pickle fine the first time, then fails when it is encountered a second time.
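To find which names in a namespace trigger this, one option is to replay the check the stock pickler performs when saving a class by reference: look the class up again by its module and (qualified) name and see whether the same object comes back. The helper shadowed_types below is a hypothetical diagnostic sketch, not part of dill or numpy.

```python
import importlib


def shadowed_types(namespace):
    """Return names in `namespace` bound to classes pickle can't save by reference.

    Hypothetical helper. The stock pickler saves a class as its module plus
    (qualified) name and refuses it if looking that pair up again yields a
    different object; that is the "it's not the same object as ..." error.
    """
    bad = []
    for name, obj in list(namespace.items()):
        if not isinstance(obj, type):
            continue  # only classes are checked here
        try:
            target = importlib.import_module(obj.__module__)
            for part in obj.__qualname__.split("."):
                target = getattr(target, part)
        except (ImportError, AttributeError):
            target = None  # lookup failed outright, so pickling by reference fails too
        if target is not obj:
            bad.append(name)
    return bad
```

Run on globals() after from numpy import *, this would presumably flag aliases such as ulonglong on the numpy versions discussed here; the exact list depends on the platform and numpy version.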

@abrasive
Contributor

This is a fun one. Numpy's scalar type system generates a bunch of types automatically, distinguishing for example between ULONG and ULONGLONG, which are the same on 64-bit systems and end up with the same name but different type objects:

>>> from numpy import *
>>> uint64
<type 'numpy.uint64'>
>>> id(uint64)
3212084976704
>>> ulonglong
<type 'numpy.uint64'>
>>> id(ulonglong)
3212084976256

These then also get dragged into all of the Numpy type conversion apparatus, like numpy.cast.
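The same failure can be reproduced without numpy. The sketch below builds a throwaway module (the name fakenum is made up for illustration) holding two distinct classes that both report themselves as fakenum.uint64, mimicking the uint64/ulonglong pair above. The class that the module attribute actually points to round-trips fine, which matches pickling uint64 on its own working; the look-alike is refused, which is what happens once dump_session reaches the alias in the session dict.

```python
import pickle
import sys
import types

# Throwaway module (hypothetical name "fakenum") standing in for numpy.
fakenum = types.ModuleType("fakenum")
sys.modules["fakenum"] = fakenum

# Two distinct class objects with the same module and name, like uint64/ulonglong.
uint64 = type("uint64", (), {"__module__": "fakenum"})
ulonglong = type("uint64", (), {"__module__": "fakenum"})  # same name, new object
fakenum.uint64 = uint64
fakenum.ulonglong = ulonglong

# fakenum.uint64 resolves back to this exact object, so it pickles by reference.
assert pickle.loads(pickle.dumps(uint64)) is uint64

# The look-alike is also saved as "fakenum.uint64", but that lookup yields a
# different object, so the pickler refuses it.
try:
    pickle.dumps(ulonglong)
    refused = False
except pickle.PicklingError as exc:
    refused = True
    print(exc)
```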

@binaryfunt

This is still happening for me with dill 0.2.8.2:

>>> import dill
>>> from numpy import *
>>> dill.dump_session("foo.pkl")
...
PicklingError: Can't pickle <class 'numpy.int32'>: it's not the same object as numpy.int32

And even though foo.pkl gets created, I can't load it in a new session:

>>> import dill
>>> dill.load_session("foo.pkl")
---------------------------------------------------------------------------
EOFError                                  Traceback (most recent call last)
<ipython-input-2-8263e37cb94d> in <module>
----> 1 dill.load_session('foo.pkl')

~\Miniconda3\lib\site-packages\dill\_dill.py in load_session(filename, main)
    400         unpickler._main = main
    401         unpickler._session = True
--> 402         module = unpickler.load()
    403         unpickler._session = False
    404         main.__dict__.update(module.__dict__)

EOFError: Ran out of input
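The EOFError is consistent with the failed dump having left a truncated file behind. A common guard is to write to a temporary file in the same directory and only move it into place with os.replace once pickling has fully succeeded. This is a general sketch, not something dill does as far as this thread shows; atomic_dump is a hypothetical helper, and dill.dump takes the same (obj, file) arguments as pickle.dump, so it could be passed as dump.

```python
import os
import pickle
import tempfile


def atomic_dump(obj, path, dump=pickle.dump):
    """Write `obj` to `path` only if pickling completes without error.

    Hypothetical helper. Data goes to a temporary file in the same directory
    first; os.replace then moves it over `path`. If pickling fails, the
    temporary file is removed and any existing file at `path` is untouched.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            dump(obj, f)
        os.replace(tmp, path)  # same directory, so this is a rename
    except BaseException:
        os.remove(tmp)
        raise
```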

@mmckerns
Member Author

mmckerns commented Nov 22, 2018

@binaryfunt: Try it with byref=True.

>>> import dill
>>> from numpy import *
>>> dill.dump_session('foo.pkl', byref=True)

Then in a new session:

>>> import dill
>>> dill.load_session('foo.pkl')
>>> int64  
<type 'numpy.int64'>
>>> 
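As I understand it, byref=True records objects imported from other modules as references to re-import on load instead of pickling them. Recording a star-imported name under the name it was imported as sidesteps the problem, because numpy.ulonglong is ulonglong holds even though numpy.uint64 is a different object. Below is a rough stdlib-only sketch of that idea; split_session and restore_refs are hypothetical names, and dill's actual byref logic may differ.

```python
import importlib
import sys
import types


def split_session(namespace):
    """Split a namespace into (refs, values).

    Hypothetical helper. refs maps a local name to (module_name, attribute_name)
    for objects that can be re-imported on load; attribute_name is None for
    whole modules. Everything else lands in values, to be pickled normally.
    """
    refs, values = {}, {}
    for name, obj in namespace.items():
        if name.startswith("__"):
            continue  # leave dunder entries such as __builtins__ alone
        if isinstance(obj, types.ModuleType):
            refs[name] = (obj.__name__, None)
            continue
        mod_name = getattr(obj, "__module__", None)
        # Objects defined in __main__ itself can't be re-imported, so skip them.
        if isinstance(mod_name, str) and mod_name != "__main__":
            module = sys.modules.get(mod_name)
            # A star import binds the module's own attribute under the same name.
            if module is not None and getattr(module, name, None) is obj:
                refs[name] = (mod_name, name)
                continue
        values[name] = obj
    return refs, values


def restore_refs(refs, namespace):
    """Re-import every reference recorded by split_session into `namespace`."""
    for name, (mod_name, attr) in refs.items():
        module = importlib.import_module(mod_name)
        namespace[name] = module if attr is None else getattr(module, attr)
```

The refs dict holds only strings and None, so it pickles safely alongside values; on load, unpickle both and call restore_refs after restoring values.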

@shepware

shepware commented Jul 4, 2019

I'm having the same problem described in this thread. I use 'from numpy import *' because I love Travis Oliphant's original work: I read his original documentation and followed his recommendation to use NumPy this way.

(Rest of comment deleted)

@mmckerns
Member Author

mmckerns commented Jul 5, 2019

@shepware: if you are getting an error, what you've reported is not sufficient information:

> The latest suggestion gives a new one on the dump_session code:
> RuntimeError: dictionary changed size during iteration

I don't see this error in any of the environments and versions that I test on, so I'd need to know your OS and the versions of python, numpy, dill, etc. that you are using.

Feel free to open this in a new ticket, and if it's a duplicate, I'll manage that.

Secondly... I worked in finance, and to help facilitate storing complex objects on disk or in a database, I created klepto. It provides a dict interface and a function-caching decorator interface. It might be useful for you.

@shepware

shepware commented Jul 5, 2019

Thanks, I’ll check that out. I'm deleting my original questions so as not to completely throw IEX under the bus for abandoning its mission of democratizing Wall Street, or to fill up this thread with non-replicable errors.

@mmckerns mmckerns added this to the dill-0.3.1 milestone Sep 28, 2019

5 participants