Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

numpy arrays cannot be persisted in WorkChain context #3941

Closed
zhubonan opened this issue Apr 17, 2020 · 3 comments
Closed

numpy arrays cannot be persisted in WorkChain context #3941

zhubonan opened this issue Apr 17, 2020 · 3 comments

Comments

@zhubonan
Copy link
Contributor

zhubonan commented Apr 17, 2020

The context of a WorkChain is serialized when checkpointed and deserialized on demand, e.g. when the daemon is restarted. At the moment only a few python objects can be used in the context, mainly the those that are json compatible. For example, lists/dictionary of strings and integers. AiiDA data structures can be serialized because there are loaders and dumpers defined in serialize.py.

However, it would be good if numpy arrays can be used. At the moment they are serializble but cannot be deserialized. As a result, any workchain with numpy array in the context will except when the daemon attempts to deserialise the checkpoint when it restarts.

This problem can be reproduced with the following snipplet:

import numpy as np
from aiida.orm.utils.serialize import serialize, deserialize
arr = np.arange(10)
sarr = serialize(arr)


deserialize(sarr) 

This will raise an exception:

/site-packages/yaml/constructor.py in make_python_instance(self, suffix, node, args, kwds, newobj, unsafe)
    568             raise ConstructorError("while constructing a Python instance", node.start_mark,
    569                     "expected a class, but found %r" % type(cls),
--> 570                     node.start_mark)
    571         if newobj and isinstance(cls, type):
    572             return cls.__new__(cls, *args, **kwds)

ConstructorError: while constructing a Python instance
expected a class, but found <class 'builtin_function_or_method'>
  in "<unicode string>", line 1, column 1:
    !!python/object/apply:numpy.core ...

Perhaps the solution is to add additional loaders to serialize.py?

OK, it seems that using yaml.unsafe_load(sarr) works. See also #3675.

@zhubonan
Copy link
Contributor Author

Find a simple solution - make AiiDALoader a subclass of yaml.UnsafeLoader rather than yaml.FullLoader. As the name suggests, this makes deserialize unsafe as abitrary code can be executed.

But I think it is reasonable to assume the checkpoints persisted in the database safe? Of course, a potential attack vector can be someone break into my database, find my ongoing processes and make daemon do strange stuff (only when it restarts, given that the tempered checkpoint is not overwritten in the meantime)

@sphuber
Copy link
Contributor

sphuber commented Apr 17, 2020

We were already discussing with @greschd in #3709 to just switch to unsafe loading for checkpoints in order to support more object types. We also discussed the safety of it and like you said there are potential attack vectors. Currently one does not even have to break into your database, simply creating a malicious export and having you import this will suffice. We think however that just always ignoring checkpoint attributes when importing. Anyway this should never happen because that means the process wasn't yet terminated when it was exported, but just to be safe we drop it if we find any.

@zhubonan
Copy link
Contributor Author

zhubonan commented Apr 17, 2020

Oops, sorry I should be checked #3709 out! Closing this issue as it is duplicated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants