Standardised serialisation/deserialisation of data nodes #5464

Open
chrisjsewell opened this issue Mar 23, 2022 · 4 comments

@chrisjsewell
Member

For use cases such as the web API and declarative workflows, it would be desirable to have a standardised way to construct/deconstruct data nodes, in a round-trippable and language agnostic manner.
In addition, it would also be helpful for the node to declare a schema for the serialisation, e.g. for client-side validation (this is similar to how https://graphql.org/ works).

For a simple data type, this might be something like:

from aiida.orm import Data, User
import jsonschema


class Int(Data):

    @classmethod
    def schema(cls):
        # JSON Schema describing the serialised form of the node
        return {
            'type': 'object',
            'properties': {
                'value': {'type': 'integer'},
                'user_email': {'type': 'string'},
            }
        }

    @classmethod
    def deserialize(cls, data: dict):
        # Validate the incoming data before constructing the node
        jsonschema.validate(data, cls.schema())
        return cls(data['value'], user=User.objects.get(email=data['user_email']))

    def serialize(self):
        # Return a JSON-able dict that deserialize() can turn back into a node
        return {'value': self.value, 'user_email': self.user.email}

Note that the schema does not necessarily have to have a one-to-one correspondence with how the data is actually stored, e.g. in the attributes field.
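
To make the round trip concrete, here is a minimal stand-alone sketch that runs without AiiDA or jsonschema. Everything in it is an assumption for illustration: `IntRecord` is a hypothetical stand-in for a data node, the `'required'` entry is added to the schema so that missing keys are rejected, and the hand-written `validate` covers only the tiny subset of JSON Schema used here.

```python
import json
from dataclasses import dataclass

# Assumed schema: same shape as in the example above, plus 'required'.
INT_SCHEMA = {
    'type': 'object',
    'properties': {
        'value': {'type': 'integer'},
        'user_email': {'type': 'string'},
    },
    'required': ['value', 'user_email'],
}

# Maps the JSON Schema type names used here to Python types.
_PY_TYPES = {'integer': int, 'string': str}


def validate(data: dict, schema: dict) -> None:
    """Check required keys and primitive types only (tiny JSON Schema subset)."""
    if not isinstance(data, dict):
        raise TypeError('expected a JSON object')
    for key in schema.get('required', []):
        if key not in data:
            raise ValueError(f'missing required key: {key}')
    for key, spec in schema['properties'].items():
        if key in data:
            value = data[key]
            expected = _PY_TYPES[spec['type']]
            # bool is a subclass of int in Python, but it is not a JSON integer
            if isinstance(value, bool) or not isinstance(value, expected):
                raise ValueError(f'{key!r} must be of type {spec["type"]}')


@dataclass
class IntRecord:
    """Hypothetical stand-in for a data node (not an AiiDA class)."""
    value: int
    user_email: str

    @classmethod
    def deserialize(cls, data: dict) -> 'IntRecord':
        validate(data, INT_SCHEMA)
        return cls(value=data['value'], user_email=data['user_email'])

    def serialize(self) -> dict:
        return {'value': self.value, 'user_email': self.user_email}


# Round trip through a JSON string, as a non-Python client would see it.
payload = json.dumps(IntRecord(5, 'user@example.com').serialize())
restored = IntRecord.deserialize(json.loads(payload))
```

The point of the sketch is only that `deserialize(serialize(x))` reproduces `x` and that the schema alone is enough for a client to reject malformed input before sending it.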

For data types that store (large) binary data, i.e. in the repository, this is a bit more tricky, since (a) that is not strictly JSONable, and (b) we would want to stream this data, rather than read it into memory.
This may require some kind of extension to standard JSON schema.
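
One possible shape for this, as a rough sketch only: keep the JSON-able part in a small header and send the binary content after it as a raw stream, copied in bounded chunks. The framing used here (a 4-byte length prefix, a JSON header carrying `content_size`, then the bytes) and the names `write_record` and `read_record` are hypothetical; this is a wire-format illustration, not a JSON Schema extension and not an existing AiiDA API.

```python
import io
import json
import struct

CHUNK_SIZE = 64 * 1024  # bytes copied per read; arbitrary choice


def _copy(src, dst, size: int) -> None:
    """Copy exactly `size` bytes from src to dst in bounded chunks."""
    remaining = size
    while remaining > 0:
        chunk = src.read(min(CHUNK_SIZE, remaining))
        if not chunk:
            raise EOFError('stream ended before the declared size')
        dst.write(chunk)
        remaining -= len(chunk)


def write_record(out, attributes: dict, content, content_size: int) -> None:
    """Write a length-prefixed JSON header, then the raw binary content."""
    header = json.dumps(
        {'attributes': attributes, 'content_size': content_size}
    ).encode('utf-8')
    out.write(struct.pack('>I', len(header)))  # 4-byte big-endian header length
    out.write(header)
    _copy(content, out, content_size)


def read_record(inp, sink) -> dict:
    """Read the header into memory and stream the content into `sink`."""
    (header_len,) = struct.unpack('>I', inp.read(4))
    header = json.loads(inp.read(header_len).decode('utf-8'))
    _copy(inp, sink, header['content_size'])
    return header['attributes']


# Usage with in-memory streams; real use would pass file or socket objects.
blob = bytes(range(256)) * 1000
wire = io.BytesIO()
write_record(wire, {'filename': 'data.bin'}, io.BytesIO(blob), len(blob))
wire.seek(0)
received = io.BytesIO()
attributes = read_record(wire, received)
```

Only the header is ever parsed as JSON, so memory use is bounded by `CHUNK_SIZE` regardless of how large the repository content is.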

@chrisjsewell
Member Author

cc @louisponet

@louisponet

louisponet commented Mar 24, 2022

I think it would also be useful to have the schemas themselves in a json somewhere, so the other languages can use them too. Maybe there can then be a generic .schema function that simply loads the corresponding file?

Of course, if everything goes through a REST API, the schema can also simply be requested through a URL.
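
A generic loader along these lines could look roughly like the sketch below. The file naming convention (`<ClassName>.schema.json`), the `schema_dir` attribute, and the `SchemaFromFile` mixin are all assumptions made up for illustration; nothing like this exists in AiiDA as far as this thread shows.

```python
import json
import tempfile
from pathlib import Path


class SchemaFromFile:
    """Hypothetical mixin that loads a schema from '<ClassName>.schema.json'."""

    schema_dir: Path  # directory holding the schema files; set by the application

    @classmethod
    def schema(cls) -> dict:
        path = cls.schema_dir / f'{cls.__name__}.schema.json'
        with path.open(encoding='utf-8') as handle:
            return json.load(handle)


class Int(SchemaFromFile):
    """Stand-in for a data class; it defines no schema code of its own."""


# Demo: write a schema file to a temporary directory, then load it generically.
with tempfile.TemporaryDirectory() as tmp:
    SchemaFromFile.schema_dir = Path(tmp)
    int_schema = {'type': 'object', 'properties': {'value': {'type': 'integer'}}}
    (Path(tmp) / 'Int.schema.json').write_text(json.dumps(int_schema), encoding='utf-8')
    loaded = Int.schema()
```

Because the schema lives in a plain JSON file, a client in another language can read the same file directly without going through Python.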

@chrisjsewell
Member Author

> I think it would also be useful to have the schemas themselves in a json somewhere

Possibly, but this would be difficult to enforce for plugins, plus having to keep these in a standard place.

(as you edited 😉 )
The idea with Node.schema is that you would "serve" the schemas via a web API, with the backend running AiiDA in Python and the frontend running in whatever language.
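
As a rough illustration of "serving" schemas, the sketch below uses only the Python standard library. The route layout (`/schema/<node type>`), the `SCHEMAS` registry, and the handler name are hypothetical; a real deployment would sit behind the existing REST API and look the schema up from the node class.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical registry mapping a node type name to its schema.
SCHEMAS = {
    'Int': {'type': 'object', 'properties': {'value': {'type': 'integer'}}},
}


class SchemaHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Assumed route layout: /schema/<node type>
        prefix = '/schema/'
        name = self.path[len(prefix):] if self.path.startswith(prefix) else None
        schema = SCHEMAS.get(name)
        if schema is None:
            self.send_error(404, 'unknown node type')
            return
        body = json.dumps(schema).encode('utf-8')
        self.send_response(200)
        self.send_header('Content-Type', 'application/json')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


# Port 0 lets the OS pick a free port.
server = HTTPServer(('127.0.0.1', 0), SchemaHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# The client side could be any language; here it is urllib for brevity.
# An empty ProxyHandler bypasses any configured proxy for this local demo.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
url = f'http://127.0.0.1:{server.server_port}/schema/Int'
with opener.open(url) as response:
    fetched = json.loads(response.read().decode('utf-8'))

server.shutdown()
server.server_close()
```

The frontend only needs an HTTP client and a JSON Schema validator, which is what keeps this approach language agnostic.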

csadorf self-assigned this Jun 23, 2022

@csadorf
Contributor

csadorf commented Jul 11, 2022

For reference: we should consider using MessagePack as a more efficient serialization format (compared to JSON); it also supports binary data.
