Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Docs: add the "How to add support for custom data types" section #4049

Merged
merged 2 commits into from
May 8, 2020

Conversation

sphuber
Copy link
Contributor

@sphuber sphuber commented May 4, 2020

Fixes #3995

@csadorf csadorf self-requested a review May 6, 2020 14:48
Copy link
Contributor

@csadorf csadorf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a few questions and suggestions.

@@ -25,7 +25,7 @@ the methods provided, how to access them, and so on.

If you need to work with some specific type of data, first check the list of data types/plugins
below, and if you don't find what you need, give a look to
:ref:`how to write a new data plugin <working_data_creating_new_types>`.
:ref:`how to write a new data plugin <how-to:data:plugin:create>`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should datatypes/ still be a top-level directory?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shouldn't, this content should be moved into the new scaffolding, but it is not in the scope of the issue addressed by this PR, so I left it and just made sure the link is already correct. Having said that, I realize now that there currently is not really an issue that addresses these sections. This content contains detailed information on explicit data types shipped with aiida-core, some of which will be moved out at some point. Not sure what to do with this documentation until that time

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are those data types documented as part of the reference? In that case we can maybe just drop it. Alternatively we could place it to the end of HowTo::"How to work with data".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the content of dataypes/index.rst is really minimal and could certainly be furnished by the reference section. This would then of course rely on comprehensive docstring on the data types itself in the code, which we would have to check. The other files in that folder go more in depth, and are mini tutorials on how to work with the more advanced data types. But this concerns mostly the data types that are materials science specific and so will be moved out at some point. I think Giovanni would prefer not to get rid of this documentation yet so maybe we discuss with him during the meeting what we do with them for the time being

Copy link
Contributor Author

@sphuber sphuber May 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed offline with @csadorf , at the end of the documentation restructuring we will look where to put the content of datatypes/ and move it somewhere appropriate, see #4064

docs/source/howto/data.rst Outdated Show resolved Hide resolved
docs/source/howto/data.rst Outdated Show resolved Hide resolved
docs/source/howto/data.rst Outdated Show resolved Hide resolved
-----------------------

When deciding where to store a property of a data type, one has to choose between the database and the file repository.
The database will make it possible to search in the provenance graph based on criteria based on the property, e.g. all ``NewData`` nodes where the property is greater than 0.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sentence is a bit confusing. What do you mean by "criteria based on the property"? What is "the property"? Is "property" a property of a node?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it is not the clearest. I find it a bit difficult to explain in words without becoming too technical. Let me give the technical version and then we can see together if we can come up with a better description for the documentation.

All nodes are stored in the database in the node table. The node table has multiple columns that store the properties of each node. Most of these are properties that exist for all node types, e.g., pk, UUID, label, ctime, mtime and description. But we also need a way to store data type specific data, which is what the attributes column is for, which can contain any object in JSON format, so it can store any key-value pair as long as the value is serializable. So the data type specific content is stored in the "attributes" (which become immutable once the node is stored.). You can see that in AiIDA's context, a node's attributes is quite specific, as it is literally the content of the attribute column. That is why I tried to use "properties" (instead of attributes) as a more generic term to describe the "things that define the data node".

So finally, what I mean with property is "property" of something in the generic sense, not in the Python sense, but in the way that the UUID of a node is a property. What I mean with "searching based on criteria based on the property" is simply that clearly, when storing a property of a data node in the database (as opposed to the repository) one can query for it. Not sure how clear it is to the average user that storing the data in the database enables querying. They might not have encountered the query builder at this point in time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about this?

All node properties, including attributes that are stored in the database, are directly searchable as part of a database query whereas data stored in the file repository provides no introspection. What this means is that, for example, it is possible to search for all nodes where a particular database-stored integer attribute falls into a certain value range, but the same value stored in a file within the file repository would not be directly searchable in this way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With a few modifications I think that could work:

All node properties that are stored in the database (such as the attributes), are directly searchable as part of a database query, whereas data stored in the file repository cannot be queried for.
What this means is that, for example, it is possible to search for all nodes where a particular database-stored integer attribute falls into a certain value range, but the same value stored in a file within the file repository would not be directly searchable in this way.

docs/source/howto/data.rst Outdated Show resolved Hide resolved
docs/source/howto/data.rst Outdated Show resolved Hide resolved
The downside is that storing too much information in the database can make it sluggish.
Therefore, big data (think large files), whose content does not necessarily need to be queried for, is better stored in the file repository.
Of course a data type may need to store multiple properties of varying character, but both storage locations can safely be used in parellel.
When choosing the database as the storage location, the properties should be stored using the node *attributes*.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
When choosing the database as the storage location, the properties should be stored using the node *attributes*.
Properties stored as part of the node's *attributes* are stored in the database.

Is that right? I'm trying to simplify the sentence, but I'm not 100% sure whether I got cause and effect right.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically it is correct yes. What I think is missing here is a direct reference to the actual methods of the Node class. Without those, the term attributes is too vague I think. Maybe I can add a sentence like:

The node class has various methods to set its attributes, such as :py:`~aiida.orm.node.Node.set_attribute` and :py:`~aiida.orm.node.Node.set_attribute_many`

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you can provide an example that references those methods?

@sphuber sphuber force-pushed the fix/3995/docs-howto-plugin branch from 9f0570c to 60d8bc9 Compare May 8, 2020 10:37
@sphuber sphuber merged commit c90485c into aiidateam:docs-revamp May 8, 2020
@sphuber sphuber deleted the fix/3995/docs-howto-plugin branch May 8, 2020 13:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants