-
Notifications
You must be signed in to change notification settings - Fork 192
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Docs: add the "How to add support for custom data types" section #4049
Docs: add the "How to add support for custom data types" section #4049
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a few questions and suggestions.
@@ -25,7 +25,7 @@ the methods provided, how to access them, and so on. | |||
|
|||
If you need to work with some specific type of data, first check the list of data types/plugins | |||
below, and if you don't find what you need, give a look to | |||
:ref:`how to write a new data plugin <working_data_creating_new_types>`. | |||
:ref:`how to write a new data plugin <how-to:data:plugin:create>`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should datatypes/
still be a top-level directory?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It shouldn't, this content should be moved into the new scaffolding, but it is not in the scope of the issue addressed by this PR, so I left it and just made sure the link is already correct. Having said that, I realize now that there currently is not really an issue that addresses these sections. This content contains detailed information on explicit data types shipped with aiida-core
, some of which will be moved out at some point. Not sure what to do with this documentation until that time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are those data types documented as part of the reference? In that case we can maybe just drop it. Alternatively we could place it to the end of HowTo::"How to work with data".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the content of dataypes/index.rst
is really minimal and could certainly be furnished by the reference section. This would then of course rely on comprehensive docstring on the data types itself in the code, which we would have to check. The other files in that folder go more in depth, and are mini tutorials on how to work with the more advanced data types. But this concerns mostly the data types that are materials science specific and so will be moved out at some point. I think Giovanni would prefer not to get rid of this documentation yet so maybe we discuss with him during the meeting what we do with them for the time being
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
docs/source/howto/data.rst
Outdated
----------------------- | ||
|
||
When deciding where to store a property of a data type, one has to choose between the database and the file repository. | ||
The database will make it possible to search in the provenance graph based on criteria based on the property, e.g. all ``NewData`` nodes where the property is greater than 0. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sentence is a bit confusing. What do you mean by "criteria based on the property"? What is "the property"? Is "property" a property of a node?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that it is not the clearest. I find it a bit difficult to explain in words without becoming too technical. Let me give the technical version and then we can see together if we can come up with a better description for the documentation.
All nodes are stored in the database in the node table. The node table has multiple columns that store the properties of each node. Most of these are properties that exist for all node types, e.g., pk, UUID, label, ctime, mtime and description. But we also need a way to store data type specific data, which is what the attributes
column is for, which can contain any object in JSON format, so it can store any key-value pair as long as the value is serializable. So the data type specific content is stored in the "attributes" (which become immutable once the node is stored.). You can see that in AiIDA's context, a node's attributes is quite specific, as it is literally the content of the attribute column. That is why I tried to use "properties" (instead of attributes) as a more generic term to describe the "things that define the data node".
So finally, what I mean with property is "property" of something in the generic sense, not in the Python sense, but in the way that the UUID of a node is a property. What I mean with "searching based on criteria based on the property" is simply that clearly, when storing a property of a data node in the database (as opposed to the repository) one can query for it. Not sure how clear it is to the average user that storing the data in the database enables querying. They might not have encountered the query builder at this point in time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about this?
All node properties, including attributes that are stored in the database, are directly searchable as part of a database query whereas data stored in the file repository provides no introspection. What this means is that, for example, it is possible to search for all nodes where a particular database-stored integer attribute falls into a certain value range, but the same value stored in a file within the file repository would not be directly searchable in this way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With a few modifications I think that could work:
All node properties that are stored in the database (such as the attributes), are directly searchable as part of a database query, whereas data stored in the file repository cannot be queried for.
What this means is that, for example, it is possible to search for all nodes where a particular database-stored integer attribute falls into a certain value range, but the same value stored in a file within the file repository would not be directly searchable in this way.
docs/source/howto/data.rst
Outdated
The downside is that storing too much information in the database can make it sluggish. | ||
Therefore, big data (think large files), whose content does not necessarily need to be queried for, is better stored in the file repository. | ||
Of course a data type may need to store multiple properties of varying character, but both storage locations can safely be used in parellel. | ||
When choosing the database as the storage location, the properties should be stored using the node *attributes*. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When choosing the database as the storage location, the properties should be stored using the node *attributes*. | |
Properties stored as part of the node's *attributes* are stored in the database. |
Is that right? I'm trying to simplify the sentence, but I'm not 100% sure whether I got cause and effect right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically it is correct yes. What I think is missing here is a direct reference to the actual methods of the Node
class. Without those, the term attributes
is too vague I think. Maybe I can add a sentence like:
The node class has various methods to set its attributes, such as :py:`~aiida.orm.node.Node.set_attribute` and :py:`~aiida.orm.node.Node.set_attribute_many`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe you can provide an example that references those methods?
9f0570c
to
60d8bc9
Compare
Fixes #3995