Docs: add the "How to add support for custom data types" section #4049

sphuber · 2020-05-04T18:10:30Z

csadorf

I added a few questions and suggestions.

csadorf · 2020-05-07T15:47:45Z

docs/source/datatypes/index.rst

@@ -25,7 +25,7 @@ the methods provided, how to access them, and so on.

 If you need to work with some specific type of data, first check the list of data types/plugins
 below, and if you don't find what you need, give a look to
-:ref:`how to write a new data plugin <working_data_creating_new_types>`.
+:ref:`how to write a new data plugin <how-to:data:plugin:create>`.


Should datatypes/ still be a top-level directory?

It shouldn't: this content should be moved into the new scaffolding. However, that is not in the scope of the issue addressed by this PR, so I left it in place and just made sure the link is already correct. Having said that, I realize now that there currently is no issue that addresses these sections. This content contains detailed information on specific data types shipped with aiida-core, some of which will be moved out at some point. I am not sure what to do with this documentation until then.

Are those data types documented as part of the reference? In that case we can maybe just drop it. Alternatively, we could place it at the end of HowTo::"How to work with data".

The content of datatypes/index.rst is really minimal and could certainly be provided by the reference section. This would, of course, rely on comprehensive docstrings on the data types themselves, which we would have to check. The other files in that folder go more in depth and are mini tutorials on how to work with the more advanced data types. However, these mostly concern data types that are materials-science specific and so will be moved out at some point. I think Giovanni would prefer not to get rid of this documentation yet, so maybe we can discuss with him during the meeting what to do with it for the time being.

As discussed offline with @csadorf, at the end of the documentation restructuring we will look at where to put the content of datatypes/ and move it somewhere appropriate; see #4064.

csadorf · 2020-05-07T15:59:56Z

docs/source/howto/data.rst

+-----------------------
+
+When deciding where to store a property of a data type, one has to choose between the database and the file repository.
+The database will make it possible to search in the provenance graph based on criteria based on the property, e.g. all ``NewData`` nodes where the property is greater than 0.


This sentence is a bit confusing. What do you mean by "criteria based on the property"? What is "the property"? Is "property" a property of a node?

I agree that it is not the clearest. I find it a bit difficult to explain in words without becoming too technical. Let me give the technical version and then we can see together if we can come up with a better description for the documentation.

All nodes are stored in the database in the node table. The node table has multiple columns that store the properties of each node. Most of these are properties that exist for all node types, e.g., pk, UUID, label, ctime, mtime and description. But we also need a way to store data-type-specific data, which is what the attributes column is for. It can contain any object in JSON format, so it can store any key-value pair as long as the value is JSON-serializable. So the data-type-specific content is stored in the "attributes" (which become immutable once the node is stored). You can see that, in AiiDA's context, a node's attributes have a quite specific meaning, as they are literally the content of the attributes column. That is why I tried to use "properties" (instead of attributes) as a more generic term to describe "the things that define the data node".
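To make the description above more concrete, here is a minimal toy model in plain Python. It is not AiiDA code: `ToyNode`, its methods, and the `RuntimeError` it raises are hypothetical. Fixed columns such as the UUID and label are ordinary fields, while data-type-specific content goes into an `attributes` dict that, like the JSON attributes column, accepts only JSON-serializable values and is frozen once the node is stored.

```python
import json
import uuid


class ToyNode:
    """Hypothetical stand-in for one row of the node table (not the AiiDA API)."""

    def __init__(self, label=""):
        # Columns that exist for every node type.
        self.uuid = str(uuid.uuid4())
        self.label = label
        # Data-type-specific content, mimicking the JSON "attributes" column.
        self.attributes = {}
        self.is_stored = False
        self.stored_json = None

    def set_attribute(self, key, value):
        if self.is_stored:
            # Attributes become immutable once the node is stored.
            # (Hypothetical exception type; real AiiDA may use a different one.)
            raise RuntimeError("cannot modify attributes of a stored node")
        # json.dumps raises TypeError for values a JSON column could not hold.
        json.dumps(value)
        self.attributes[key] = value

    def store(self):
        # Serialize the attributes as they would be written to the JSON column.
        self.stored_json = json.dumps(self.attributes)
        self.is_stored = True
        return self
```

For example, `ToyNode("example").set_attribute("tags", {"a", "b"})` raises `TypeError`, because a Python `set` has no JSON representation.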

So, to sum up, by "property" I mean a property of something in the generic sense, not in the Python sense: in the way that the UUID is a property of a node. By "searching based on criteria based on the property" I simply mean that, when a property of a data node is stored in the database (as opposed to the repository), one can query for it. I'm not sure how clear it is to the average user that storing the data in the database enables querying. They might not have encountered the query builder at this point.

How about this?

All node properties, including attributes that are stored in the database, are directly searchable as part of a database query whereas data stored in the file repository provides no introspection. What this means is that, for example, it is possible to search for all nodes where a particular database-stored integer attribute falls into a certain value range, but the same value stored in a file within the file repository would not be directly searchable in this way.

With a few modifications I think that could work:

All node properties that are stored in the database (such as the attributes) are directly searchable as part of a database query, whereas data stored in the file repository cannot be queried for.
What this means is that, for example, it is possible to search for all nodes where a particular database-stored integer attribute falls into a certain value range, but the same value stored in a file within the file repository would not be directly searchable in this way.
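The difference described above can be sketched with a toy in-memory model. The names `ToyNode` and `find_in_range` are hypothetical and not the AiiDA API: attributes are plain values that a query function can compare directly, whereas repository content is an opaque blob of bytes that the query never looks inside. In AiiDA itself, this filtering role is played by the `QueryBuilder`, which can filter on keys of the attributes column in the database.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Hypothetical node: queryable attributes plus an opaque file repository."""

    pk: int
    attributes: dict = field(default_factory=dict)
    repository: dict = field(default_factory=dict)  # filename -> raw bytes


def find_in_range(nodes, key, low, high):
    """Return the pks of nodes whose numeric attribute `key` lies in [low, high].

    Only the attributes are inspected; files in the repository are never
    opened, just as a database query cannot look inside repository files.
    """
    return [
        node.pk
        for node in nodes
        if isinstance(node.attributes.get(key), (int, float))
        and low <= node.attributes[key] <= high
    ]


nodes = [
    ToyNode(pk=1, attributes={"count": 5}),
    ToyNode(pk=2, attributes={"count": 42}),
    # Same kind of value, but written to a file instead of an attribute.
    ToyNode(pk=3, repository={"count.txt": b"7"}),
]

print(find_in_range(nodes, "count", 0, 10))  # prints [1]
```

Node 3 holds a value (7) that falls inside the range, but because it lives in a repository file it is invisible to the query.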

csadorf · 2020-05-07T16:04:19Z

docs/source/howto/data.rst

+The downside is that storing too much information in the database can make it sluggish.
+Therefore, big data (think large files), whose content does not necessarily need to be queried for, is better stored in the file repository.
+Of course a data type may need to store multiple properties of varying character, but both storage locations can safely be used in parallel.
+When choosing the database as the storage location, the properties should be stored using the node *attributes*.


Suggested change

-When choosing the database as the storage location, the properties should be stored using the node *attributes*.
+Properties stored as part of the node's *attributes* are stored in the database.

Is that right? I'm trying to simplify the sentence, but I'm not 100% sure whether I got cause and effect right.

Technically, yes, it is correct. What I think is missing here is a direct reference to the actual methods of the Node class; without those, the term "attributes" is too vague, I think. Maybe I can add a sentence like:

The node class has various methods to set its attributes, such as :py:meth:`~aiida.orm.node.Node.set_attribute` and :py:meth:`~aiida.orm.node.Node.set_attribute_many`.

Maybe you can provide an example that references those methods?
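As an illustration of the kind of example requested here, the following is a minimal sketch of a data type that splits its content between the two storage locations. Only the method names `set_attribute` and `set_attribute_many` come from the discussion above; `StubData`, `get_attribute`, `put_object`, and `NewData` are hypothetical stand-ins so that the sketch runs without an AiiDA installation or profile, and they are not the real `aiida.orm` classes.

```python
import json


class StubData:
    """Hypothetical stand-in for an AiiDA data base class, so the sketch runs standalone.

    set_attribute / set_attribute_many mirror the method names discussed above;
    get_attribute, put_object and the two storage dicts are invented here.
    """

    def __init__(self):
        self._attributes = {}  # would live in the database (queryable)
        self._repository = {}  # would live in the file repository (opaque)

    def set_attribute(self, key, value):
        self._attributes[key] = value

    def set_attribute_many(self, attributes):
        for key, value in attributes.items():
            self.set_attribute(key, value)

    def get_attribute(self, key):
        return self._attributes[key]

    def put_object(self, filename, content):
        self._repository[filename] = content


class NewData(StubData):
    """Hypothetical data type: small, searchable metadata goes into the
    attributes; the potentially large list of values goes into the repository."""

    def __init__(self, values, unit):
        super().__init__()
        # Small scalars one may want to query on are stored as attributes.
        self.set_attribute_many({"unit": unit, "length": len(values)})
        self.set_attribute("maximum", max(values))  # assumes `values` is non-empty
        # The full list could be large, so it is written to a file instead.
        self.put_object("values.json", json.dumps(values).encode("utf-8"))

    @property
    def values(self):
        # Reading the bulk data back means opening the repository file.
        return json.loads(self._repository["values.json"].decode("utf-8"))
```

With this split, a query for all `NewData` nodes whose `maximum` exceeds some threshold only needs the attributes, while the full list is read from the repository only when actually needed.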

csadorf self-requested a review May 6, 2020 14:48

csadorf reviewed May 7, 2020

sphuber added 2 commits May 7, 2020 19:35

Docs: add the "How to add support for custom data types" section

2f728c1

Comments from PR review

60d8bc9

sphuber force-pushed the fix/3995/docs-howto-plugin branch from 9f0570c to 60d8bc9 May 8, 2020 10:37

csadorf approved these changes May 8, 2020

sphuber merged commit c90485c into aiidateam:docs-revamp May 8, 2020

sphuber deleted the fix/3995/docs-howto-plugin branch May 8, 2020 13:13

sphuber mentioned this pull request May 11, 2020

Docs: How to add support for custom data types #3995

Closed

csadorf pushed a commit that referenced this pull request May 29, 2020

Docs: add the "How to add support for custom data types" section (#4049)

0560faf
