Docs: Move in section on QueryBuilder to How-To #4080

Merged
merged 4 commits into from
May 28, 2020
267 changes: 261 additions & 6 deletions docs/source/howto/data.rst
@@ -220,8 +220,7 @@ Here is an example for a custom data type that needs to wrap a single text file:

To create a new instance of this data type and get its content:

.. code:: python

.. code-block:: python
node = TextFileData(filepath='/some/absolute/path/to/file.txt')
node.get_content() # This will return the content of the file

@@ -248,11 +247,267 @@ The node class has various methods to set these attributes, such as :py:`~aiida.

.. _how-to:data:find:

Finding data
============
Finding and querying for data
=============================

Once you have successfully completed a series of workflows for your project, or have imported a dataset you are interested in, you want to quickly find the data that is relevant for your analysis.
The data in an AiiDA database is stored as a graph of connected entities, which can be easily *queried* with the :class:`~aiida.orm.querybuilder.QueryBuilder` class.

The :class:`~aiida.orm.querybuilder.QueryBuilder` lets you query your AiiDA database independently of the backend used under the hood.
Before starting to write a query, it helps to:

* Know what you want to query for.
In the language of databases, you need to tell the backend what *entity* you are looking for and optionally which of its properties you want to *project*.
For example, you might be interested in the label of a calculation and the PKs of all its outputs.
* Know the relationships between entities you are interested in.
Nodes of an AiiDA graph (vertices) are connected with links (edges).
A node can for example be either the input or output of another node, but also an ancestor or a descendant.
* Know how you want to filter the results of your query.

Once you are clear about what you want and how you can get it, the :class:`~aiida.orm.querybuilder.QueryBuilder` will build an SQL query for you.

There are two ways of using the :class:`~aiida.orm.querybuilder.QueryBuilder`:

#. In the *appender* method, you construct your query step by step using the ``QueryBuilder.append()`` method.
#. In the *queryhelp* approach, you construct a dictionary that defines your query and pass it to the :class:`~aiida.orm.querybuilder.QueryBuilder`.

Both APIs provide the same functionality - the appender method may be more suitable for interactive use, e.g., in the ``verdi shell``, whereas the queryhelp method can be useful in scripting.
In this section we will focus on the basics of the appender method.
For more advanced queries or more details on the queryhelp, see the :ref:`topics section on advanced querying <topics:database:advancedquery>`.

.. _how-to:data:find:select:

Selecting entities
------------------

Using the ``append()`` method of the :class:`~aiida.orm.querybuilder.QueryBuilder`, you can query for the entities you are interested in.
Suppose you want to query for calculation job nodes in your database:

.. code-block:: python

from aiida.orm import CalcJobNode
from aiida.orm.querybuilder import QueryBuilder
qb = QueryBuilder()  # Create an instance. One instance -> one query
qb.append(CalcJobNode) # Setting first vertex of path

If you are interested in instances of different classes, you can also pass an iterable of classes.
However, they have to be of the same ORM-type (e.g. all have to be subclasses of :class:`~aiida.orm.nodes.node.Node`):

.. code-block:: python

qb = QueryBuilder()  # Create an instance. One instance -> one query
qb.append([CalcJobNode, WorkChainNode])  # Setting first vertex of path, either CalcJobNode or WorkChainNode

.. note::

Processes have both a run-time :class:`~aiida.engine.processes.process.Process` that executes them and a :class:`~aiida.orm.nodes.node.Node` that stores their data in the database (see the :ref:`corresponding topics section<topics:processes:concepts:types>` for a detailed explanation).
The :class:`~aiida.orm.querybuilder.QueryBuilder` allows you to pass either the :class:`~aiida.orm.nodes.node.Node` class (e.g. :class:`~aiida.orm.nodes.process.calculation.calcjob.CalcJobNode`) or the :class:`~aiida.engine.processes.process.Process` class (e.g. :class:`~aiida.engine.processes.calcjobs.calcjob.CalcJob`), which will automatically select the right entity for the query.
Using either :class:`~aiida.orm.nodes.process.calculation.calcjob.CalcJobNode` or :class:`~aiida.engine.processes.calcjobs.calcjob.CalcJob` will produce the same query results.

.. _how-to:data:find:results:

Retrieving results
------------------

Once you have *appended* the entity you want to query for to the :class:`~aiida.orm.querybuilder.QueryBuilder`, the next question is how to get the results.
There are several ways to obtain data from a query:

.. code-block:: python

qb = QueryBuilder()  # Create an instance
qb.append(CalcJobNode)  # Setting first vertex of path

first_row = qb.first() # Returns a list (!) of the results of the first row

all_results_d = qb.dict() # Returns all results as a list of dictionaries

all_results_l = qb.all() # Returns a list of lists

If you are working with a large dataset, you can also retrieve the query results as a generator:

.. code-block:: python

all_res_d_gen = qb.iterdict()  # Returns a generator of dictionaries of all results
all_res_l_gen = qb.iterall()  # Returns a generator of lists

This will retrieve the data in batches, and you can start working with the data before the query has completely finished.
For example, you can iterate over the results of your query in a for loop:

.. code-block:: python

for entry in qb.iterall():
    # Do something with a single entry in the query result
    print(entry)
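
To see why a generator lets you start working before all rows have arrived, the following pure-Python sketch mimics batched retrieval. It does not use AiiDA: ``iter_in_batches``, ``fake_rows`` and ``fetched`` are hypothetical names invented for this illustration, and the batching logic is an assumption about the general technique rather than a description of AiiDA's internals.

```python
from itertools import islice


def iter_in_batches(rows, batch_size=2):
    """Yield rows one at a time while pulling them from ``rows`` in batches.

    Hypothetical stand-in for a database cursor; not AiiDA's implementation.
    """
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))  # fetch the next batch
        if not batch:
            return  # nothing left to fetch
        for row in batch:
            yield row


fetched = []  # records which rows have been pulled from the fake "database"


def fake_rows():
    """Produce five single-column rows and log each one as it is fetched."""
    for pk in range(1, 6):
        fetched.append(pk)
        yield [pk]


results = iter_in_batches(fake_rows(), batch_size=2)
first = next(results)  # only the first batch has been fetched at this point
```

After the first ``next()`` call, only the first batch of two rows has been fetched; the remaining rows are pulled as the iteration proceeds.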

.. _how-to:data:find:filters:

Filters
-------

Usually you do not want to query for *all* entities of a certain class, but rather *filter* the results based on certain properties.
Suppose you do not want all :class:`~aiida.orm.nodes.process.calculation.calcjob.CalcJobNode` data, but only those that are ``finished``:

.. code-block:: python

qb = QueryBuilder()  # Initialize a QueryBuilder instance
qb.append(
    CalcJobNode,  # Append a CalcJobNode
    filters={  # Specify the filters:
        'attributes.process_state': 'finished',  # the process is finished
    },
)

You can apply multiple filters to one entity in a query.
Say you are interested in all calculation jobs in your database that are ``finished`` **and** have ``exit_status == 0``:

.. code-block:: python

qb = QueryBuilder()  # Initialize a QueryBuilder instance
qb.append(
    CalcJobNode,  # Append a CalcJobNode
    filters={  # Specify the filters:
        'attributes.process_state': 'finished',  # the process is finished AND
        'attributes.exit_status': 0,  # has exit_status == 0
    },
)

If you want to query for calculation jobs that satisfy at least one of these conditions, you can use the ``or`` operator:

.. code-block:: python

qb = QueryBuilder()
qb.append(
    CalcJobNode,
    filters={
        'or': [
            {'attributes.process_state': 'finished'},
            {'attributes.exit_status': 0},
        ]
    },
)

If we had written ``and`` instead of ``or`` in the example above, we would have performed the exact same query as the previous one, because ``and`` is the default behavior if you provide several filters as key-value pairs in a dictionary to the ``filters`` argument.
In case you want all calculation jobs with state ``finished`` or ``excepted``, you can also use the ``in`` operator:

.. code-block:: python

qb = QueryBuilder()
qb.append(
    CalcJobNode,
    filters={
        'attributes.process_state': {'in': ['finished', 'excepted']}
    },
)

You can negate a filter by adding an exclamation mark in front of the operator.
So, to query for all calculation jobs that are not in a ``finished`` or ``excepted`` state:

.. code-block:: python

qb = QueryBuilder()
qb.append(
    CalcJobNode,
    filters={
        'attributes.process_state': {'!in': ['finished', 'excepted']}
    },
)

.. note::

The above rule applies to all operators.
For example, you can check non-equality with ``!==``, since this is the equality operator (``==``) with a negation prepended.

A complete list of all available operators can be found in the :ref:`advanced querying section<topics:database:advancedquery:tables>`.
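
The filter semantics described in this section (implicit ``and`` between keys, explicit ``or``, the ``in`` operator and negation with ``!``) can be modelled with a few lines of plain Python. This is only a toy evaluator written for illustration: the function ``matches`` and the flat ``job`` dictionary are hypothetical, it supports only the operators named here, and it is not how AiiDA translates filters into SQL.

```python
def matches(attributes, filters):
    """Return True if the flat ``attributes`` dict satisfies ``filters``.

    Toy model of the filter semantics; supports only ``==``, ``in``,
    ``and``, ``or`` and negation with a leading ``!``.
    """
    outcomes = []
    for key, condition in filters.items():
        if key in ('and', 'or'):
            combine = all if key == 'and' else any
            outcomes.append(combine(matches(attributes, sub) for sub in condition))
            continue
        if not isinstance(condition, dict):
            condition = {'==': condition}  # a bare value means equality
        value = attributes.get(key)
        for operator, operand in condition.items():
            negate = operator.startswith('!')
            if negate:
                operator = operator[1:]  # strip the negation mark
            if operator == '==':
                outcome = value == operand
            elif operator == 'in':
                outcome = value in operand
            else:
                raise ValueError(f'unsupported operator: {operator}')
            outcomes.append(not outcome if negate else outcome)
    return all(outcomes)  # several keys in one dict are combined with AND


# Hypothetical calculation job that finished with a non-zero exit status.
job = {'attributes.process_state': 'finished', 'attributes.exit_status': 1}

both = matches(job, {'attributes.process_state': 'finished', 'attributes.exit_status': 0})
either = matches(job, {'or': [{'attributes.process_state': 'finished'}, {'attributes.exit_status': 0}]})
in_states = matches(job, {'attributes.process_state': {'in': ['finished', 'excepted']}})
not_in_states = matches(job, {'attributes.process_state': {'!in': ['finished', 'excepted']}})
not_equal = matches(job, {'attributes.exit_status': {'!==': 0}})
```

For this hypothetical job, ``both`` is ``False`` because the exit status is not zero, whereas ``either`` is ``True`` because one of the two conditions holds.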

.. _how-to:data:find:relationships:

Relationships
-------------

It is possible to query for data based on its relationship to another entity in the database.
Imagine you are not interested in the calculation jobs themselves, but in one of the outputs they create.
You can build upon your initial query for all :class:`~aiida.orm.nodes.process.calculation.calcjob.CalcJobNode`'s in the database using the relationship of the output to the first step in the query:

.. code-block:: python

qb = QueryBuilder()
qb.append(CalcJobNode, tag='calcjob')
qb.append(Int, with_incoming='calcjob')

In the first ``append`` call, we query for all :class:`~aiida.orm.nodes.process.calculation.calcjob.CalcJobNode`'s in the database, and *tag* this step with the *unique* identifier ``'calcjob'``.
Next, we look for all ``Int`` nodes that are an output of the :class:`~aiida.orm.nodes.process.calculation.calcjob.CalcJobNode`'s found in the first step, using the ``with_incoming`` relationship argument.
The ``Int`` node was created by the :class:`~aiida.orm.nodes.process.calculation.calcjob.CalcJobNode` and as such has an *incoming* create link.

In the context of our query, we are building a *path* consisting of *vertices* (i.e. the entities we query for) connected by *edges* defined by the relationships between them.
The complete set of all possible relationships you can use to query, as well as the entities that they connect to, can be found in the :ref:`advanced querying section<topics:database:advancedquery:tables>`.

.. note::

The ``tag`` identifier can be any alphanumeric string; it is simply a label used to refer to a previous vertex along the query path when defining a relationship.
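
As a rough mental model of what the ``with_incoming`` relationship selects, consider the following toy graph in plain Python. All names and PKs here (``links``, ``node_types``, the function ``with_incoming``) are hypothetical and chosen for this sketch only; the real query is executed by the database backend, not by code like this.

```python
# Hypothetical toy graph: one (source PK, target PK) pair per link.
links = [(1, 10), (1, 11), (2, 12), (3, 13)]

# Hypothetical node types, keyed by PK.
node_types = {
    1: 'CalcJobNode', 2: 'CalcJobNode', 3: 'WorkChainNode',
    10: 'Int', 11: 'Float', 12: 'Int', 13: 'Int',
}


def with_incoming(target_type, source_pks):
    """Return PKs of ``target_type`` nodes that have an incoming link from ``source_pks``."""
    return sorted(
        target
        for source, target in links
        if source in source_pks and node_types[target] == target_type
    )


# First vertex of the path: all calculation jobs (the step tagged 'calcjob').
calcjobs = {pk for pk, kind in node_types.items() if kind == 'CalcJobNode'}

# Second vertex: Int nodes with an incoming link from one of those jobs.
int_outputs = with_incoming('Int', calcjobs)
```

Here ``int_outputs`` contains the PKs 10 and 12: node 11 is excluded because it is not an ``Int``, and node 13 because its incoming link comes from a work chain rather than a calculation job.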

.. _how-to:data:find:projections:

Projections
-----------

By default, the :class:`~aiida.orm.querybuilder.QueryBuilder` returns the instances of the entities corresponding to the final append to the query path.
For example:

.. code-block:: python

qb = QueryBuilder()
qb.append(CalcJobNode, tag='calcjob')
qb.append(Int, with_incoming='calcjob')

The above code snippet will return all ``Int`` nodes that are outputs of any :class:`~aiida.orm.nodes.process.calculation.calcjob.CalcJobNode`.
However, you can also *project* other entities in the path by adding ``project='*'`` to the corresponding ``append()`` call:

.. code-block:: python

qb = QueryBuilder()
qb.append(CalcJobNode, tag='calcjob', project='*')
qb.append(Int, with_incoming='calcjob')

This will return all :class:`~aiida.orm.nodes.process.calculation.calcjob.CalcJobNode`'s that have an ``Int`` output node.

However, in many cases we are not interested in the entities themselves, but rather their PK, UUID, *attributes* or some other piece of information stored by the entity.
This can be achieved by providing the corresponding *column* to the ``project`` keyword argument:

.. code-block:: python

qb = QueryBuilder()
qb.append(CalcJobNode, tag='calcjob')
qb.append(Int, with_incoming='calcjob', project='id')

In the above example, executing the query returns the *PKs* of all ``Int`` nodes that are outputs of any :class:`~aiida.orm.nodes.process.calculation.calcjob.CalcJobNode` in the database.
Moreover, you can project more than one piece of information for one vertex by providing a list:

.. code-block:: python

qb = QueryBuilder()
qb.append(CalcJobNode, tag='calcjob')
qb.append(Int, with_incoming='calcjob', project=['id', '*', 'attributes.value'])

Contributor:

Just wondering: projecting both the instance and separate values might not make the most sense in most cases, as you can always get those properties from the instance once loaded. Not sure if it matters here, or if this slightly weird example may cause some confusion.

Member Author:

Fair point! The idea was to introduce project='*' to another query to show that you can use '*' like any other projection. I've moved this concept to the next query example.

For the query above, ``qb.all()`` will return a list of lists, in which each inner list corresponds to one entity and contains three items: the PK, the instance of the ``Int`` node, and its value.
Finally, you can project information for multiple vertices along the query path:

.. code-block:: python

qb = QueryBuilder()
qb.append(CalcJobNode, tag='calcjob', project='uuid')
qb.append(Int, with_incoming='calcjob', project=['id', 'attributes.value'])

All projections must start with one of the *columns* of the entities in the database.
Examples of columns we have encountered so far are ``id``, ``uuid`` and ``attributes``.
If the column is a dictionary, you can expand the dictionary values using a dot notation, as we have done in the previous example to obtain the ``attributes.value``.
This can be used to project the values of nested dictionaries as well.
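
The dot notation can be pictured as walking into a nested dictionary one key at a time. The sketch below shows that idea in plain Python; the ``project`` helper and the ``node`` dictionary (including its ``nested`` key and UUID string) are hypothetical and only illustrate the notation, not the way AiiDA resolves projections in the database.

```python
def project(row, projection):
    """Resolve a dotted projection such as 'attributes.value' against ``row``."""
    value = row
    for key in projection.split('.'):
        value = value[key]  # descend one level per dot-separated key
    return value


# Hypothetical database row for a single Int node.
node = {
    'id': 42,
    'uuid': 'hypothetical-uuid',
    'attributes': {'value': 7, 'nested': {'depth': 2}},
}

projected = [
    project(node, name)
    for name in ['id', 'attributes.value', 'attributes.nested.depth']
]
```

Each projection starts with a column name (``id`` or ``attributes``), and every further dot descends one level into the dictionary stored in that column.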

.. note::

`#3996`_
Be aware that for consistency, ``QueryBuilder.all()`` and ``QueryBuilder.iterall()`` always return a list of lists, even if you only project one property of a single entity.
Use ``QueryBuilder.all(flat=True)`` to return the query result as a flat list in this case.
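
To make the difference in shape concrete, here is a small plain-Python sketch of a single-projection result before and after flattening. The ``rows`` data is hypothetical, and flattening with ``itertools.chain`` is only an assumption used to show the resulting shape, not necessarily how ``flat=True`` is implemented.

```python
from itertools import chain

# Hypothetical result of a query that projects only 'id': a list of lists.
rows = [[101], [102], [103]]

# The shape one would expect from a flat result: a single list of PKs.
flat = list(chain.from_iterable(rows))
```

With the nested form you would need ``row[0]`` to reach each PK, whereas the flat form can be iterated over directly.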

As mentioned in the beginning, this section provides only a brief introduction to the :class:`~aiida.orm.querybuilder.QueryBuilder`'s basic functionality.
To learn about more advanced queries, please see :ref:`the corresponding topics section<topics:database:advancedquery>`.

.. _how-to:data:organize:

@@ -404,4 +659,4 @@ Notice that we haven't specified any port in the URLs since Apache listens conve

.. _#3996: https://github.com/aiidateam/aiida-core/issues/3996
.. _#3997: https://github.com/aiidateam/aiida-core/issues/3997
.. _#3998: https://github.com/aiidateam/aiida-core/issues/3998
.. _#3998: https://github.com/aiidateam/

Contributor:

This is an accidental change I imagine? You can restore this and actually remove the link for 3996, which is the one this PR solves

Member Author:

Yes, this was accidental. I must have gotten a little overzealous while slimming down the section. ^^