Skip to content

Latest commit

 

History

History
672 lines (420 loc) · 29.6 KB

plugin_codes.rst

File metadata and controls

672 lines (420 loc) · 29.6 KB

How to write a plugin for an external code

Tip

Before starting to write a new plugin, check the aiida plugin registry. If a plugin for your code is already available, you can skip straight to :ref:`how-to:run-codes`.

Tip

This how to walks you through all logical steps of how AiiDA interacts with an external code. If you already know the basics and would like to get started with a new plugin package quickly, check out :ref:`how-to:plugins-develop`.

To run an external code with AiiDA, you need a corresponding calculation plugin, which tells AiiDA how to:

  1. Prepare the required input files.
  2. Run the code with the correct command line parameters.

Finally, you will probably want a parser plugin, which tells AiiDA how to:

  1. Parse the output of the code.

This how-to takes you through the process of :ref:`creating a calculation plugin<how-to:plugin-codes:interfacing>`, using it to :ref:`run the code<how-to:plugin-codes:run>`, and :ref:`writing a parser <how-to:plugin-codes:parsing>` for its outputs.

In this example, our |Code| will be the diff executable that "computes" the difference between two "input files" and prints the difference to standard output:

$ cat file1.txt
file with content
content1

$ cat file2.txt
file with content
content2

$ diff file1.txt file2.txt
2c2
< content1
---
> content2

We are using diff here since it is available on almost every UNIX system by default, and it takes both command line arguments (the two files) and command line options (e.g. -i for case-insensitive matching). This is similar to how the executables of many scientific simulation codes work, making it easy to adapt this example to your use case.

We will run diff as:

$ diff file1.txt file2.txt > diff.patch

thus writing difference between file1.txt and file2.txt to diff.patch.

Interfacing external codes

Start by creating a file calculations.py and subclass the |CalcJob| class:

from aiida.common import datastructures
from aiida.engine import CalcJob
from aiida.orm import SinglefileData

class DiffCalculation(CalcJob):
    """AiiDA calculation plugin wrapping the diff executable."""

In the following, we will tell AiiDA how to run our code by implementing two key methods:

  1. :py:meth:`~aiida.engine.processes.calcjobs.calcjob.CalcJob.define`
  2. :py:meth:`~aiida.engine.processes.calcjobs.calcjob.CalcJob.prepare_for_submission`

Defining the spec

The |define| method tells AiiDA which inputs the |CalcJob| expects and which outputs it produces (exit codes will be :ref:`discussed later<how-to:plugin-codes:parsing:errors>`). This is done through an instance of the :py:class:`~aiida.engine.processes.process_spec.CalcJobProcessSpec` class, which is passed as the spec argument to the |define| method. For example:

.. literalinclude:: ../../../aiida/calculations/diff_tutorial/calculations.py
    :language: python
    :pyobject: DiffCalculation.define


The first line of the method calls the |define| method of the |CalcJob| parent class. This necessary step defines the inputs and outputs that are common to all |CalcJob|'s.

Next, we use the :py:meth:`~plumpy.process_spec.ProcessSpec.input` method in order to define our two input files file1 and file2 of type |SinglefileData|.

Further reading

When using |SinglefileData|, AiiDA keeps track of the inputs as files. This is very flexible but has the downside of making it difficult to query for information contained in those files and ensuring that the inputs are valid. :ref:`how-to:plugin-codes:cli-options` shows how to use the |Dict| class to represent the diff command line options as a python dictionary. The aiida-diff demo plugin goes further and adds automatic validation.

We then use :py:meth:`~plumpy.process_spec.ProcessSpec.output` to define the only output of the calculation with the label diff. AiiDA will attach the outputs defined here to a (successfully) finished calculation using the link label provided.

Finally, we set a few default options, such as the name of the parser (which we will implement later), the name of input and output files, and the computational resources to use for such a calculation. These options have already been defined on the spec by the super().define(spec) call, and they can be accessed through the :py:attr:`~plumpy.process_spec.ProcessSpec.inputs` attribute, which behaves like a dictionary.

There is no return statement in define: the define method directly modifies the spec object it receives.

Note

One more input required by any |CalcJob| is which external executable to use.

External executables are represented by |Code| instances that contain information about the computer they reside on, their path in the file system and more. They are passed to a |CalcJob| via the code input, which is defined in the |CalcJob| base class, so you don't have to:

spec.input('code', valid_type=orm.AbstractCode, help='The `Code` to use for this job.')

Further reading

For more details on setting up your inputs and outputs (covering validation, dynamic number of inputs, etc.) see the :ref:`Defining Processes <topics:processes:usage:defining>` topic.

Preparing for submission

The :py:meth:`~aiida.engine.processes.calcjobs.calcjob.CalcJob.prepare_for_submission` method has two jobs: Creating the input files in the format the external code expects and returning a :py:class:`~aiida.common.datastructures.CalcInfo` object that contains instructions for the AiiDA engine on how the code should be run. For example:

.. literalinclude:: ../../../aiida/calculations/diff_tutorial/calculations.py
    :language: python
    :pyobject: DiffCalculation.prepare_for_submission

All inputs provided to the calculation are validated against the spec before |prepare_for_submission| is called. Therefore, when accessing the :py:attr:`~plumpy.processes.Process.inputs` attribute, you can safely assume that all required inputs have been set and that all inputs have a valid type.

We start by creating a |CodeInfo| object that lets AiiDA know how to run the code, i.e. here:

$ diff file1.txt file2.txt > diff.patch

This includes the command line parameters (here: the names of the files that we would like to diff) and the UUID of the |Code| to run. Since diff writes directly to standard output, we redirect standard output to the specified output filename.

Next, we create a |CalcInfo| object that lets AiiDA know which files to copy back and forth. In our example, the two input files are already stored in the AiiDA file repository and we can use the local_copy_list to pass them along.

Note

In other use cases you may need to create new files on the fly. This is what the folder argument of :py:meth:`~aiida.engine.processes.calcjobs.calcjob.CalcJob.prepare_for_submission` is for:

with folder.open("filename", 'w') as handle:
    handle.write("file content")

Any files and directories created in this sandbox folder will automatically be transferred to the compute resource where the actual calculation takes place.

The retrieve_list on the other hand tells the engine which files to retrieve from the directory where the job ran after it has finished. All files listed here will be store in a |FolderData| node that is attached as an output node to the calculation with the label retrieved.

Finally, we pass the |CodeInfo| to a |CalcInfo| object. One calculation job can involve more than one executable, so codes_info is a list. If you have more than one executable in your codes_info, you can set codes_run_mode to specify the mode with which these will be executed (CodeRunMode.SERIAL by default). We define the retrieve_list of filenames that the engine should retrieve from the directory where the job ran after it has finished. The engine will store these files in a |FolderData| node that will be attached as an output node to the calculation with the label retrieved.

Further reading

There are :ref:`other file lists available<topics:calculations:usage:calcjobs:file_lists>` that allow you to easily customize how to move files to and from the remote working directory in order to prevent the creation of unnecessary copies. For more details on the |CalcJob| class, refer to the Topics section on :ref:`defining calculations <topics:calculations:usage>`.

Parsing the outputs

Parsing the output files produced by a code into AiiDA nodes is optional, but it can make your data queryable and therefore easier to access and analyze.

To create a parser plugin, subclass the |Parser| class in a file called parsers.py.

.. literalinclude::  ../../../aiida/parsers/plugins/diff_tutorial/parsers.py
    :language: python
    :start-after: # START PARSER HEAD
    :end-before: # END PARSER HEAD

Before the parse() method is called, two important attributes are set on the |Parser| instance:

  1. self.retrieved: An instance of |FolderData|, which points to the folder containing all output files that the |CalcJob| instructed to retrieve, and provides the means to :py:meth:`~aiida.orm.nodes.repository.NodeRepository.open` any file it contains.
  2. self.node: The :py:class:`~aiida.orm.nodes.process.calculation.calcjob.CalcJobNode` representing the finished calculation, which, among other things, provides access to all of its inputs (self.node.inputs).

Now implement its :py:meth:`~aiida.parsers.parser.Parser.parse` method as

.. literalinclude:: ../../../aiida/parsers/plugins/diff_tutorial/parsers.py
    :language: python
    :pyobject: DiffParserSimple.parse

The :py:meth:`~aiida.orm.nodes.process.calculation.calcjob.CalcJobNode.get_option` convenience method is used to get the filename of the output file.

Finally, the :py:meth:`~aiida.parsers.parser.Parser.out` method is used return the output file as the diff output of the calculation: The first argument is the name to be used as the label for the link that connects the calculation and data node. The second argument is the node that should be recorded as an output.

Note

The outputs and their types need to match those from the process specification of the corresponding |CalcJob| (or an exception will be raised).

In this minimalist example, there isn't actually much parsing going on -- we are simply passing along the output file as a |SinglefileData| node. If your code produces output in a structured format, instead of just returning the file you may want to parse it e.g. to a python dictionary (|Dict| node) to make the results easily searchable.

Exercise

Consider the different output files produced by your favorite simulation code. Which information would you want to:

  1. parse into the database for querying (e.g. as |Dict|, |StructureData|, ...)?
  2. store in the AiiDA file repository for safe-keeping (e.g. as |SinglefileData|, ...)?
  3. leave on the computer where the calculation ran (e.g. recording their remote location using |RemoteData| or simply ignoring them)?

Once you know the answers to these questions, you are ready to start writing a parser for your code.

In order to request automatic parsing of a |CalcJob| (once it has finished), users can set the metadata.options.parser_name input when launching the job. If a particular parser should be used by default, the |CalcJob| define method can set a default value for the parser name as was done in the :ref:`previous section <how-to:plugin-codes:interfacing>`:

@classmethod
def define(cls, spec):
    ...
    spec.inputs['metadata']['options']['parser_name'].default = 'diff-tutorial'

Note that the default is not set to the |Parser| class itself, but to the entry point string under which the parser class is registered. We will register the entry point for the parser in a bit.

Handling parsing errors

So far, we have not spent much attention on dealing with potential errors that can arise when running external codes. However, there are lots of ways in which codes can fail to execute nominally. A |Parser| can play an important role in detecting and communicating such errors, where :ref:`workflows <how-to:run-workflows>` can then decide how to proceed, e.g., by modifying input parameters and resubmitting the calculation.

Parsers communicate errors through :ref:`exit codes<topics:processes:concepts:exit_codes>`, which are defined in the spec of the |CalcJob| they parse. The DiffCalculation example, defines the following exit code:

spec.exit_code(300, 'ERROR_MISSING_OUTPUT_FILES', message='Calculation did not produce all expected output files.')

An exit_code defines:

  • an exit status (a positive integer, following the :ref:`topics:processes:usage:exit_code_conventions`),
  • a label that can be used to reference the code in the |parse| method (through the self.exit_codes property, as shown below), and
  • a message that provides a more detailed description of the problem.

In order to inform AiiDA about a failed calculation, simply return from the parse method the exit code that corresponds to the detected issue. Here is a more complete version of the example |Parser| presented in the previous section:

.. literalinclude:: ../../../aiida/parsers/plugins/diff_tutorial/parsers.py
    :language: python
    :pyobject: DiffParser.parse

This simple check makes sure that the expected output file diff.patch is among the files retrieved from the computer where the calculation was run. Production plugins will often scan further aspects of the output (e.g. the standard error, the output file, etc.) for any issues that may indicate a problem with the calculation and return a corresponding exit code.

AiiDA stores the exit code returned by the |parse| method on the calculation node that is being parsed, from where it can then be inspected further down the line (see the :ref:`defining processes <topics:processes:usage:defining>` topic for more details). Note that some scheduler plugins can detect issues at the scheduler level (by parsing the job scheduler output) and set an exit code. The Topics section on :ref:`scheduler exit codes <topics:calculations:usage:calcjobs:scheduler-errors>` explains how these can be inspected inside a parser and how they can optionally be overridden.

Customizations

Process label

Each time a Process is run, a ProcessNode is stored in the database to record the execution. A human-readable label is stored in the process_label attribute. By default, the name of the process class is used as this label. If this default is not informative enough, it can be customized by overriding the :meth:`~aiida.engine.processes.process.Process._build_process_label`: method:

class SomeProcess(Process):

    def _build_process_label(self):
        return 'custom_process_label'

Nodes created through executions of this process class will have node.process_label == 'custom_process_label'.

Registering entry points

:ref:`Entry points <how-to:plugins-develop:entrypoints>` are the preferred method of registering new calculation, parser and other plugins with AiiDA.

With your calculations.py and parsers.py files at hand, let's register entry points for the plugins they contain:

  • Move your two scripts into a subfolder aiida_diff_tutorial:

    $ mkdir aiida_diff_tutorial
    $ mv calculations.py parsers.py aiida_diff_tutorial/
    $ touch aiida_diff_tutorial/__init__.py

    You have just created an aiida_diff_tutorial Python package!

  • Add a minimal set of metadata for your package by writing a pyproject.toml file:

    [build-system]
    # build the package with [flit](https://flit.readthedocs.io)
    requires = ["flit_core >=3.4,<4"]
    build-backend = "flit_core.buildapi"
    
    [project]
    # See https://www.python.org/dev/peps/pep-0621/
    name = "aiida-diff-tutorial"
    version = "0.1.0"
    description = "AiiDA demo plugin"
    dependencies = [
        "aiida-core>=2.0,<3",
    ]
    
    [project.entry-points."aiida.calculations"]
    "diff-tutorial" = "aiida_diff_tutorial.calculations:DiffCalculation"
    
    [project.entry-points."aiida.parsers"]
    "diff-tutorial" = "aiida_diff_tutorial.parsers:DiffParser"
    
    [tool.flit.module]
    name = "aiida_diff_tutorial"

    Note

    This allows for the project metadata to be fully specified in the pyproject.toml file, using the PEP 621 format.

  • Install your new aiida-diff-tutorial plugin package.

    $ pip install -e .  # install package in "editable mode"

    See the :ref:`how-to:plugins-install` section for details.

After this, you should see your plugins listed:

$ verdi plugin list aiida.calculations
$ verdi plugin list aiida.calculations diff-tutorial
$ verdi plugin list aiida.parsers

Running a calculation

With the entry points set up, you are ready to launch your first calculation with the new plugin:

  • If you haven't already done so, :ref:`set up your computer<how-to:run-codes:computer>`. In the following we assume it to be the localhost:

    $ verdi computer setup -L localhost -H localhost -T core.local -S core.direct -w `echo $PWD/work` -n
    $ verdi computer configure core.local localhost --safe-interval 5 -n
  • Create the input files for our calculation

    $ echo -e "File with content\ncontent1" > file1.txt
    $ echo -e "File with content\ncontent2" > file2.txt
    $ mkdir input_files
    $ mv file1.txt file2.txt input_files
  • Write a launch.py script:

    .. literalinclude:: ./include/snippets/plugins/launch.py
      :language: python
    
    

    Note

    The launch.py script sets up an AiiDA |Code| instance that associates the /usr/bin/diff executable with the DiffCalculation class (through its entry point diff).

    This code is automatically set on the code input port of the builder and passed as an input to the calculation plugin.

  • Launch the calculation:

    $ verdi run launch.py

    If everything goes well, this should print the results of your calculation, something like:

    $ verdi run launch.py
    Computed diff between files:
    2c2
    < content1
    ---
    > content2

Tip

If you encountered a parsing error, it can be helpful to make a :ref:`topics:calculations:usage:calcjobs:dry_run`, which allows you to inspect the input folder generated by AiiDA before any calculation is launched.

Finally instead of running your calculation in the current shell, you can submit your calculation to the AiiDA daemon:

  • (Re)start the daemon to update its Python environment:

    $ verdi daemon restart --reset
  • Update your launch script to use:

    # Submit calculation to the aiida daemon
    node = engine.submit(builder)
    print("Submitted calculation {}".format(node))

    Note

    node is the |CalcJobNode| representing the state of the underlying calculation process (which may not be finished yet).

  • Launch the calculation:

    $ verdi run launch.py

    This should print the UUID and the PK of the submitted calculation.

You can use the verdi command line interface to :ref:`monitor<topics:processes:usage:monitoring>` this processes:

$ verdi process list -a -p1

This should show the processes of both calculations you just ran. Use verdi calcjob outputcat <pk> to check the output of the calculation you submitted to the daemon.

Congratulations - you can now write plugins for external simulation codes and use them to submit calculations!

If you still have time left, consider going through the optional exercise below.

Writing importers for existing computations

.. versionadded:: 2.0

New users to your plugin may often have completed many previous computations without the use of AiiDA, which they wish to import into AiiDA. In these cases, it is possible to write an importer for their inputs/outputs, which generates the provenance nodes for the corresponding |CalcJob|.

The importer must be written as a subclass of :class:`~aiida.engine.processes.calcjobs.importer.CalcJobImporter`, for an example see :class:`aiida.calculations.importers.arithmetic.add.ArithmeticAddCalculationImporter`.

To associate the importer with the |CalcJob| class, the importer must be registered with an entry point in the group aiida.calculations.importers.

[project.entry-points."aiida.calculations.importers"]
"core.arithmetic.add" = "aiida.calculations.importers.arithmetic.add:ArithmeticAddCalculationImporter"

Note

Note that the entry point name can be any valid entry point name. If the importer plugin is provided by the same package as the corresponding |CalcJob| plugin, it is recommended that the entry point name of the importer and |CalcJob| plugin are the same. This will allow the :meth:`~aiida.engine.processes.calcjobs.calcjob.CalcJob.get_importer` method to automatically fetch the associated importer. If the entry point names differ, the entry point name of the desired importer implementation needs to be passed to :meth:`~aiida.engine.processes.calcjobs.calcjob.CalcJob.get_importer` as an argument.

Users can then import their calculations via the :py:meth:`~aiida.engine.processes.calcjobs.calcjob.CalcJob.get_importer` method:

from aiida.plugins import CalculationFactory

ArithmeticAddCalculation = CalculationFactory('arithmetic.add')
importer = ArithmeticAddCalculation.get_importer()
remote_data = RemoteData('/some/absolute/path', computer=load_computer('computer'))
inputs = importer.parse_remote_data(remote_data)
results, node = run.get_node(ArithmeticAddCalculation, **inputs)
assert node.is_imported
.. seealso:: :doc:`aep:004_calcjob_importer/readme`, for the design considerations around this feature.

Exercise - Support command-line options

As discussed before, diff knows a couple of command-line options:

$ diff --help
Usage: diff [OPTION]... FILES
Compare files line by line.
...
-i, --ignore-case               ignore case differences in file contents
-E, --ignore-tab-expansion      ignore changes due to tab expansion
-b, --ignore-space-change       ignore changes in the amount of white space
-w, --ignore-all-space          ignore all white space
-B, --ignore-blank-lines        ignore changes where lines are all blank
-I, --ignore-matching-lines=RE  ignore changes where all lines match RE
...

For simplicity let's focus on the excerpt of options shown above and allow the user of our plugin to pass these along.

Notice that one of the options (--ignore-matching-lines) requires the user to pass a regular expression string, while the other options don't require any value.

One way to represent a set of command line options like

diff --ignore-case --ignore-matching-lines='.*ABC.*'

would be using a python dictionary:

parameters = {
  'ignore-case': True,
  'ignore-space-change': False,
  'ignore-matching-lines': '.*ABC.*'
 }

Here is a simple code snippet for translating the dictionary to a list of command line options:

def cli_options(parameters):
     """Return command line options for parameters dictionary.

     :param dict parameters: dictionary with command line parameters
     """
     options = []
     for key, value in parameters.items():
         # Could validate: is key a known command-line option?
         if isinstance(value, bool) and value:
             options.append(f'--{key}')
         elif isinstance(value, str):
             # Could validate: is value a valid regular expression?
             options.append(f'--{key}')
             options.append(value)

     return options

Note

When passing parameters along to your simulation code, try validating them. This detects errors directly at submission of the calculation and thus prevents calculations with malformed inputs from ever entering the queue of your HPC system.

For the sake of brevity we are not performing validation here but there are numerous python libraries, such as voluptuous (used by aiida-diff, see example), marshmallow or pydantic, that help you define a schema to validate input against.

Let's open our previous calculations.py file and start modifying the DiffCalculation class:

  1. In the define method, add a new input to the spec with label 'parameters' and type |Dict| (from aiida.orm import Dict)

  2. In the prepare_for_submission method run the cli_options function from above on self.inputs.parameters.get_dict() to get the list of command-line options.
    Add them to the codeinfo.cmdline_params.
.. dropdown:: Solution

   For 1. add the following line to the ``define`` method:

   .. code-block:: python

        spec.input('parameters', valid_type=Dict, help='diff command-line parameters')

   For 2. copy the ``cli_options`` snippet at the end of ``calculations.py`` and set the ``cmdline_params`` to:

   .. code:: python

        codeinfo.cmdline_params = cli_options(self.inputs.parameters.get_dict()) + [ self.inputs.file1.filename, self.inputs.file2.filename]


That's it. Let's now open the launch.py script and pass along our command line parameters:

...
builder.parameters = orm.Dict(dict={'ignore-case': True})
...

Change the capitalization of one of the characters in the first line of file1.txt. Then, restart the daemon and submit the new calculation:

$ verdi daemon restart
$ verdi run launch.py

If everything worked as intended, the capitalization difference in the first line should be ignored (and thus not show up in the output).

This marks the end of this how-to.

The |CalcJob| and |Parser| plugins are still rather basic and the aiida-diff-tutorial plugin package is missing a number of useful features, such as package metadata, documentation, tests, CI, etc. Continue with :ref:`how-to:plugins-develop` in order to learn how to quickly create a feature-rich new plugin package from scratch.