Tip
Before starting to write a new plugin, check the AiiDA plugin registry. If a plugin for your code is already available, you can skip straight to :ref:`how-to:run-codes`.
Tip
This how-to walks you through all logical steps of how AiiDA interacts with an external code. If you already know the basics and would like to get started with a new plugin package quickly, check out :ref:`how-to:plugins-develop`.
To run an external code with AiiDA, you need a corresponding calculation plugin, which tells AiiDA how to:
- Prepare the required input files.
- Run the code with the correct command line parameters.
Finally, you will probably want a parser plugin, which tells AiiDA how to:
- Parse the output of the code.
This how-to takes you through the process of :ref:`creating a calculation plugin<how-to:plugin-codes:interfacing>`, using it to :ref:`run the code<how-to:plugin-codes:run>`, and :ref:`writing a parser <how-to:plugin-codes:parsing>` for its outputs.
In this example, our |Code| will be the diff
executable that "computes" the difference between two "input files" and prints the difference to standard output:
$ cat file1.txt
file with content
content1
$ cat file2.txt
file with content
content2
$ diff file1.txt file2.txt
2c2
< content1
---
> content2
We are using diff
here since it is available on almost every UNIX system by default, and it takes both command line arguments (the two files) and command line options (e.g. -i
for case-insensitive matching).
This is similar to how the executables of many scientific simulation codes work, making it easy to adapt this example to your use case.
We will run diff
as:
$ diff file1.txt file2.txt > diff.patch
thus writing the difference between file1.txt and file2.txt to diff.patch.
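For orientation, the same invocation can be sketched in plain Python with the standard library. This is not how AiiDA runs the code internally, and the helper name ``run_diff`` is hypothetical. Note that ``diff`` exits with status 1 when the files differ, so a non-zero exit status does not necessarily signal a failure for this executable.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def run_diff(file1, file2, output):
    """Run ``diff file1 file2 > output`` and return the exit status of diff.

    Hypothetical helper for illustration only; it is not part of AiiDA.
    """
    with open(output, "w") as handle:
        # Redirect standard output to the file, like ``>`` does in the shell
        completed = subprocess.run(["diff", str(file1), str(file2)], stdout=handle)
    # 0: files are identical, 1: files differ, 2 or more: diff reported trouble
    return completed.returncode


if __name__ == "__main__":
    if shutil.which("diff") is None:
        raise SystemExit("diff executable not found")
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        (folder / "file1.txt").write_text("file with content\ncontent1\n")
        (folder / "file2.txt").write_text("file with content\ncontent2\n")
        status = run_diff(folder / "file1.txt", folder / "file2.txt", folder / "diff.patch")
        print("exit status:", status)
        print((folder / "diff.patch").read_text(), end="")
```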
Start by creating a file calculations.py
and subclass the |CalcJob| class:
.. code-block:: python

    from aiida.common import datastructures
    from aiida.engine import CalcJob
    from aiida.orm import SinglefileData


    class DiffCalculation(CalcJob):
        """AiiDA calculation plugin wrapping the diff executable."""

In the following, we will tell AiiDA how to run our code by implementing two key methods:
- :py:meth:`~aiida.engine.processes.calcjobs.calcjob.CalcJob.define`
- :py:meth:`~aiida.engine.processes.calcjobs.calcjob.CalcJob.prepare_for_submission`
The |define| method tells AiiDA which inputs the |CalcJob| expects and which outputs it produces (exit codes will be :ref:`discussed later<how-to:plugin-codes:parsing:errors>`).
This is done through an instance of the :py:class:`~aiida.engine.processes.process_spec.CalcJobProcessSpec` class, which is passed as the spec
argument to the |define| method.
For example:
.. literalinclude:: ../../../aiida/calculations/diff_tutorial/calculations.py
    :language: python
    :pyobject: DiffCalculation.define

The first line of the method calls the |define| method of the |CalcJob| parent class. This necessary step defines the inputs and outputs that are common to all |CalcJob| classes.
Next, we use the :py:meth:`~plumpy.process_spec.ProcessSpec.input` method in order to define our two input files file1
and file2
of type |SinglefileData|.
Further reading
When using |SinglefileData|, AiiDA keeps track of the inputs as files.
This is very flexible but has the downside of making it difficult to query for information contained in those files and ensuring that the inputs are valid.
:ref:`how-to:plugin-codes:cli-options` shows how to use the |Dict| class to represent the diff
command line options as a python dictionary.
The aiida-diff demo plugin goes further and adds automatic validation.
We then use :py:meth:`~plumpy.process_spec.ProcessSpec.output` to define the only output of the calculation with the label diff
.
AiiDA will attach the outputs defined here to a (successfully) finished calculation using the link label provided.
Finally, we set a few default options
, such as the name of the parser (which we will implement later), the name of input and output files, and the computational resources to use for such a calculation.
These options
have already been defined on the spec
by the super().define(spec)
call, and they can be accessed through the :py:attr:`~plumpy.process_spec.ProcessSpec.inputs` attribute, which behaves like a dictionary.
There is no return
statement in define
: the define
method directly modifies the spec
object it receives.
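This in-place pattern can be illustrated without AiiDA. The sketch below uses hypothetical stand-in classes (``ToySpec``, ``BaseJob`` and ``DiffJob``), not the real ``CalcJobProcessSpec`` or |CalcJob|, to show how a subclass extends the ports declared by its parent and why no return value is needed.

```python
class ToySpec:
    """Hypothetical stand-in for a process spec: it only records port names."""

    def __init__(self):
        self.inputs = {}
        self.outputs = {}

    def input(self, name, valid_type=object, help=""):
        self.inputs[name] = {"valid_type": valid_type, "help": help}

    def output(self, name, valid_type=object, help=""):
        self.outputs[name] = {"valid_type": valid_type, "help": help}


class BaseJob:
    """Plays the role of the parent class that declares the common ports."""

    @classmethod
    def define(cls, spec):
        spec.input("code", help="The executable to run.")


class DiffJob(BaseJob):
    @classmethod
    def define(cls, spec):
        super().define(spec)  # keep the ports declared by the parent class
        spec.input("file1", help="First file to be compared.")
        spec.input("file2", help="Second file to be compared.")
        spec.output("diff", help="Difference between file1 and file2.")
        # No return statement: the spec object has been modified in place


spec = ToySpec()
DiffJob.define(spec)
print(sorted(spec.inputs))   # ['code', 'file1', 'file2']
print(sorted(spec.outputs))  # ['diff']
```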
Note
One more input required by any |CalcJob| is which external executable to use.
External executables are represented by |Code| instances that contain information about the computer they reside on, their path in the file system and more.
They are passed to a |CalcJob| via the code
input, which is defined in the |CalcJob| base class, so you don't have to:
spec.input('code', valid_type=orm.AbstractCode, help='The `Code` to use for this job.')
Further reading
For more details on setting up your inputs and outputs (covering validation, dynamic number of inputs, etc.) see the :ref:`Defining Processes <topics:processes:usage:defining>` topic.
The :py:meth:`~aiida.engine.processes.calcjobs.calcjob.CalcJob.prepare_for_submission` method has two jobs: creating the input files in the format the external code expects, and returning a :py:class:`~aiida.common.datastructures.CalcInfo` object that contains instructions for the AiiDA engine on how the code should be run. For example:

.. literalinclude:: ../../../aiida/calculations/diff_tutorial/calculations.py
    :language: python
    :pyobject: DiffCalculation.prepare_for_submission

All inputs provided to the calculation are validated against the spec
before |prepare_for_submission| is called.
Therefore, when accessing the :py:attr:`~plumpy.processes.Process.inputs` attribute, you can safely assume that all required inputs have been set and that all inputs have a valid type.
We start by creating a |CodeInfo| object that lets AiiDA know how to run the code, i.e. here:
$ diff file1.txt file2.txt > diff.patch
This includes the command line parameters (here: the names of the files that we would like to diff
) and the UUID of the |Code| to run.
Since diff
writes directly to standard output, we redirect standard output to the specified output filename.
Next, we create a |CalcInfo| object that lets AiiDA know which files to copy back and forth.
In our example, the two input files are already stored in the AiiDA file repository and we can use the local_copy_list
to pass them along.
Note
In other use cases you may need to create new files on the fly.
This is what the folder
argument of :py:meth:`~aiida.engine.processes.calcjobs.calcjob.CalcJob.prepare_for_submission` is for:
.. code-block:: python

    with folder.open("filename", 'w') as handle:
        handle.write("file content")

Any files and directories created in this sandbox folder will automatically be transferred to the compute resource where the actual calculation takes place.
The retrieve_list
on the other hand tells the engine which files to retrieve from the directory where the job ran after it has finished.
All files listed here will be stored in a |FolderData| node that is attached as an output node to the calculation with the label retrieved
.
Finally, we pass the |CodeInfo| to a |CalcInfo| object.
One calculation job can involve more than one executable, so codes_info
is a list.
If you have more than one executable in your codes_info
, you can set codes_run_mode
to specify the mode with which these will be executed (CodeRunMode.SERIAL by default).
We define the retrieve_list
of filenames that the engine should retrieve from the directory where the job ran after it has finished.
The engine will store these files in a |FolderData| node that will be attached as an output node to the calculation with the label retrieved
.
Further reading
There are :ref:`other file lists available<topics:calculations:usage:calcjobs:file_lists>` that allow you to easily customize how to move files to and from the remote working directory in order to prevent the creation of unnecessary copies. For more details on the |CalcJob| class, refer to the Topics section on :ref:`defining calculations <topics:calculations:usage>`.
Parsing the output files produced by a code into AiiDA nodes is optional, but it can make your data queryable and therefore easier to access and analyze.
To create a parser plugin, subclass the |Parser| class in a file called parsers.py
.
.. literalinclude:: ../../../aiida/parsers/plugins/diff_tutorial/parsers.py
    :language: python
    :start-after: # START PARSER HEAD
    :end-before: # END PARSER HEAD

Before the parse()
method is called, two important attributes are set on the |Parser| instance:
- ``self.retrieved``: An instance of |FolderData|, which points to the folder containing all output files that the |CalcJob| instructed to retrieve, and provides the means to :py:meth:`~aiida.orm.nodes.repository.NodeRepository.open` any file it contains.
- ``self.node``: The :py:class:`~aiida.orm.nodes.process.calculation.calcjob.CalcJobNode` representing the finished calculation, which, among other things, provides access to all of its inputs (``self.node.inputs``).
Now implement its :py:meth:`~aiida.parsers.parser.Parser.parse` method as
.. literalinclude:: ../../../aiida/parsers/plugins/diff_tutorial/parsers.py
    :language: python
    :pyobject: DiffParserSimple.parse

The :py:meth:`~aiida.orm.nodes.process.calculation.calcjob.CalcJobNode.get_option` convenience method is used to get the filename of the output file.
Finally, the :py:meth:`~aiida.parsers.parser.Parser.out` method is used to return the output file as the diff
output of the calculation:
The first argument is the name to be used as the label for the link that connects the calculation and data node.
The second argument is the node that should be recorded as an output.
Note
The outputs and their types need to match those from the process specification of the corresponding |CalcJob| (or an exception will be raised).
In this minimalist example, there isn't actually much parsing going on -- we are simply passing along the output file as a |SinglefileData| node. If your code produces output in a structured format, instead of just returning the file you may want to parse it e.g. to a python dictionary (|Dict| node) to make the results easily searchable.
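As a sketch of what such parsing could look like for ``diff``, the hypothetical helper below (``parse_normal_diff`` is not part of AiiDA or of the tutorial plugin) turns output in the default "normal" format, as shown at the start of this how-to, into a plain dictionary. It assumes that format only; context or unified diffs would need a different parser. The resulting dictionary could then be wrapped in a |Dict| node.

```python
import re

# Hunk headers of the normal diff format, e.g. "2c2", "3,4d2" or "5a6,7"
HUNK_HEADER = re.compile(r"^(\d+)(?:,(\d+))?([acd])(\d+)(?:,(\d+))?$")
OPERATIONS = {"a": "add", "c": "change", "d": "delete"}


def parse_normal_diff(text):
    """Parse normal-format diff output into a dictionary of hunks."""
    hunks = []
    for line in text.splitlines():
        match = HUNK_HEADER.match(line)
        if match:
            start1, end1, kind, start2, end2 = match.groups()
            hunks.append({
                "operation": OPERATIONS[kind],
                # A missing end of range means the range covers a single line
                "lines_file1": [int(start1), int(end1 or start1)],
                "lines_file2": [int(start2), int(end2 or start2)],
                "removed": [],
                "added": [],
            })
        elif line.startswith("< ") and hunks:
            hunks[-1]["removed"].append(line[2:])
        elif line.startswith("> ") and hunks:
            hunks[-1]["added"].append(line[2:])
        # The "---" separator and any other lines carry no content and are skipped
    return {"number_of_hunks": len(hunks), "hunks": hunks}


result = parse_normal_diff("2c2\n< content1\n---\n> content2\n")
print(result["number_of_hunks"], result["hunks"][0]["operation"])
```

Storing the line ranges and changed lines as plain lists and strings keeps the result serializable, which is what a dictionary-based node requires.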
Exercise
Consider the different output files produced by your favorite simulation code. Which information would you want to:
- parse into the database for querying (e.g. as |Dict|, |StructureData|, ...)?
- store in the AiiDA file repository for safe-keeping (e.g. as |SinglefileData|, ...)?
- leave on the computer where the calculation ran (e.g. recording their remote location using |RemoteData| or simply ignoring them)?
Once you know the answers to these questions, you are ready to start writing a parser for your code.
In order to request automatic parsing of a |CalcJob| (once it has finished), users can set the metadata.options.parser_name
input when launching the job.
If a particular parser should be used by default, the |CalcJob| define
method can set a default value for the parser name as was done in the :ref:`previous section <how-to:plugin-codes:interfacing>`:
.. code-block:: python

    @classmethod
    def define(cls, spec):
        ...
        spec.inputs['metadata']['options']['parser_name'].default = 'diff-tutorial'

Note that the default is not set to the |Parser| class itself, but to the entry point string under which the parser class is registered. We will register the entry point for the parser in a bit.
So far, we have not paid much attention to potential errors that can arise when running external codes. However, there are lots of ways in which codes can fail to execute nominally. A |Parser| can play an important role in detecting and communicating such errors, so that :ref:`workflows <how-to:run-workflows>` can then decide how to proceed, e.g., by modifying input parameters and resubmitting the calculation.
Parsers communicate errors through :ref:`exit codes<topics:processes:concepts:exit_codes>`, which are defined in the spec
of the |CalcJob| they parse.
The DiffCalculation
example defines the following exit code:
spec.exit_code(300, 'ERROR_MISSING_OUTPUT_FILES', message='Calculation did not produce all expected output files.')
An exit_code
defines:
- an exit status (a positive integer, following the :ref:`topics:processes:usage:exit_code_conventions`),
- a label that can be used to reference the code in the |parse| method (through the ``self.exit_codes`` property, as shown below), and
- a message that provides a more detailed description of the problem.
In order to inform AiiDA about a failed calculation, simply return from the parse
method the exit code that corresponds to the detected issue.
Here is a more complete version of the example |Parser| presented in the previous section:
.. literalinclude:: ../../../aiida/parsers/plugins/diff_tutorial/parsers.py
    :language: python
    :pyobject: DiffParser.parse

This simple check makes sure that the expected output file diff.patch
is among the files retrieved from the computer where the calculation was run.
Production plugins will often scan further aspects of the output (e.g. the standard error, the output file, etc.) for any issues that may indicate a problem with the calculation and return a corresponding exit code.
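Stripped of the AiiDA specifics, a check of this kind is a set comparison between the expected and the retrieved filenames. The sketch below is a plain-Python illustration; the helper name ``missing_output_files`` and the example filenames are assumptions, not taken from the tutorial plugin.

```python
def missing_output_files(expected, retrieved):
    """Return the expected filenames that are absent from the retrieved ones."""
    return sorted(set(expected) - set(retrieved))


# Example values only: names of files found in the retrieved folder
retrieved = ["_scheduler-stdout.txt", "_scheduler-stderr.txt"]
missing = missing_output_files(["diff.patch"], retrieved)
if missing:
    # At this point a parser would return the matching exit code,
    # e.g. the one labeled ERROR_MISSING_OUTPUT_FILES in the spec
    print("missing output files:", missing)
```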
AiiDA stores the exit code returned by the |parse| method on the calculation node that is being parsed, from where it can then be inspected further down the line (see the :ref:`defining processes <topics:processes:usage:defining>` topic for more details). Note that some scheduler plugins can detect issues at the scheduler level (by parsing the job scheduler output) and set an exit code. The Topics section on :ref:`scheduler exit codes <topics:calculations:usage:calcjobs:scheduler-errors>` explains how these can be inspected inside a parser and how they can optionally be overridden.
Each time a Process
is run, a ProcessNode
is stored in the database to record the execution.
A human-readable label is stored in the process_label
attribute.
By default, the name of the process class is used as this label.
If this default is not informative enough, it can be customized by overriding the :meth:`~aiida.engine.processes.process.Process._build_process_label` method:

.. code-block:: python

    class SomeProcess(Process):

        def _build_process_label(self):
            return 'custom_process_label'

Nodes created through executions of this process class will have node.process_label == 'custom_process_label'
.
:ref:`Entry points <how-to:plugins-develop:entrypoints>` are the preferred method of registering new calculation, parser and other plugins with AiiDA.
With your calculations.py
and parsers.py
files at hand, let's register entry points for the plugins they contain:
- Move your two scripts into a subfolder
aiida_diff_tutorial
:
$ mkdir aiida_diff_tutorial
$ mv calculations.py parsers.py aiida_diff_tutorial/
$ touch aiida_diff_tutorial/__init__.py
You have just created an aiida_diff_tutorial
Python package!
- Add a minimal set of metadata for your package by writing a
pyproject.toml
file:
[build-system]
# build the package with [flit](https://flit.readthedocs.io)
requires = ["flit_core >=3.4,<4"]
build-backend = "flit_core.buildapi"
[project]
# See https://www.python.org/dev/peps/pep-0621/
name = "aiida-diff-tutorial"
version = "0.1.0"
description = "AiiDA demo plugin"
dependencies = [
"aiida-core>=2.0,<3",
]
[project.entry-points."aiida.calculations"]
"diff-tutorial" = "aiida_diff_tutorial.calculations:DiffCalculation"
[project.entry-points."aiida.parsers"]
"diff-tutorial" = "aiida_diff_tutorial.parsers:DiffParser"
[tool.flit.module]
name = "aiida_diff_tutorial"
Note
This allows for the project metadata to be fully specified in the pyproject.toml file, using the PEP 621 format.
- Install your new
aiida-diff-tutorial
plugin package.
$ pip install -e . # install package in "editable mode"
See the :ref:`how-to:plugins-install` section for details.
After this, you should see your plugins listed:
$ verdi plugin list aiida.calculations
$ verdi plugin list aiida.calculations diff-tutorial
$ verdi plugin list aiida.parsers
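Entry points are ordinary package metadata, so the registered names can also be inspected from Python with the standard library. A minimal sketch, in which the helper name ``entry_point_names`` is hypothetical and the fallback branch assumes the dictionary-style interface of older Python versions:

```python
from importlib.metadata import entry_points


def entry_point_names(group):
    """Return the sorted names of all installed entry points in ``group``."""
    found = entry_points()
    if hasattr(found, "select"):
        # Newer Python versions offer a selection interface
        selected = found.select(group=group)
    else:
        # Assumption: older versions return a mapping of group name to entry points
        selected = found.get(group, [])
    return sorted(entry_point.name for entry_point in selected)


for group in ("aiida.calculations", "aiida.parsers"):
    print(group, entry_point_names(group))
```

After installing the package as described above, ``diff-tutorial`` should appear in both groups; an empty list suggests that the package is not installed in the active Python environment.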
With the entry points set up, you are ready to launch your first calculation with the new plugin:
- If you haven't already done so, :ref:`set up your computer<how-to:run-codes:computer>`. In the following we assume it to be the localhost:
$ verdi computer setup -L localhost -H localhost -T core.local -S core.direct -w `echo $PWD/work` -n
$ verdi computer configure core.local localhost --safe-interval 5 -n
- Create the input files for our calculation
$ echo -e "File with content\ncontent1" > file1.txt
$ echo -e "File with content\ncontent2" > file2.txt
$ mkdir input_files
$ mv file1.txt file2.txt input_files
- Write a
launch.py
script:
.. literalinclude:: ./include/snippets/plugins/launch.py
    :language: python

Note
The launch.py
script sets up an AiiDA |Code| instance that associates the /usr/bin/diff
executable with the DiffCalculation
class (through its entry point diff-tutorial
).
This code is automatically set on the code
input port of the builder and passed as an input to the calculation plugin.
- Launch the calculation:
$ verdi run launch.py
If everything goes well, this should print the results of your calculation, something like:
$ verdi run launch.py
Computed diff between files:
2c2
< content1
---
> content2
Tip
If you encountered a parsing error, it can be helpful to make a :ref:`topics:calculations:usage:calcjobs:dry_run`, which allows you to inspect the input folder generated by AiiDA before any calculation is launched.
Finally, instead of running your calculation in the current shell, you can submit your calculation to the AiiDA daemon:
- (Re)start the daemon to update its Python environment:
$ verdi daemon restart --reset
- Update your launch script to use:
# Submit calculation to the aiida daemon
node = engine.submit(builder)
print("Submitted calculation {}".format(node))
Note
node
is the |CalcJobNode| representing the state of the underlying calculation process (which may not be finished yet).
- Launch the calculation:
$ verdi run launch.py
This should print the UUID and the PK of the submitted calculation.
You can use the verdi command line interface to :ref:`monitor<topics:processes:usage:monitoring>` these processes:
$ verdi process list -a -p1
This should show the processes of both calculations you just ran.
Use verdi calcjob outputcat <pk>
to check the output of the calculation you submitted to the daemon.
Congratulations - you can now write plugins for external simulation codes and use them to submit calculations!
If you still have time left, consider going through the optional exercise below.
.. versionadded:: 2.0
New users of your plugin may have completed many previous computations without AiiDA, which they wish to import into AiiDA. In these cases, it is possible to write an importer for their inputs/outputs, which generates the provenance nodes for the corresponding |CalcJob|.
The importer must be written as a subclass of :class:`~aiida.engine.processes.calcjobs.importer.CalcJobImporter`, for an example see :class:`aiida.calculations.importers.arithmetic.add.ArithmeticAddCalculationImporter`.
To associate the importer with the |CalcJob| class, the importer must be registered with an entry point in the group aiida.calculations.importers
.
[project.entry-points."aiida.calculations.importers"]
"core.arithmetic.add" = "aiida.calculations.importers.arithmetic.add:ArithmeticAddCalculationImporter"
Note
Note that the entry point name can be any valid entry point name. If the importer plugin is provided by the same package as the corresponding |CalcJob| plugin, it is recommended that the entry point name of the importer and |CalcJob| plugin are the same. This will allow the :meth:`~aiida.engine.processes.calcjobs.calcjob.CalcJob.get_importer` method to automatically fetch the associated importer. If the entry point names differ, the entry point name of the desired importer implementation needs to be passed to :meth:`~aiida.engine.processes.calcjobs.calcjob.CalcJob.get_importer` as an argument.
Users can then import their calculations via the :py:meth:`~aiida.engine.processes.calcjobs.calcjob.CalcJob.get_importer` method:
from aiida.engine import run
from aiida.orm import RemoteData, load_computer
from aiida.plugins import CalculationFactory
# Entry point name as registered in the ``aiida.calculations.importers`` example above
ArithmeticAddCalculation = CalculationFactory('core.arithmetic.add')
importer = ArithmeticAddCalculation.get_importer()
remote_data = RemoteData('/some/absolute/path', computer=load_computer('computer'))
inputs = importer.parse_remote_data(remote_data)
results, node = run.get_node(ArithmeticAddCalculation, **inputs)
assert node.is_imported
.. seealso:: :doc:`aep:004_calcjob_importer/readme`, for the design considerations around this feature.
As discussed before, diff
knows a couple of command-line options:
$ diff --help
Usage: diff [OPTION]... FILES
Compare files line by line.
...
-i, --ignore-case ignore case differences in file contents
-E, --ignore-tab-expansion ignore changes due to tab expansion
-b, --ignore-space-change ignore changes in the amount of white space
-w, --ignore-all-space ignore all white space
-B, --ignore-blank-lines ignore changes where lines are all blank
-I, --ignore-matching-lines=RE ignore changes where all lines match RE
...
For simplicity let's focus on the excerpt of options shown above and allow the user of our plugin to pass these along.
Notice that one of the options (--ignore-matching-lines
) requires the user to pass a regular expression string, while the other options don't require any value.
One way to represent a set of command line options like
diff --ignore-case --ignore-matching-lines='.*ABC.*'
would be using a python dictionary:
parameters = {
'ignore-case': True,
'ignore-space-change': False,
'ignore-matching-lines': '.*ABC.*'
}
Here is a simple code snippet for translating the dictionary to a list of command line options:
.. code-block:: python

    def cli_options(parameters):
        """Return command line options for parameters dictionary.

        :param dict parameters: dictionary with command line parameters
        """
        options = []
        for key, value in parameters.items():
            # Could validate: is key a known command-line option?
            if isinstance(value, bool) and value:
                options.append(f'--{key}')
            elif isinstance(value, str):
                # Could validate: is value a valid regular expression?
                options.append(f'--{key}')
                options.append(value)
        return options

Note
When passing parameters along to your simulation code, try validating them. This detects errors directly at submission of the calculation and thus prevents calculations with malformed inputs from ever entering the queue of your HPC system.
For the sake of brevity we are not performing validation here but there are numerous python libraries, such as voluptuous (used by aiida-diff, see example), marshmallow or pydantic, that help you define a schema to validate input against.
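Without any of these libraries, a basic validation step could be sketched in plain Python as follows. The names ``KNOWN_OPTIONS`` and ``validate_parameters`` are hypothetical, the set of options is limited to the excerpt of ``diff --help`` shown above, and compiling the pattern with Python's ``re`` module is only an approximation, since ``diff`` uses its own regular expression syntax.

```python
import re

# Options from the excerpt of ``diff --help``; True means the option takes a value
KNOWN_OPTIONS = {
    "ignore-case": False,
    "ignore-tab-expansion": False,
    "ignore-space-change": False,
    "ignore-all-space": False,
    "ignore-blank-lines": False,
    "ignore-matching-lines": True,
}


def validate_parameters(parameters):
    """Return a list of error messages; an empty list means the input is valid."""
    errors = []
    for key, value in parameters.items():
        if key not in KNOWN_OPTIONS:
            errors.append(f"unknown option '{key}'")
        elif KNOWN_OPTIONS[key]:
            if not isinstance(value, str):
                errors.append(f"option '{key}' requires a string value")
                continue
            try:
                re.compile(value)  # approximation of the syntax check
            except re.error as exc:
                errors.append(f"option '{key}' has an invalid pattern: {exc}")
        elif not isinstance(value, bool):
            errors.append(f"option '{key}' must be True or False")
    return errors


print(validate_parameters({"ignore-case": True, "ignore-matching-lines": ".*ABC.*"}))
print(validate_parameters({"ignore-cas": True}))
```

Returning a list of messages instead of raising on the first problem lets the caller report all issues at once.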
Let's open our previous calculations.py
file and start modifying the DiffCalculation
class:
1. In the ``define`` method, add a new ``input`` to the ``spec`` with label ``'parameters'`` and type |Dict| (``from aiida.orm import Dict``).
2. In the ``prepare_for_submission`` method, run the ``cli_options`` function from above on ``self.inputs.parameters.get_dict()`` to get the list of command-line options. Add them to the ``codeinfo.cmdline_params``.
.. dropdown:: Solution

    For 1. add the following line to the ``define`` method:

    .. code-block:: python

        spec.input('parameters', valid_type=Dict, help='diff command-line parameters')

    For 2. copy the ``cli_options`` snippet at the end of ``calculations.py`` and set the ``cmdline_params`` to:

    .. code:: python

        codeinfo.cmdline_params = cli_options(self.inputs.parameters.get_dict()) + [
            self.inputs.file1.filename, self.inputs.file2.filename]

That's it. Let's now open the launch.py
script and pass along our command line parameters:
...
builder.parameters = orm.Dict(dict={'ignore-case': True})
...
Change the capitalization of one of the characters in the first line of file1.txt
.
Then, restart the daemon and submit the new calculation:
$ verdi daemon restart
$ verdi run launch.py
If everything worked as intended, the capitalization difference in the first line should be ignored (and thus not show up in the output).
This marks the end of this how-to.
The |CalcJob| and |Parser| plugins are still rather basic and the aiida-diff-tutorial
plugin package is missing a number of useful features, such as package metadata, documentation, tests, CI, etc.
Continue with :ref:`how-to:plugins-develop` in order to learn how to quickly create a feature-rich new plugin package from scratch.