
Initial implementation of the BaseRestartWorkChain #79

Conversation

JPchico
Collaborator

@JPchico JPchico commented May 25, 2023

Initial implementation of the BaseLammpsWorkChain; it is based on the BaseRestartWorkChain.

The idea of this workchain is to handle the most basic calculations and to serve as a building block for more specific workchains handling minimization, MD, and other property-specific runs.

@JPchico JPchico requested review from chrisjsewell and sphuber May 25, 2023 14:07
@JPchico JPchico self-assigned this May 25, 2023
@JPchico
Collaborator Author

JPchico commented May 25, 2023

Hi @chrisjsewell and @sphuber, this is an initial implementation of the BaseRestartWorkChain for a LammpsBaseCalculation; the tests and the docs are missing. It is just a basic implementation to deal with very basic restarts for now. But we can use it as a place to discuss what needs to be added and/or fixed.

@codecov

codecov bot commented May 25, 2023

Codecov Report

Merging #79 (85c2b62) into develop (22cf69c) will decrease coverage by 2.68%.
The diff coverage is 82.58%.

@@             Coverage Diff             @@
##           develop      #79      +/-   ##
===========================================
- Coverage    88.65%   85.98%   -2.68%     
===========================================
  Files           19       17       -2     
  Lines         1455     1356      -99     
===========================================
- Hits          1290     1166     -124     
- Misses         165      190      +25     
Flag      Coverage Δ
pytests   85.98% <82.58%> (-2.68%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files                                      Coverage Δ
aiida_lammps/parsers/inputfile.py                   87.07% <75.00%> (-0.16%) ⬇️
aiida_lammps/workflows/base.py                      75.24% <75.24%> (ø)
aiida_lammps/parsers/base.py                        85.00% <83.67%> (-0.23%) ⬇️
aiida_lammps/calculations/base.py                   96.62% <96.66%> (+0.35%) ⬆️
aiida_lammps/parsers/parse_raw/lammps_output.py     90.66% <100.00%> (+1.60%) ⬆️
aiida_lammps/utils.py                               83.33% <100.00%> (+83.33%) ⬆️

... and 1 file with indirect coverage changes

Member

@sphuber sphuber left a comment

Thanks @JPchico. Gave it a first read and added some comments. Haven't looked at the actual restart logic because I am not too familiar with how this is done in LAMMPS anyway.

"""Define the process specification."""
# yapf: disable
super().define(spec)
spec.expose_inputs(LammpsBaseCalculation)
Member

Should definitely use a namespace here, because now you are replacing the metadata inputs of the workchain with those of the LammpsBaseCalculation.

Suggested change
-spec.expose_inputs(LammpsBaseCalculation)
+spec.expose_inputs(LammpsBaseCalculation, namespace='lammps')
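With the namespace in place, callers nest the calculation inputs under lammps when launching. A minimal sketch, assuming the workchain class is named LammpsBaseWorkChain and code/structure are existing nodes:

    from aiida.engine import submit

    builder = LammpsBaseWorkChain.get_builder()
    builder.lammps.code = code            # inputs exposed from LammpsBaseCalculation
    builder.lammps.structure = structure  # now live under the `lammps` namespace
    submit(builder)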

# yapf: disable
super().define(spec)
spec.expose_inputs(LammpsBaseCalculation)
spec.input(
Member

Why are you redefining this input? It is already defined on the base class. Same goes for the clean_workdir.

Collaborator Author

Good point, I thought I still needed to make them accessible at this level; at least that is how it was done in aiida-vasp. But maybe that is because that restart workchain was written before the BaseRestartWorkChain was officially out.

Member

aiida-vasp has indeed been around for a while and it may be doing some things that are no longer optimal. Frankly, for a recent example of a BaseRestartWorkChain implementation, I would recommend having a look at this one: https://github.com/microsoft/aiida-pyscf/blob/main/src/aiida_pyscf/workflows/base.py

That should follow all latest best practices and should provide a good example I think.

""",
)
spec.input(
"verbosity",
Member

I don't think I see this used anywhere. Anyway, I think it might be better to just use the logger with the various levels that already exist. You can always use self.logger.warning or self.logger.info to log messages at various levels and then the user can control the level with the global logging.aiida_loglevel flags.

Collaborator Author

Nice, I did not know about this; in aiida-vasp there is a flag that can be used to decide whether some messages should be printed or not via self.report.

Does the logger work in a similar way?

I know it does nothing right now, but I wanted to start planning for it. If it can be done via the logger, that is much better, as it does not generate more nodes that one does not really need.

Member

self.report simply forwards to self.logger.report. It logs a message on the REPORT level, which is a level AiiDA defines in between INFO and WARNING. If you want to log verbose information that is not shown by default, I recommend you simply use self.logger.info(message). This will not show up by default, but the user can set verdi config set logging.aiida_loglevel INFO and then the messages will show. This controls logging for the entire application. I think this is better than each workflow adding custom solutions.
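For illustration, a minimal sketch of the two levels inside a workchain step (the method name run_process is hypothetical):

    def run_process(self):
        # Shown by default: REPORT sits between INFO and WARNING.
        self.report("submitting a new LammpsBaseCalculation")
        # Hidden by default; shown after `verdi config set logging.aiida_loglevel INFO`.
        self.logger.info("inputs for this iteration: %s", self.ctx.inputs)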

Collaborator Author

Fantastic, I was not aware of it, to be perfectly honest. Will also forward this to aiida-vasp. I think this is nice, since it is much better if we do not need to store those nodes.

@JPchico
Collaborator Author

JPchico commented May 26, 2023

I was discussing with my colleagues what to do when a minimization does not reach the tolerances within the maximum number of iterations. It seems that what one does is pretty much the same as when one runs out of walltime: take the restartfile if it exists and use it as the initial configuration, otherwise use the last step of the trajectory.

As this is exactly the same, I'll see to it that the same restart strategy is used, unless you have some idea of what could be done differently.

@JPchico
Collaborator Author

JPchico commented May 31, 2023

Hi @sphuber, I have added a couple of tests based on what is done in the quantum-espresso plugin. I was thinking that maybe one could add a couple of tests for restarting from a stored restartfile, or from a file in the remote.

The one with the stored restartfile should be possible if one can create a file using the method that creates a file from a string.

The file in the remote might be harder, as one would have to create a file in the remote folder and that is not a good idea.

@JPchico JPchico marked this pull request as ready for review May 31, 2023 16:12
@JPchico
Collaborator Author

JPchico commented Jun 1, 2023

I have now also added a test that checks that, if a restartfile is found, it is used when restarting, and another one checking that, if no restartfile is found and the trajectory is stored, the trajectory is used instead.

The only case that is not covered is using the remote, since that one requires a restartfile to be present; however, it indirectly checks that that option is not supposed to be used if the file is not found, so that is something, I guess.


@JPchico JPchico requested a review from sphuber June 1, 2023 18:23
@JPchico
Collaborator Author

JPchico commented Jun 5, 2023

ping @sphuber and @chrisjsewell . If there are no objections I'll merge this today.

@@ -93,39 +93,43 @@ def define(cls, spec):
         spec.input(
             "metadata.options.input_filename",
             valid_type=str,
-            default=cls._INPUT_FILENAME,
+            default=cls._DEFAULT_VARIABLES.get("input_filename"),
Member

why use .get("input_filename") instead of just ["input_filename"]?

Collaborator Author

No reason, to be honest; it can easily be changed to normal subscript access.

Comment on lines +78 to +75
if "store_restart" in self.ctx.inputs:
self.ctx.inputs.settings.store_restart = self.ctx.inputs.store_restart.value
Member

This belongs more in the setup method as it is not really validating anything

Collaborator Author

Sure thing, will move it there.

Comment on lines 81 to 88
if "script" not in self.ctx.inputs and any(
key not in self.ctx.inputs
for key in ["parameters", "potential", "structure"]
):
raise InputValidationError(
"Unless `script` is specified the inputs `structure`, "
"`potential` and `parameters` have to be specified."
)
Member

This validation is already done by LammpsBaseCalculation, is it not? In that case you don't have to do it here. If you expose LammpsBaseCalculation, its validators are called when you launch the LammpsBaseWorkChain.

Collaborator Author

Aha, okay, so when exposing the inputs the validators are also called? That means this is indeed unnecessary.

Member

@sphuber sphuber Jun 5, 2023

Exactly. This was necessary, because otherwise the exposing functionality as a way to compose workchains would be pretty much useless, since having to copy all validation each time would be untenable. Note that there are some tricky caveats. If some process defines a validator that checks the validity of the values of ports a and b, and another process exposes that subprocess but excludes port b, the validator will no longer work. It is possible to write validators in such a way that they guard against this and won't fail but just skip the validation, but it is not immediately obvious to users.
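A sketch of such a defensive validator (the ports a and b are hypothetical; the second argument is the port namespace being validated):

    @classmethod
    def validate_inputs(cls, value, port_namespace) -> str | None:
        """Validate the top-level namespace, tolerating excluded ports."""
        # If an exposing workchain excluded port `b`, skip the cross-port check.
        if 'b' not in port_namespace:
            return None
        if 'a' in value and 'b' in value and value['a'].value > value['b'].value:
            return '`a` may not be larger than `b`.'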

"`potential` and `parameters` have to be specified."
)

if "parameters" in self.ctx.inputs:
Member

The rest of this validation should really be done in an actual validator set on the relevant input port. The advantage is that this way the validation is run as soon as the process is submitted, and if the input is invalid, the nodes won't even be created. With the current approach, the process is always submitted, the node is created, and the process starts running, only to except in the first steps.

For example the first one, which is validating the parameters input, should really be done in the LammpsBaseCalculation class.

class LammpsBaseCalculation(CalcJob):

    @classmethod
    def define(cls, spec):
        super().define(spec)
        spec.input('parameter', validator=cls.validate_parameters)

    @classmethod
    def validate_parameters(cls, value, ctx) -> str | None:
        # validate here, value contains the value passed to the port.
        # If there is a problem, return the error message as a string.

The ctx variable will have the Port or PortNamespace that is being validated. This is useful if you need to validate a PortNamespace, for example the top-level inputs. You can use that to access multiple port values in case the validation depends on the values of multiple ports.

Collaborator Author

Sure thing, will move those to the calculation. I was not aware of that difference; that is nice to know.

Comment on lines 310 to 312
if calculation.exit_status == 401:
    self.report("Energy not converged during minimization, attempting restart")
if calculation.exit_status == 402:
Member

Probably best to use LammpsBaseCalculation.exit_codes.ERROR_FORCE_NOT_CONVERGED.status instead of hard-coding the numbers, just in case they were ever to change.
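Applied to the snippet above, that would look something like this (ERROR_ENERGY_NOT_CONVERGED and ERROR_FORCE_NOT_CONVERGED are assumed labels for exit codes 401 and 402):

    if calculation.exit_status == LammpsBaseCalculation.exit_codes.ERROR_ENERGY_NOT_CONVERGED.status:
        self.report("Energy not converged during minimization, attempting restart")
    if calculation.exit_status == LammpsBaseCalculation.exit_codes.ERROR_FORCE_NOT_CONVERGED.status:
        self.report("Force not converged during minimization, attempting restart")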

Collaborator Author

Sure thing, will change it to that.


restartfiles = [
    entry
    for entry in calculation.outputs.remote_folder.listdir()
Member

There are two potential problems with using RemoteData.listdir, as it will open a transport to the remote computer:

  1. The opening of the transport can fail (for example connection problems) and the exceptions are not caught here, which therefore will except the entire workchain, making this approach very vulnerable to connection problems.
  2. The opening of the transport is done directly and not through the transport pool that is maintained by the engine for CalcJobs. This means that each workchain will open its own connection. If you run many workchains, that means you can hit the remote server too often and get banned.

Unfortunately, there is no API that allows reusing, here in WorkChains, all the engine's safety mechanisms surrounding transports that are used for CalcJobs. If possible, I would strongly recommend not using this approach, but if there is no other option, then we should maybe document this so that people are aware of these pitfalls.

Collaborator Author

@JPchico JPchico Jun 5, 2023

Yes, this is a tricky one.
If you look in the calculation, I do a similar check:

        # check if there is a parent folder to restart the simulation from a previous run
        if "parent_folder" in self.inputs:
            # Check if one should do symlinks or if one should copy the files
            # By default symlinks are used as the file can be quite large
            symlink = settings.pop("parent_folder_symlink", self._default_symlink_usage)
            # Find the name of the previous restartfile, if none is given the default one is assumed
            # Setting the name here will mean that if the input file is generated from the parameters
            # that this name will be used
            _read_restart_filename = settings.pop(
                "previous_restartfile", self._DEFAULT_VARIABLES.get("restart_filename")
            )

            if _read_restart_filename not in self.inputs.parent_folder.listdir():
                raise exceptions.InputValidationError(
                    f'The name "{_read_restart_filename}" for the restartfile is not present in the '
                    f'remote folder "{self.inputs.parent_folder.uuid}"'
                )

I have no idea how else to check if the file is present in the remote, because that is basically what this is trying to do: check if the file exists in the parent folder. In the case of the RestartWorkChain it is of course a bit trickier, since we also need to find the name of the restartfile. This is necessary because you can write the restartfile at regular intervals during your calculation, which produces files that look like {BASE_RESTARTFILE_NAME}.{step}. Hence you do not know exactly how your restartfile will be named, and you need to check.

The only way I can possibly think of solving this is to store this information somewhere during the parsing of the calculation, since if you give the instruction that those intermediate restartfiles are printed, they are placed in the retrieve_temporary_list. Then you could find out the name of the restartfile, if present, and add it somewhere in the parsing output. Maybe in the results dictionary? I think that one should add a section there with WARNINGS and so on, so maybe there?

What do you think @sphuber? That would mean that we know the name; we would however not check for its existence. But maybe that is an acceptable compromise.
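As a side note, once the filenames are known, picking the newest intermediate restartfile is straightforward. A sketch assuming the {BASE_RESTARTFILE_NAME}.{step} pattern described above:

    def latest_restartfile(filenames, basename):
        """Return the restartfile with the highest step number, or None."""
        prefix = f"{basename}."
        steps = [
            (int(name[len(prefix):]), name)
            for name in filenames
            if name.startswith(prefix) and name[len(prefix):].isdigit()
        ]
        return max(steps)[1] if steps else None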

@JPchico
Collaborator Author

JPchico commented Jun 5, 2023

I have removed the calls to listdir in the workchain; I only keep one in the CalcJob, which checks if the file actually exists. Otherwise it parses the filename of the restart and adds it to the results; perhaps that is sufficient. Would that be okay?

I also added the validation as suggested in the actual input ports and added a couple of more tests.

@JPchico JPchico requested a review from sphuber June 7, 2023 05:32
@JPchico JPchico force-pushed the 39-sparkles-add-workflows-for-basic-types-of-runs branch from 70130b7 to b581598 Compare June 7, 2023 06:10
@JPchico
Collaborator Author

JPchico commented Jun 7, 2023

@sphuber I have rebased this PR to try to make sure that your changes to the base parser are taken into account.

I can remove the inputs and parsing of the raw parser that were previously in the base calcjob, and the same for the base workchain. Or I can do it in another PR, not sure what is cleaner.

I think I have addressed most of your concerns; as I mentioned, the only thing that remains is the checking of the parent folder in the calculation. That can be removed if it is a big problem, but then we will need to add a warning to the documentation explaining the issue.

@sphuber
Member

sphuber commented Jun 7, 2023

> Or I can do it in another PR, not sure what is cleaner.

Cleanest would be to do that first in a separate PR. And then update this one so it no longer assumes those inputs. But feel free to do whatever you prefer.

> I think I have addressed most of your concerns; as I mentioned, the only thing that remains is the checking of the parent folder in the calculation. That can be removed if it is a big problem, but then we will need to add a warning to the documentation explaining the issue.

I think you can put it in for now, I don't have the time to dive more deeply into how LAMMPS restart functionality works in order to try and come up with another method that doesn't rely on having to use the transport. We can always come back to it later. No need to hold up this PR.

@JPchico
Collaborator Author

JPchico commented Jun 7, 2023

> Cleanest would be to do that first in a separate PR. And then update this one so it no longer assumes those inputs. But feel free to do whatever you prefer.

Can do that, it should not be too much of a hassle.

> I think you can put it in for now, I don't have the time to dive more deeply into how LAMMPS restart functionality works in order to try and come up with another method that doesn't rely on having to use the transport. We can always come back to it later. No need to hold up this PR.

Yeah, the lammps restart is kinda tricky, as it uses a binary file that is generated in one of two ways: either whenever you call the command (usually after your run loop) or at regular intervals during the run loop.

This file can then be read in by another simulation and it will take the structure, forces, and other quantities that were being calculated at run time and initialize the simulation with them.

As the file can be very large, we give two choices: either you store it in the database/repo, or you can use the RemoteData from a previous calculation. Checking for the first case is pretty simple; it is just a SinglefileData that is passed as an input (we do not do any checking on the validity of this file, since it is a binary file that depends on the version of lammps, compiler, etc.). With the remote, the only thing that is done now is to check if the expected file exists in the remote, and that is it. Maybe it is overkill and it should be the user's responsibility to make sure that the file exists in the node that they are passing.

@sphuber
Member

sphuber commented Jun 7, 2023

> With the remote, the only thing that is done now is to check if the expected file exists in the remote, and that is it. Maybe it is overkill and it should be the user's responsibility to make sure that the file exists in the node that they are passing.

If that is really the only thing, then I would indeed recommend that approach. Simply allow passing a RemoteData as parent calculation to LammpsBaseCalculation, have that symlink the restart file (which will be ignored by the engine, logging a warning, if the file doesn't exist) and let the calculation fail. The parser should then simply detect this specific error and return an exit code indicating that the remote didn't contain the expected file. I think this is fair enough and gets rid of the problems with having to open transports, which is really not ideal.
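A sketch of what that detection could look like on the parser side (the exit-code label and the exact LAMMPS error string are assumptions):

    def parse(self, **kwargs):
        """Check the stdout for a missing-restartfile error before regular parsing."""
        content = self.retrieved.get_object_content(self.node.get_option('output_filename'))
        if 'Cannot open restart file' in content:
            return self.exit_codes.ERROR_RESTART_FILE_MISSING
        # ... regular parsing continues here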

@JPchico
Collaborator Author

JPchico commented Jun 7, 2023

> If that is really the only thing, then I would indeed recommend that approach. Simply allow passing a RemoteData as parent calculation to LammpsBaseCalculation, have that symlink the restart file (which will be ignored by the engine, logging a warning, if the file doesn't exist) and let the calculation fail.

This we do: we symlink the file unless explicitly asked to copy it. Basically, I looked up how the QE plugin does it and did it the same way.

> The parser should then simply detect this specific error and return an exit code indicating that the remote didn't contain the expected file. I think this is fair enough and gets rid of the problems with having to open transports, which is really not ideal.

Sure, I think that lammps itself will write that error in the log. I will try to produce the error and then just add an exit code for it in the parser.

Jonathan Chico added 7 commits June 7, 2023 14:49
Added special entries in the parsing for the minimization stop criterion, energy and force threshold values.
Changing how the reference values are stored to make use of the pytest-regression package.
… This should help for some of the older versions of lammps.
Jonathan Chico added 3 commits June 7, 2023 15:09
…s arising from listdir.

Setting the validation of the parameters in the input ports and moving it out of the input generation.

Adding test for the validation of the parameters in the calculation generation.
@JPchico JPchico force-pushed the 39-sparkles-add-workflows-for-basic-types-of-runs branch from b581598 to 635df61 Compare June 7, 2023 13:09
@JPchico
Collaborator Author

JPchico commented Jun 7, 2023

Funny thing: it turns out that errors are not printed to the log but to the screen; we are storing that in the file lammps_output. It is basically the same as the log but with the errors and so on.

It is a bit annoying, since I'm unsure whether one needs to parse both files. For the time being I'll add a minimal parser of the lammps_output that checks for errors and warnings.

If any of you has an idea on whether one could consolidate the log parsing to use this file instead, that would be great. I honestly have no idea.

@sphuber
Member

sphuber commented Jun 7, 2023

Looking at the plugin, it tells AiiDA to redirect all stdout to lammps_output and then manually adds the -log log.lammps. If you are saying that the content of lammps_output is simply the content of log.lammps plus the errors, then I think we can simply get rid of the log.lammps file and rely only on stdout. Right now we are retrieving the same content twice. You can then have a single parser that just parses lammps_output (which, btw, I would just keep as aiida.out, which is the default and used across other plugins) for relevant data and errors.
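For reference, a minimal sketch of how the single-file redirection would be wired up in the CalcJob (names are illustrative, not the plugin's actual code):

    from aiida.common.datastructures import CalcInfo, CodeInfo

    def prepare_for_submission(self, folder):
        codeinfo = CodeInfo()
        codeinfo.code_uuid = self.inputs.code.uuid
        codeinfo.cmdline_params = ['-in', self.metadata.options.input_filename]
        # Redirect all stdout to a single file instead of also writing log.lammps.
        codeinfo.stdout_name = self.metadata.options.output_filename  # e.g. 'aiida.out'

        calcinfo = CalcInfo()
        calcinfo.codes_info = [codeinfo]
        calcinfo.retrieve_list = [codeinfo.stdout_name]
        return calcinfo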

@JPchico
Collaborator Author

JPchico commented Jun 7, 2023

> Looking at the plugin, it tells AiiDA to redirect all stdout to lammps_output and then manually adds the -log log.lammps. If you are saying that the content of lammps_output is simply the content of log.lammps plus the errors, then I think we can simply get rid of the log.lammps file and rely only on stdout. Right now we are retrieving the same content twice. You can then have a single parser that just parses lammps_output (which, btw, I would just keep as aiida.out, which is the default and used across other plugins) for relevant data and errors.

Yes, I think that is indeed the case. I'll double check a bit more and add some tests for the warnings and errors parsing. If that is the case, then one can consolidate. I'm just unsure, since lammps has so many commands affecting the output that I'm a bit confused.

…different depending on the lammps version.
@JPchico
Collaborator Author

JPchico commented Jun 8, 2023

Looking at the LAMMPS documentation, I think it is indeed the case that the screen and log files are basically the same:
https://docs.lammps.org/Run_output.html

The only case I can see where the screen and log files will differ is that you can decide in the script to close the current log file and open a new one. I can see this being useful for very long simulations, but right now those cases are not even possible. So I think that the best solution is to just retrieve and parse the screen output and ignore the log file.

If for some reason someone wants to add the logfile, it should be possible to indicate that it should be retrieved at submission time via the additional_retrieve_list, as in the sketch below.
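For example (a sketch using aiida-core's standard option name; the builder namespace follows the exposed inputs discussed earlier):

    builder.lammps.metadata.options.additional_retrieve_list = ['log.lammps']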

A question though, does one need to pass the additional_retrieve_list inside the CalcJob to tell the engine to actually retrieve it? Or is it done automatically if the option is passed?

I'll make a different PR tackling the changes to the parsing.

@JPchico
Collaborator Author

JPchico commented Jun 9, 2023

So I have now fixed this to take into account the parsing of the output file. I removed the listdir checks, and if a lammps error is detected the parser will fail with an exit code referring to the error raised by lammps.

@JPchico
Collaborator Author

JPchico commented Jun 11, 2023

@sphuber, @chrisjsewell any other comment that should be fixed? Or could I just merge this?

Member

@sphuber sphuber left a comment

Thanks @JPchico. Noticed one minor thing; feel free to fix it or not, and then merge this.

Comment on lines 44 to 45
for index, line in enumerate(data):
    line = line.strip()
Member

This is not doing anything really

Collaborator Author

Aha, yes, I think that is a remnant of a conflict resolution during a merge. Well spotted. Removing it now.

@JPchico JPchico merged commit 5f56b29 into aiidaplugins:develop Jun 12, 2023
@JPchico JPchico deleted the 39-sparkles-add-workflows-for-basic-types-of-runs branch June 12, 2023 05:44