Initial implementation of the BaseRestartWorkChain #79
Conversation
Hi @chrisjsewell and @sphuber, this is an initial implementation of the `BaseLammpsWorkChain`.
Codecov Report

@@           Coverage Diff            @@
##           develop      #79   +/-  ##
===========================================
- Coverage    88.65%   85.98%   -2.68%
===========================================
  Files           19       17       -2
  Lines         1455     1356      -99
===========================================
- Hits          1290     1166     -124
- Misses         165      190      +25
Thanks @JPchico. Gave it a first read and added some comments. I haven't looked at the actual restart logic because I am not too familiar with how this is done in LAMMPS anyway.
aiida_lammps/workflows/base.py

        """Define the process specification."""
        # yapf: disable
        super().define(spec)
        spec.expose_inputs(LammpsBaseCalculation)
Should definitely use a namespace here, because now you are replacing the `metadata` inputs of the workchain with those of the `LammpsBaseCalculation`.
-        spec.expose_inputs(LammpsBaseCalculation)
+        spec.expose_inputs(LammpsBaseCalculation, namespace='lammps')
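The clobbering described above can be illustrated with plain dicts standing in for the input port namespaces (names here are illustrative, not the actual AiiDA API):

```python
# Plain dicts standing in for input port namespaces; illustrative only.
workchain_inputs = {"metadata": {"label": "my workchain"}}
calcjob_inputs = {"metadata": {"options": {"resources": {"num_machines": 1}}}}

# Exposing without a namespace: the calculation's ``metadata`` replaces
# the workchain's own ``metadata`` inputs.
flat = {**workchain_inputs, **calcjob_inputs}

# Exposing under a ``lammps`` namespace keeps both side by side, as with
# ``spec.expose_inputs(LammpsBaseCalculation, namespace='lammps')``.
namespaced = {**workchain_inputs, "lammps": calcjob_inputs}
```

In the flat merge the workchain's own `metadata` inputs are lost; namespacing keeps them both reachable.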
aiida_lammps/workflows/base.py

        # yapf: disable
        super().define(spec)
        spec.expose_inputs(LammpsBaseCalculation)
        spec.input(
Why are you redefining this input? It is already defined on the base class. Same goes for the `clean_workdir`.
Good point. I thought I still needed to make them accessible at this level; at least that is how it was done in aiida-vasp. But maybe that is because that restart workchain was written before the `BaseRestartWorkChain` was officially out.
The aiida-vasp plugin has indeed been around for a while and it may be doing some things that are no longer optimal. Frankly, for a recent example of a `BaseRestartWorkChain` implementation, I would recommend having a look at this one: https://github.com/microsoft/aiida-pyscf/blob/main/src/aiida_pyscf/workflows/base.py That should follow all the latest best practices and should provide a good example, I think.
aiida_lammps/workflows/base.py

            """,
        )
        spec.input(
            "verbosity",
I don't think I see this used anywhere. Anyway, I think it might be better to just use the logger with the various levels that already exist. You can always use `self.logger.warning` or `self.logger.info` to log messages at various levels, and then the user can control the level with the global `logging.aiida_loglevel` flags.
Nice, I did not know about this. In aiida-vasp there is a flag that can be used to decide whether some messages should be printed via `self.report`. Does the logger work in a similar way?

I know it does nothing right now, but I wanted to start planning for it. If it can be done via the logger that is much better, as it does not generate more nodes that one does not really need.
`self.report` simply forwards to `self.logger.report`. It logs a message on the `REPORT` level, which is a level AiiDA defines in between `INFO` and `WARNING`. If you want to log verbose information that is not shown by default, I recommend you simply use `self.logger.info(message)`. This will not show up by default, but the user can set `verdi config set logging.aiida_loglevel INFO` and then the messages will show. This controls logging for the entire application; I think this is better than each workflow adding custom solutions.
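The level ordering described above can be sketched with the standard library logger. The custom `REPORT` level below mirrors what AiiDA does; the exact numeric value is an assumption for illustration, chosen to sit between `INFO` (20) and `WARNING` (30):

```python
import logging

# A REPORT level between INFO (20) and WARNING (30), mirroring AiiDA's setup.
# The numeric value 23 here is illustrative.
REPORT = 23
logging.addLevelName(REPORT, "REPORT")

logger = logging.getLogger("aiida.workchain.sketch")
logger.setLevel(REPORT)  # default-like behaviour: REPORT and above are shown

logger.info("verbose detail")        # suppressed at the REPORT level
logger.log(REPORT, "progress note")  # shown

# Rough equivalent of ``verdi config set logging.aiida_loglevel INFO``:
logger.setLevel(logging.INFO)
logger.info("now this shows too")
```

The point is that the user flips one global level instead of each workflow maintaining its own verbosity input.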
Fantastic, I was not aware of it, to be perfectly honest. I will also forward this to aiida-vasp; I think this is nice, since if we do not need to store those nodes it is much better.
I was discussing with my colleagues what to do when a minimization does not reach the tolerances within the maximum number of iterations. It seems that what one does is pretty much the same as when one runs out of walltime: take the restartfile if it exists and use it as the initial configuration, otherwise use the last step of the trajectory. As this is exactly the same, I'll see that one uses the same restart strategy, unless you have some idea of what could be done differently.
Hi @sphuber, I have added a couple of tests based on what is done in the quantum-espresso plugin. I was thinking that maybe one could add a couple of tests for restarting from a stored restartfile, or a file in the remote. The one with the stored restartfile should be possible if one can create a file using the method that creates a file from a string. The file in the remote might be harder, as one would have to create a file in the remote folder, and that is not a good idea.
I have now also added a test that checks that if a restartfile is found it is used when restarting, and another checking that if no restartfile is found and the trajectory is stored, the trajectory is used instead. The only case that is not covered is using the remote, since that requires a restartfile to be present; however, it indirectly checks that this option is not supposed to be used if the file is not found, so that is something, I guess.
ping @sphuber and @chrisjsewell. If there are no objections I'll merge this today.
aiida_lammps/calculations/base.py

@@ -93,39 +93,43 @@ def define(cls, spec):
         spec.input(
             "metadata.options.input_filename",
             valid_type=str,
-            default=cls._INPUT_FILENAME,
+            default=cls._DEFAULT_VARIABLES.get("input_filename"),
Why use `.get("input_filename")` instead of just `["input_filename"]`?
No reason to be honest, it can easily be changed to just the normal access.
    if "store_restart" in self.ctx.inputs:
        self.ctx.inputs.settings.store_restart = self.ctx.inputs.store_restart.value
This belongs more in the `setup` method, as it is not really validating anything.
Sure thing, will move it there.
aiida_lammps/workflows/base.py

    if "script" not in self.ctx.inputs and any(
        key not in self.ctx.inputs
        for key in ["parameters", "potential", "structure"]
    ):
        raise InputValidationError(
            "Unless `script` is specified the inputs `structure`, "
            "`potential` and `parameters` have to be specified."
        )
This validation is already done by `LammpsBaseCalculation`, is it not? In that case you don't have to do it here. If you expose `LammpsBaseCalculation`, its validators are called when you launch `LammpsBaseWorkChain`.
Aha, okay so when exposing the inputs the validators are called also? Okay that means that this is unnecessary indeed.
Exactly. This was necessary, because otherwise the exposing functionality as a way to compose workchains would be pretty much useless, since having to copy all validation each time would be untenable. Note that there are some tricky caveats: if some process defines a validator that checks the validity of the values of ports `a` and `b`, and then another process exposes that subprocess but excludes port `b`, the validator will no longer work. It is possible to write validators in such a way that they guard against this and won't fail but just skip the validation, but it is not immediately obvious to users.
aiida_lammps/workflows/base.py

            "`potential` and `parameters` have to be specified."
        )

    if "parameters" in self.ctx.inputs:
The rest of this validation should really be done in an actual validator that is set on the relevant input port. The advantage is that the validation is then run as soon as the process is submitted, and if it is invalid, the nodes won't even be created. With the current approach, the process is always submitted, the node is created, and the process starts running, only to except in the first steps.

For example the first one, which is validating the `parameters` input, should really be done in the `LammpsCalculation` class.
class LammpsBaseCalculation(CalcJob):

    @classmethod
    def define(cls, spec):
        super().define(spec)
        spec.input('parameters', validator=cls.validate_parameters)

    @classmethod
    def validate_parameters(cls, value, ctx) -> str | None:
        # Validate here; ``value`` contains the value passed to the port.
        # If there is a problem, return the error message as a string.
The `ctx` variable will have the `Port` or `PortNamespace` that is being validated. This is useful if you need to validate a `PortNamespace`, for example the top-level inputs. You can use it to access multiple port values, in case the validation depends on the values of multiple ports.
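A cross-port namespace validator with the guard against excluded ports mentioned earlier can be sketched in plain Python. Here `ctx` stands in for the `PortNamespace` being validated (modelled as a set of port names rather than the real AiiDA object), and the ports `a` and `b` are the hypothetical ones from the discussion:

```python
def validate_inputs(value, ctx):
    """Cross-port validator sketch: require ``b`` whenever ``a`` is given.

    ``value`` is the dict of input values; ``ctx`` stands in for the
    PortNamespace being validated. Returning ``None`` means the inputs
    are valid; returning a string reports the error.
    """
    # Guard: if an exposing process excluded port ``b``, skip the check
    # instead of failing on a port that no longer exists.
    if "b" not in ctx:
        return None

    if value.get("a") is not None and value.get("b") is None:
        return "`b` must be specified when `a` is."
    return None
```

Without the guard, a workchain exposing this process with `exclude=('b',)` would make the validator fail unconditionally.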
Sure thing will move those to the calculation, I was not aware of that difference, that is nice to know.
aiida_lammps/workflows/base.py

    if calculation.exit_status == 401:
        self.report("Energy not converged during minimization, attempting restart")
    if calculation.exit_status == 402:
Probably best to use `LammpsBaseCalculation.exit_codes.ERROR_FORCE_NOT_CONVERGED.status` instead of hard-coding the numbers, just in case they were ever to change.
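A sketch of this suggestion, with `SimpleNamespace` standing in for the calculation's real `exit_codes` namespace (in AiiDA that namespace is generated from the `spec.exit_code(...)` definitions; the exit-code names here are assumptions based on the surrounding discussion):

```python
from types import SimpleNamespace

# Stand-in for ``LammpsBaseCalculation.exit_codes``; illustrative only.
exit_codes = SimpleNamespace(
    ERROR_ENERGY_NOT_CONVERGED=SimpleNamespace(status=401),
    ERROR_FORCE_NOT_CONVERGED=SimpleNamespace(status=402),
)

def needs_restart(exit_status):
    # Compare against the symbolic ``.status`` attributes instead of the
    # hard-coded 401/402, so the handler survives a renumbering.
    return exit_status in (
        exit_codes.ERROR_ENERGY_NOT_CONVERGED.status,
        exit_codes.ERROR_FORCE_NOT_CONVERGED.status,
    )
```

The handler logic stays correct even if the numeric statuses are ever redefined on the calculation class.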
Sure thing will change it to that.
aiida_lammps/workflows/base.py

    restartfiles = [
        entry
        for entry in calculation.outputs.remote_folder.listdir()
There are two potential problems with using `RemoteData.listdir`, as it will open a transport to the remote computer:
- The opening of the transport can fail (for example due to connection problems) and the exceptions are not caught here, which will therefore except the entire workchain, making this approach very vulnerable to connection problems.
- The opening of the transport is done directly and not through the transport pool that is maintained by the engine for `CalcJobs`. This means that each workchain will open its own connection. If you run many workchains, you can hit the remote server too often and get banned.

Unfortunately, there is no API that allows reusing all the engine's safety mechanisms surrounding transports that are used for `CalcJobs` here in `WorkChains`. If possible, I would strongly recommend not using this approach, but if there is no other option, then we should maybe document this so that people are aware of these pitfalls.
Yes, this is a tricky one.
If you look in the calculation I do a similar check:

# check if there is a parent folder to restart the simulation from a previous run
if "parent_folder" in self.inputs:
    # Check whether one should symlink or copy the files.
    # By default symlinks are used, as the file can be quite large.
    symlink = settings.pop("parent_folder_symlink", self._default_symlink_usage)
    # Find the name of the previous restartfile; if none is given the default one is assumed.
    # Setting the name here means that if the input file is generated from the parameters,
    # this name will be used.
    _read_restart_filename = settings.pop(
        "previous_restartfile", self._DEFAULT_VARIABLES.get("restart_filename")
    )
    if _read_restart_filename not in self.inputs.parent_folder.listdir():
        raise exceptions.InputValidationError(
            f'The name "{_read_restart_filename}" for the restartfile is not present in the '
            f'remote folder "{self.inputs.parent_folder.uuid}"'
        )
I have no idea how else to check if the file is present in the remote, because that is basically what this is trying to do: check that the file exists in the parent folder. In the case of the `RestartWorkChain` it is of course a bit trickier, since we also need to find the name of the restartfile. One needs to do this because you can write the restartfile at regular intervals during your calculation, which writes files that look like `{BASE_RESTARTFILE_NAME}.{step}`. Hence, you do not know exactly how your restartfile will be named, and you need to check.

The only way I can think of solving this is to store this information somewhere during the parsing of the calculation, since if you give the instruction that those intermediate restartfiles are printed, they are placed in the `retrieve_temporary_list`. Then you could find out the name of the restartfile, if present, and add it somewhere in the parsing output. Maybe in the `results` dictionary? I think that one should add a section there with `WARNINGS` and so on, so maybe there?

What do you think @sphuber? That would mean that we know the name; we would, however, not check for its existence. But maybe that is an acceptable compromise.
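Picking the newest file out of the `{BASE_RESTARTFILE_NAME}.{step}` pattern described above is straightforward once the candidate names are known (for example from the `retrieve_temporary_list` during parsing). A sketch, with the base name as an assumed parameter:

```python
import re

def latest_restartfile(filenames, base="restartfile"):
    """Return the ``{base}.{step}`` entry with the highest step, or None.

    ``base`` is the hypothetical BASE_RESTARTFILE_NAME; ``filenames`` would
    come from a directory listing or the retrieved temporary files.
    """
    pattern = re.compile(re.escape(base) + r"\.(\d+)$")
    candidates = []
    for name in filenames:
        match = pattern.match(name)
        if match:
            # Sort numerically on the step, not lexically on the name.
            candidates.append((int(match.group(1)), name))
    return max(candidates)[1] if candidates else None
```

Storing the result of such a lookup in the parser output would give later restarts the filename without reopening a transport.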
I have removed the calls to `listdir`. I also added the validation as suggested in the actual input ports, and added a couple more tests.
(force-pushed from 70130b7 to b581598)
@sphuber I have rebased this PR to try to make sure that your changes to the base parser are taken into account. I can remove the inputs and parsing of the raw parser that were previously in the base calcjob, and the same for the base workchain, or I can do it in another PR; not sure what is cleaner. I think I have addressed most of your concerns. As I mentioned, the only thing that remains is the checking of the parent folder in the calculation. That can be removed if it is a big problem, but then we will need to add a warning to the documentation explaining the issue.
Cleanest would be to do that first in a separate PR. And then update this one so it no longer assumes those inputs. But feel free to do whatever you prefer.
I think you can put it in for now, I don't have the time to dive more deeply into how LAMMPS restart functionality works in order to try and come up with another method that doesn't rely on having to use the transport. We can always come back to it later. No need to hold up this PR.
Can do that, it should not be too much of a hassle.
Yeah, the lammps restart is kinda tricky, as it uses a binary file that is generated in one of two ways: either whenever you call the command (usually after your run loop) or at regular intervals during the run loop. This file can then be read by another simulation, and it will take the structure, forces, and other quantities that were being calculated at run time and initialize the simulation with them. As the file can be very large, we give two choices: either you store it in the database/repo, or you can use the …
If that is really the only thing, then I would indeed recommend that approach. Simply allow passing a …
This we do: we symlink the file unless explicitly asked to copy it. Basically, I looked up how the QE plugin does it and did it the same way.
Sure, I think that lammps itself will write that error in the log; I will try to produce the error and then just add an exit code for it in the parser.
Added special entries in the parsing for the minimization stop criterion, energy and force threshold values.
Changing how the reference values are stored to make use of the pytest-regression package.
… This should help for some of the older versions of lammps.
…s arising from listdir. Setting the validation of the parameters in the input ports and moving it out of the input generation. Adding test for the validation of the parameters in the calculation generation.
(force-pushed from b581598 to 635df61)
Funny thing: it turns out that errors are not printed to the log but to the screen, and we are storing that in the file. It is a bit annoying, since I'm unsure whether one needs to parse both files; for the time being I'll add a minimal parser for the screen output. If any of you has an idea on whether one could then consolidate the log parsing to use this file instead, that would be great. I honestly have no idea.
Looking at the plugin, it tells AiiDA to redirect all stdout to …
Yes, I think that is indeed the case. I'll double-check a bit more and add some tests for the warnings and errors parsing. If that is the case, then one can consolidate. I'm just unsure, since lammps has so many commands affecting the output that I'm a bit confused.
…different depending on the lammps version.
Looking at the LAMMPS documentation, I think it is indeed the case that the screen and log files are basically the same. The only case I can see where they will differ is that you can decide in the script to close the current log file and open a new one; I can see this being useful for very long simulations, but right now those cases are not even possible. So I think the best solution is to just retrieve and parse the screen output and ignore the log file. If for some reason someone wants to add the logfile, it should be possible to indicate that it should be retrieved when one submits via the …

A question though: does one need to pass the …

I'll make a different PR tackling the changes to the parsing.
…9-sparkles-add-workflows-for-basic-types-of-runs
So I have now fixed this to take into account the parsing from the output file. Removed the …
@sphuber, @chrisjsewell, any other comments that should be addressed? Or can I just merge this?
Thanks @JPchico. Noticed one minor thing; feel free to fix it or not, and then merge this.
    for index, line in enumerate(data):
        line = line.strip()
This is not doing anything really
Aha yes I think that is a remnant from a conflict resolution during a merge. Well spotted. Removing it now
Initial implementation of the `BaseLammpsWorkChain`; this is based on the `BaseRestartWorkChain`. The idea of this workchain is to handle the most basic calculations and then to be used as a building block to build more specific workchains to handle minimization, MD and other more property-specific workchains.