PwBaseWorkChain: revisit restarting from calculations #691
Comments
Thanks to @MackeEric for first drawing my attention to this! Also pinging @ramirezfranciscof and @qiaojunfeng in case they have comments.
So, I was a little rushed in running these tests this morning, and messed up the one for @MackeEric: didn't you mention during one of our meetings that QE doesn't properly restart from the charge density / wave functions when …
A caveat to using …
In this case, it is interesting to use … (`aiida_quantumespresso/workflows/pw/base.py`, lines 434 to 441 at commit 9b083e7)
But as far as I can see it doesn't 😅, which is what is also reported:
So we probably should still distinguish between different restart modes to deal with cases like this.
So in my experience, I have seen somewhat inconsistent behaviour, at least with QE v6.6. In a test scenario I designed on my own, I also found that … However, I have frequently seen cases where QE crashed when I was restarting from a calculation where I had set … On the other hand, the very same restarts worked perfectly fine if I only used …
Thanks @MackeEric!
One difference I realised while doing some more testing is that …
So, when e.g. restarting from a previous …
Right, you weren't just restarting the same run, but adapting inputs. I can imagine using the wave functions from the … So, using …
Hi everyone, these days I ran into the very same problem as @MackeEric ("too many bands are not converged") while trying to "restart" a pw calculation in the Hubbard workflow. When you want to start from the density of another pw calculation but not use its wave functions (as in, e.g., the Hubbard workflows), it is then necessary to be able to start the calculation using … Could a solution be having a … Anyway, I do believe that a solution is needed, especially when I think about the Hubbard workflow.
Thanks for the comment @bastonero! I indeed think a possible path forward could be keeping track of a … I have a meeting on Friday at 2pm with @timrov to discuss these restart issues. Maybe you also want to join?
Notes from meeting with @ramirezfranciscof, @timrov, @MackeEric and @bastonero:
Issues to raise on QE GitLab:
Pinging @giovannipizzi for comments, considering his considerable QE experience. 😁
Currently the `PwBaseWorkChain` keeps track of a parent calculation to restart from in the `restart_calc` variable in the context. For example, when the user provides a `parent_folder` input to restart the calculation from, its `creator` is assigned to the `restart_calc` context variable here: `aiida_quantumespresso/workflows/pw/base.py`, lines 245 to 246 at commit 9b083e7.
This context variable is later used to set the `CONTROL.restart_mode` input tag of the `pw.x` calculation: `aiida_quantumespresso/workflows/pw/base.py`, lines 415 to 420 at commit 9b083e7.
However, there is a problem in case the `parent_folder` has no `creator`, for example when the user has stashed some calculation outputs like the charge density from which she/he later wants to restart. I think we should adapt the code so the `creator` is no longer necessary.

The `restart_mode` input tag

EDIT: I've edited the message below because I had made a mistake during my tests on this issue.
The explanation of `restart_mode` is a little confusing and gives the impression that it can only be used to restart from an interrupted run. I've done a bit of testing on a simple SCF calculation on Si to see what happens when you set it to `'restart'`, compared to using `startingpot` or `startingwfc`:
- `restart_mode = 'restart'`: You find the following lines in the output file: … The calculation finishes in 1 SCF step, and the total CPU time is 6.09 s.
- `startingpot = 'file'`: You find the following lines in the output file: … The calculation finishes in 1 SCF step, and the total CPU time is 11.99 s.
- `startingwfc = 'file'`: You find the following lines in the output file: … The calculation finishes in 7 SCF steps, and the total CPU time is 17.99 s.
- `startingpot = 'file'` AND `startingwfc = 'file'`: Same as `restart_mode = 'restart'`.
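The namelist fragments that distinguish the four runs above can be sketched as follows. This assumes everything else in the input (including `outdir` and `prefix` pointing at the previous run's output) is identical to the original SCF input; `TESTED_VARIANTS` and `render_namelists` are hypothetical names, and the renderer only handles string-valued tags.

```python
# Hypothetical summary of the four tested variants; only the tags that differ are listed.
TESTED_VARIANTS = {
    "restart_mode = 'restart'": {"CONTROL": {"restart_mode": "restart"}},
    "startingpot = 'file'": {"ELECTRONS": {"startingpot": "file"}},
    "startingwfc = 'file'": {"ELECTRONS": {"startingwfc": "file"}},
    "startingpot + startingwfc = 'file'": {
        "ELECTRONS": {"startingpot": "file", "startingwfc": "file"}
    },
}


def render_namelists(parameters: dict) -> str:
    """Render a nested dict as Fortran namelist blocks (string values only)."""
    lines = []
    for namelist, tags in parameters.items():
        lines.append(f"&{namelist}")
        for key, value in tags.items():
            # Character tags are quoted, as in a pw.x input file.
            lines.append(f"  {key} = '{value}'")
        lines.append("/")
    return "\n".join(lines)


for label, params in TESTED_VARIANTS.items():
    print(f"! {label}")
    print(render_namelists(params))
```

Note that these are fragments only: in a complete `pw.x` input, the namelists must appear in their usual order alongside `&SYSTEM` and the cards.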
So, it seems that:
- Setting `restart_mode` to `'restart'` does restart both interrupted and completed runs properly. In this case both the wave functions and the charge density are read from file, which gives a similar result to setting both `startingpot` and `startingwfc` to `'file'`.
- For `startingpot = 'file'`, the calculation only requires 1 SCF step, but it takes longer because the initial wave function coefficients are random, which is to be expected.
- For `startingwfc = 'file'`, it seems that the initial potential (or charge density) is still taken from the superposition of atomic charge densities. Because of this, the calculation still needs 7 SCF steps to converge, and this restart method takes the longest. I suppose the question here is why the potential is not constructed from the wave functions before starting the SCF.

TL;DR
In the end, it seems that using `restart_mode = 'restart'` is indeed the way to go for restarting `pw.x` calculations, for both interrupted and completed runs.
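Combining this conclusion with the missing-`creator` problem, one hypothetical way to pick the restart tags is to key off the presence of a parent folder alone, with an optional density-only mode for cases like the Hubbard workflow discussed in the comments. The function `restart_inputs` and its `read_wavefunctions` flag are illustrative assumptions, not part of `aiida-quantumespresso`.

```python
from typing import Optional


def restart_inputs(parent_folder_path: Optional[str], read_wavefunctions: bool = True) -> dict:
    """Hypothetical helper: choose pw.x restart tags without needing a parent calculation.

    ``parent_folder_path`` stands in for a ``parent_folder`` input, which may be a
    stashed folder without a ``creator``.
    """
    if parent_folder_path is None:
        # Nothing to restart from.
        return {"CONTROL": {"restart_mode": "from_scratch"}}
    if read_wavefunctions:
        # Reads both the wave functions and the charge density.
        return {"CONTROL": {"restart_mode": "restart"}}
    # Density-only restart: start from the stored charge density but not the wave functions.
    return {"CONTROL": {"restart_mode": "from_scratch"}, "ELECTRONS": {"startingpot": "file"}}
```

The design choice here is that the decision depends only on what the caller wants to read from the folder, so a stashed charge density can be reused without an associated `creator`.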