Simple calculation (Er diamond) crashing #355
Comments
As a follow-up, if I run in a single pool, the error is printed in the stdout, and indeed the parser detects it and sets the exit code of the PwCalculation to 463 with a message
(and it's now running, I think it's almost finished). So I think we do need to check the CRASH file as well, if available (retrieve it and add it to the parsing), for the case where multiple pools exist and the error might not be printed by the main pool that handles I/O. This is the part of the code where the CRASH file is written, for reference. I think multiple errors could appear in there, so we should just use the same logic as is used now for the main file (or even parse that file at the end as well, so that the same logic is triggered). @mbercx what do you think?
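The idea of reusing the stdout detection logic for the CRASH file could be sketched roughly as follows. This is only a minimal illustration, not the actual aiida-quantumespresso parser: the marker strings, the labels, and both function names are hypothetical placeholders, and it assumes errors are recognised by searching for known message substrings.

```python
# Hypothetical marker -> label table; the real parser's markers and labels may differ.
ERROR_MARKERS = {
    "example fatal message A": "ERROR_EXAMPLE_A",
    "example fatal message B": "ERROR_EXAMPLE_B",
}


def detect_errors(text, markers=ERROR_MARKERS):
    """Return the labels of all markers found in `text`, in order of first appearance."""
    found = []
    for line in text.splitlines():
        for marker, label in markers.items():
            if marker in line and label not in found:
                found.append(label)
    return found


def detect_errors_with_crash(stdout_text, crash_text=None):
    """Scan stdout first, then the CRASH file (if it was retrieved) with the same logic."""
    labels = detect_errors(stdout_text)
    if crash_text is not None:
        for label in detect_errors(crash_text):
            if label not in labels:
                labels.append(label)
    return labels
```

With this shape, an error that only a non-printing pool wrote to CRASH would still be reported, and several errors in the same file would all be collected.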
Fixed in aiidateam/aiida-quantumespresso#890. @mbercx which version of aiida-quantumespresso includes the fix?
Still have to make a release! Just wanted to still sneak in aiidateam/aiida-quantumespresso#902 since this would also remove the warnings for the
Thanks @mbercx. I want to have a look at aiidateam/aiida-quantumespresso#902, will do it by tomorrow.
Hi @mbercx, can you check if this issue is fixed? We use
Ran with 2 CPUs/pools, error was caught properly and calculation restarted with different diagonalisation approach as expected. So I think this is fixed!
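For readers unfamiliar with the recovery step mentioned here, a restart with a different diagonalisation could look roughly like the sketch below. It is an illustration under stated assumptions, not the handler shipped in aiida-quantumespresso: the function name and the fallback order are hypothetical, and the input parameters are assumed to be a nested dictionary keyed by namelist. `david` and `cg` are existing values of the pw.x `diagonalization` keyword.

```python
import copy

# Hypothetical fallback order; the order used by the real workflow may differ.
FALLBACK_ORDER = ["david", "cg"]


def next_diagonalization(parameters, order=FALLBACK_ORDER):
    """Return a copy of `parameters` using the next algorithm, or None if none is left."""
    # Assumption: if the keyword is not set, the first entry of `order` is in use.
    current = parameters.get("ELECTRONS", {}).get("diagonalization", order[0])
    if current not in order:
        return None
    position = order.index(current)
    if position + 1 >= len(order):
        return None  # every fallback has been tried already
    updated = copy.deepcopy(parameters)  # leave the caller's inputs untouched
    updated.setdefault("ELECTRONS", {})["diagonalization"] = order[position + 1]
    return updated
```

Returning `None` once the list is exhausted gives the caller a clear signal to stop restarting instead of looping.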
Describe the bug
When running an erbium diamond structure, the NSCF crashes.
To Reproduce
Run this input file: Er-Diamond.xsf.txt, renaming it from .txt to .xsf, with "structure as is", "metal", "non-magnetic", bands + PDOS, moderate protocol. I run with 2 MPI processes and 2 pools.
Expected behavior
I get the bands and PDOS :-)
Screenshots
Top part of my screenshot
Version (if known)
App version: v23.02.0
Additional context
The file crashed while/after computing the 2nd point, the last lines in the aiida.out file are:
and the stderr shows:
If I go with the terminal, the CRASH file says
Probably this is the actual cause of the error, and it is not shown because it happens on the second pool, which cannot print.
@mbercx are we also parsing this file? Probably not? If it were parsed (or if we ran with only 1 pool), would the workflow be able to recover? Any suggestions on how to fix this?
If useful, here is the workflow report: