error on the parallel computing #1413

Closed
caohao001 opened this issue Oct 19, 2022 · 3 comments

@caohao001

Describe the bug

Dear developers,
The error (psi_norm <= 0.0) occurs when I follow the example P009_32H2O_pw (the INPUT and STRU files are unchanged). ABACUS was compiled with the 2018 Intel MPI + MKL toolchain on a CentOS system (HPC). The ELPA version is 2021. The run command is: mpirun -np 12 abacus > log. The log file is shown below.
Thanks.

ITER ETOT(eV) EDIFF(eV) DRHO TIME(s)
m = 3
j = 0 lagrange norm = 4.62493e+21
j = 1 lagrange norm = 7.66347e+20
j = 2 lagrange norm = 5.43346e+16
j = 3 lagrange norm = 8.86966e+38
in DiagoCG, psi norm = -5.36155e+21
If you use GNU compiler, it may due to the zdotc is unavailable.

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
NOTICE
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

psi_norm <= 0.0
CHECK IN FILE : OUT.autotest/warning.log

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
NOTICE
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Expected behavior

No response

To Reproduce

No response

Environment

No response

Additional Context

No response

caohao001 added the "Bugs" label (Bugs that only solvable with sufficient knowledge of DFT) on Oct 19, 2022
@Liu-RX
Collaborator

Liu-RX commented Oct 20, 2022

I ran the case on Ubuntu 20.04 with ABACUS 3.0.0 compiled with Intel oneAPI 2021.3.0. The ELPA version is also 2021. I tested a serial run and mpirun with 2 and 4 cores, and did not encounter the bug. Could it be caused by a different compiler version or operating system?

@caic99
Member

caic99 commented Nov 29, 2022

I can reproduce this bug on 103_PW_15_CF_CS_S1_smallg, using 4 OpenMP threads. ABACUS was compiled with gcc using the -O0 flag.

@pxlxingliang
Collaborator

With the latest develop code, I cannot reproduce the bug using ABACUS in either the GNU or the Intel Docker image.
