Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Two bugs in pw90common_fourier_R_to_k* #273

Merged
merged 12 commits into from
Jul 24, 2019

Conversation

qiaojunfeng
Copy link
Collaborator

@qiaojunfeng qiaojunfeng commented May 28, 2019

There are 2 major bugs in routines pw90common_fourier_R_to_k*

  1. The loop index i, j on wdist_ndeg is wrong, and this will cause wrong interpolation results of all the calculations. In some bandstructure calculations, the Wannier interpolated bands significantly deviate from ab initio bands. Applying fixes of this commit can correctly interpolate bandstructure.

  2. The repeated calculations of rdotk and phase_fac can lead to wrong results, as pointed out in issue 271.

Fixes #271.

@giovannipizzi
Copy link
Member

Thanks! Indeed the loops seem wrong. Do you have an example where the interpolation is wrong because of this bug and gets much better after the fix?

@qiaojunfeng
Copy link
Collaborator Author

qiaojunfeng commented Jul 4, 2019

The attachment is a zip file including the input and output files of QE & wannier90.
input_output.zip

The structure of the zip is:

./input_output
├── bands.in
├── bands.x.in
├── Bi.rel-pbe-dn-kjpaw_psl.0.2.2.UPF
├── figure
│   ├── new
│   │   ├── mbt-bands.dat
│   │   ├── mbt-bands.gnu
│   │   ├── mbt-bands.pdf
│   │   ├── mbt-bands.py
│   │   └── mbt-path.kpt
│   ├── old
│   │   ├── mbt-bands.dat
│   │   ├── mbt-bands.gnu
│   │   ├── mbt-bands.pdf
│   │   └── mbt-bands.py
│   └── qeband.pdf
├── mbt.pw2wan
├── mbt.win
├── mbt.wout
├── Mn.rel-pbe-spn-kjpaw_psl.0.3.1.UPF
├── nscf.in
├── scf.in
└── Te.rel-pbe-dn-kjpaw_psl.0.3.1.UPF

To reduce computational time, I have decreased energy cutoff, kmesh in QE run, and the number of iterations in wannierisation step, so the spreads of MLWF are not very good, around 3.8, as shown in ./input_output/mbt.wout.

The file ./input_output/figure/qeband.pdf is the ab initio band produced by plotband.x of QE.
The file ./input_output/figure/old/mbt-bands.pdf is the Wannier interpolation band produced by the develop branch of postw90.x.
The file ./input_output/figure/new/mbt-bands.pdf is the Wannier interpolation band produced by my modified postw90.x.

Since the chk, amn, mmn, eig files are too large, I haven't included them in the zip.

For your convenience, the figures are pasted here:
qeband
old
new

Note added: Actually, these weird interpolations were identified by my colleague, Jiaqi Zhou. After carefully digging into the source code I found it was these tiny mistakes that resulted in the wrong interpolation. Moreover, if the spreads of MLWF are larger, the interpolation is much worse, totally different from the ab initio results.

@stepan-tsirkin
Copy link
Collaborator

Dear Junfeng, thank you for pointing to this bug.

I also suspect there is another bug with ws_distance in postw90_common.F90

line 732 ( subroutine pw90common_fourier_R_to_k )
reads:
OO(i, j) = OO(i, j) + cmplx_i*crdist_ws(alpha, ideg, i, j, ir)phase_facOO_R(i, j, ir)

which I believe is correct. However in the other versions of this subroutine the array crvec is used instead of crdist_ws, also when use_ws_distance=True

see lines 732, 820-825, 881, 888, 995, 1093-1101, 1182, 1318

e.g line 820:
if (present(OO_dx)) OO_dx(:, :) = OO_dx(:, :) + &
cmplx_i*crvec(1, ir)phase_facOO_R(:, :, ir)

Am I right, that those lines also need to be corrected (crvec-> crdist_ws) ?

This does not change the interpolated energies, but plays a role when interpolating velocities, Berry curvatures, etc

@gibertini
Copy link

Dear Junfeng,

thanks for spotting this. I also totally agree that both 1. and 2. are bugs and I believe that your commits are the right solution.

For 1. indeed the indices i and j of wdist_ndeg are not used consistently within the same loops. So for sure one needs to make a consistent choice. Among the two possibilities (i,j or j,i), I agree that the one you chose is the correct one.

For 2. I think the re-definition of rdotk and phase_fac is the result of an unfortunate cut-and-paste from the old version that was not commented out. Indeed, whenever rdotk and phase_fac are defined in the new way, the old version is kept right below and commented. I think it would be simply better to remove all such commented lines, together with the ones erroneously uncommented that you spotted.

@gibertini
Copy link

Dear Stepan,

It think you are right. There was initially an issue with the interpolation of band velocity that was solved in #53 .
Unfortunately the solution was not extended to other interpolation routines.

Thanks you for spotting this!

@stepan-tsirkin
Copy link
Collaborator

Good, How do we proceed?
Should I open a separate pull request to implement what I noted in the previous comment?
Or just the current PR may be updated.?

@qiaojunfeng
Copy link
Collaborator Author

Dear Marco Gibertini,

Thanks! I have removed those comments.

@qiaojunfeng
Copy link
Collaborator Author

Dear Stepan S. Tsirkin,

Thanks! I think it should be crdist_ws too. If you like, I can help fix the lines you mentioned, or you fix the bugs by creating a new pull request, which do you prefer?

@stepan-tsirkin
Copy link
Collaborator

Dear Junfeng,
It would be nice if you fix those lines, and I double-check afterwards.
It is a related bug, so it does not deserve a separate PR, I think.

@qiaojunfeng
Copy link
Collaborator Author

Dear Stepan,

I have corrected the crdist_ws bugs, could you please double-check my modifications?
I split some long lines to ensure their length < 132, otherwise, the compiler will complain:

../postw90/postw90_common.F90:1001:132:

 1001 |                                         (crdist_ws(a, ideg, i, j, ir) + local_wannier_centres(a, j) - local_wannier_centres(a, i))* &
      |                                                                                                                                    1
Error: Line truncated at (1) [-Werror=line-truncation]

@stepan-tsirkin
Copy link
Collaborator

Hi Jungeng,

Thank you, the correction look fine.

Just now after a careful look, in lines 1302
and 1312 it seems to me that irvec also should be replaced to irdist. But to be sure it is better to consult the contributors that implemented that ..._TB_conv subroutine

@giovannipizzi
Copy link
Member

If I am not mistaken, those lines are introduced here by @Julen-ia but they have comments with [lp] who I guess is @paulatz (maybe copy-paste from elsewhere?).

Could @Julen-ia and @paulatz comment on these points/possible bugs? (We are having a coding day on the 24th and if we figure out everything before then we can review and merge). Thanks!

@Julen-ia
Copy link
Contributor

Dear all,

Thank you for spotting the bugs, and also these two lines that I introduced and need to be changed. I think that the safest solution is to specify that both irdist_ws and irvec need to be zero in the if statement Stepan spotted.

In principle, the if statement should only apply to one R vector, namely \vec{R}=0 (the origin). In ws_distance.f90, line 136, I see

irdist_ws(:, ideg, iw, jw, ir) = irvec(:, ir) + shifts(:, ideg)

if we switch irvec --> irdist_ws in the if statement Stepan spotted, irdist_ws = 0 could be triggered even if irvec(:,ir) \neq 0 provided irvec = - shifts , and this is not what we want. I dont have much experience with irdist_ws, so what I just said may never take place. Unfortunately I am on travel at the moment and I have limited availability for doing numerical checks.

Any feedback is appreciated, thank you all

Best regards

Julen

@paulatz
Copy link
Contributor

paulatz commented Jul 18, 2019

It's been a while, and I'm not sure I remember everything correctly. From what I see, the "old" subroutine pw90common_fourier_R_to_k should be correct, except for the exchange of i with j in wdist_ndeg. I think this is not really a bug since wdist_ndeg(i,j,ir) is symmetric in i,j (I cannot think of any case where it could be non-symmetric, but I may be wrong).

On the other hand, the "new" subroutine pw90common_fourier_R_to_k_new was actually still using the "old" interpolation, without using the minimum distance. All the changes look correct to me and should fix the problem.

In pw90common_fourier_R_to_k_vec (line ~1087) the re-definition of the phase factor also looks quite clearly like a bug to me and I think the new code fixes it.

cheers

@qiaojunfeng
Copy link
Collaborator Author

Dear Julen,

Thanks for the response!
I spent some time reading your PRB paper, and I will try to convey my thoughts about this if statement.

From Eq.(21) in your paper, the TB convention is

In the Wigner Seitz interpolation, we replace the ordinary interpolation

by

i.e. the original R is replaced by R+T to ensure minimal distance between Wannier centers r_m and r_n + R + T, where T is the supercell translations. (These equations can be found in the v3 review paper Eq.(47,48) ).

Since R+T is identical to R, the TB convention should also hold for any 0 + T, not only R = 0.
For ir in the range of 1 to nrpts, there exists exactly 1 ir which satisfy irvec(:,ir) = 0, that is this ir corresponds to R = 0. For this ir, the TB convention is satisfied, and its translations R + T should also satisfy TB convention, i.e. all the irdist_ws(:, ideg, i, i, ir) with ideg ranges from 1 to wdist_ndeg(i, i, ir). So, if we specify both irdist_ws and irvec be 0, then we may lose some irdist_ws which are the degenerate of R = 0.
In reverse, is it possible that for some ir, irvec(:,ir) != 0 but irdist_ws(:,ideg,i,i,ir) = 0 + T? The answer is no, because irdis_ws is the supercell translations of irvec. The condition irdist_ws(:,ideg,i,i,ir) = 0 + T requires that the irvec(:,ir), which locates in the the Wigner Seitz cell centered at the origin, to be 0.

Thus, if irvec = 0, then all the irdist_ws should also satisfy TB convention. If any irdist_ws = 0, then irvec must be 0.
In conclusion, the current if statement, that only test irvec = 0, is enough.

Please point out my mistakes if I misunderstand anything.

Thanks,
Junfeng

@paulatz
Copy link
Contributor

paulatz commented Jul 19, 2019

Hello Junfeg,
I think you are making it too complicated. Note that if i==j and ir==ir0 (i.e. R=0) than the degeneracy can only be 1, because an atom cannot be on the border of the WS cell centered on itself. In other words, if you set r_m == r_n and R==0 than the minimal distance is for T==0, and it is zero. Also, note that R+T is always different from zero, except when R==0 and T==0, because {R} are always inside the supercell, while {T} are always outside, and if R'+T'==R+T then R'=R and T'=T. I think you can prove this formally using the fact that the matrix of the (super-)cell vectors is invertible

@qiaojunfeng
Copy link
Collaborator Author

Dear Lorenzo,

Thanks for the comment. I forgot the condition i == j so things became complicated. When i == j and R == 0, the degeneracy is 1, so irvec is identical to irdist_ws. In conclusion, it's safe to use either irvec or irdist_ws in the if statements.

In terms of your previous comment on the symmetry of wdist_ndeg, I printed it out and found at some ir the wdist_ndeg(:, :, ir) were not symmetric. I will try to find out why.

@Julen-ia
Copy link
Contributor

Dear Junfeng and Lorenzo

Thank you for your replies. You obviously have more experience than me with this interpolation procedure.

From your discussion, I agree that the right thing to do is leaving the if statement as it is.

Thanks again

@paulatz
Copy link
Contributor

paulatz commented Jul 21, 2019

I've been thinking about wdist_ndeg being symmetric in i,j and I'm no more convinced that it is.

I would say that if i_m+R is on the face (or edge, or vertex) of the super-WS cell of i_n, that i_n must be on the face (or edge, or vertex) of the super-WS of i_m+R.

But we are not doing this, we are exchanging i_n-(i_m+R) with i_m-(i_n+R), which is not the same. There must however exist on R' that it is super-periodically equivalent to -R, and wdist_nedg(i,j,R) = wdist_ndeg(j,i,R'), which is a much weaker condition

@qiaojunfeng
Copy link
Collaborator Author

qiaojunfeng commented Jul 22, 2019

Dear Lorenzo,

Thanks! I agree with your arguments so wdist_ndeg is not necessarily symmetric.
By the way, a little question about the code comment:

! function IW translated in the Wigner-Size around function JW
! and also find its degeneracy, and the integer shifts needed
! to identify it
! Note: the routine outputs R_out, but we don't really need it
! This is kept in case in the future we might want to use it
! R_out contains the actual vector between the two WFs. We
! calculate instead crdist_ws, that is the Bravais lattice vector
! between two supercell lattices, that is the only one we need
! later for interpolation etc.
CALL R_wz_sc(-wannier_centres(:, iw) &
+ (irvec_cart + wannier_centres(:, jw)), (/0._dp, 0._dp, 0._dp/), &
wdist_ndeg(iw, jw, ir), R_out, shifts)

Maybe it is better to write function JW translated in the Wigner-Seitz around function IW?
This is more consistent with the code and other comments:
integer, public, save, allocatable :: irdist_ws(:, :, :, :, :)!(3,ndegenx,num_wann,num_wann,nrpts)
!! The integer number of unit cells to shift Wannier function j to put its centre
!! inside the Wigner-Seitz of wannier function i. If several shifts are

subroutine R_wz_sc(R_in, R0, ndeg, R_out, shifts)
!! Put R_in in the Wigner-Seitz cell centered around R0,
!! and find all equivalent vectors to this (i.e., with same distance).

@paulatz
Copy link
Contributor

paulatz commented Jul 22, 2019

Yes, you are right, also the general grammar and spelling of the comment is worse than my usual, I'll let you fix it if you want

@codecov
Copy link

codecov bot commented Jul 23, 2019

Codecov Report

Merging #273 into develop will decrease coverage by 0.02%.
The diff coverage is 15.38%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop     #273      +/-   ##
===========================================
- Coverage    63.33%   63.31%   -0.03%     
===========================================
  Files           29       29              
  Lines        17975    17979       +4     
===========================================
- Hits         11385    11383       -2     
- Misses        6590     6596       +6
Impacted Files Coverage Δ
src/ws_distance.F90 62.85% <ø> (ø) ⬆️
src/plot.F90 49.51% <0%> (ø) ⬆️
src/postw90/postw90_common.F90 68.63% <15.78%> (-0.73%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 039d7c5...23fe210. Read the comment docs.

@qiaojunfeng
Copy link
Collaborator Author

Great, I think all the problems have been solved.

I have regenerated the benchmark in test-suite/tests/testpostw90_si_geninterp_wsdistance and benchmarks in the SHC folders (using gcc 9.1.0, serial run), so the Travis-ci can pass now.
By the way, I increased the tolerance a bit for the SHC tests, so issue #269 may pass as well.

@giovannipizzi
Copy link
Member

Thanks a lot @qiaojunfeng for spotting this bug and providing a fix, and all for the very useful discussions and bug scavenging :-)

We were discussing earlier today with @mostofi, @jryates and @ivosouza0 and we just had a doubt because of the relatively large change of the eigenvalues for the silicon interpolation with geninterp, so we decided to do additional tests before merging.

I just did them for Si (example 11). Indeed, the code before and after this PR returns the same band structure when interpolated with Wannier90, but the previous code (current develop, before these fixes) returned a different band structure between wannier90 and postw90 (with geninterp).
With the changes discussed here, instead, also the geninterp interpolation goes back to being the same as the one of wannier90.

So, I think that the fix is correct and I'm going to merge it.

@giovannipizzi giovannipizzi merged commit 5dda132 into wannier-developers:develop Jul 24, 2019
@qiaojunfeng qiaojunfeng deleted the use_ws_distance branch January 23, 2020 10:44
manxkim pushed a commit to manxkim/wannier90 that referenced this pull request Jan 10, 2021
…tance

Two bugs in pw90common_fourier_R_to_k*
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

use_ws_distance in pw90common_fourier_R_to_k_vec
7 participants