Two bugs in pw90common_fourier_R_to_k* #273

qiaojunfeng · 2019-05-28T03:10:48Z

There are 2 major bugs in routines pw90common_fourier_R_to_k*

The loop index i, j on wdist_ndeg is wrong, and this will cause wrong interpolation results of all the calculations. In some bandstructure calculations, the Wannier interpolated bands significantly deviate from ab initio bands. Applying fixes of this commit can correctly interpolate bandstructure.
The repeated calculations of rdotk and phase_fac can lead to wrong results, as pointed out in issue 271.

Fixes #271.

giovannipizzi · 2019-07-03T06:11:26Z

Thanks! Indeed the loops seem wrong. Do you have an example where the interpolation is wrong because of this bug and gets much better after the fix?

qiaojunfeng · 2019-07-04T07:57:51Z

The attachment is a zip file including the input and output files of QE & wannier90.
input_output.zip

The structure of the zip is:

./input_output
├── bands.in
├── bands.x.in
├── Bi.rel-pbe-dn-kjpaw_psl.0.2.2.UPF
├── figure
│   ├── new
│   │   ├── mbt-bands.dat
│   │   ├── mbt-bands.gnu
│   │   ├── mbt-bands.pdf
│   │   ├── mbt-bands.py
│   │   └── mbt-path.kpt
│   ├── old
│   │   ├── mbt-bands.dat
│   │   ├── mbt-bands.gnu
│   │   ├── mbt-bands.pdf
│   │   └── mbt-bands.py
│   └── qeband.pdf
├── mbt.pw2wan
├── mbt.win
├── mbt.wout
├── Mn.rel-pbe-spn-kjpaw_psl.0.3.1.UPF
├── nscf.in
├── scf.in
└── Te.rel-pbe-dn-kjpaw_psl.0.3.1.UPF

To reduce computational time, I have decreased energy cutoff, kmesh in QE run, and the number of iterations in wannierisation step, so the spreads of MLWF are not very good, around 3.8, as shown in ./input_output/mbt.wout.

The file ./input_output/figure/qeband.pdf is the ab initio band produced by plotband.x of QE.
The file ./input_output/figure/old/mbt-bands.pdf is the Wannier interpolation band produced by the develop branch of postw90.x.
The file ./input_output/figure/new/mbt-bands.pdf is the Wannier interpolation band produced by my modified postw90.x.

Since the chk, amn, mmn, eig files are too large, I haven't included them in the zip.

For your convenience, the figures are pasted here:

Note added: Actually, these weird interpolations were identified by my colleague, Jiaqi Zhou. After carefully digging into the source code I found it was these tiny mistakes that resulted in the wrong interpolation. Moreover, if the spreads of MLWF are larger, the interpolation is much worse, totally different from the ab initio results.

stepan-tsirkin · 2019-07-10T13:48:19Z

Dear Junfeng, thank you for pointing to this bug.

I also suspect there is another bug with ws_distance in postw90_common.F90

line 732 ( subroutine pw90common_fourier_R_to_k )
reads:
OO(i, j) = OO(i, j) + cmplx_i*crdist_ws(alpha, ideg, i, j, ir)phase_facOO_R(i, j, ir)

which I believe is correct. However in the other versions of this subroutine the array crvec is used instead of crdist_ws, also when use_ws_distance=True

see lines 732, 820-825, 881, 888, 995, 1093-1101, 1182, 1318

e.g line 820:
if (present(OO_dx)) OO_dx(:, :) = OO_dx(:, :) + &
cmplx_i*crvec(1, ir)phase_facOO_R(:, :, ir)

Am I right, that those lines also need to be corrected (crvec-> crdist_ws) ?

This does not change the interpolated energies, but plays a role when interpolating velocities, Berry curvatures, etc

gibertini · 2019-07-12T10:25:33Z

Dear Junfeng,

thanks for spotting this. I also totally agree that both 1. and 2. are bugs and I believe that your commits are the right solution.

For 1. indeed the indices i and j of wdist_ndeg are not used consistently within the same loops. So for sure one needs to make a consistent choice. Among the two possibilities (i,j or j,i), I agree that the one you chose is the correct one.

For 2. I think the re-definition of rdotk and phase_fac is the result of an unfortunate cut-and-paste from the old version that was not commented out. Indeed, whenever rdotk and phase_fac are defined in the new way, the old version is kept right below and commented. I think it would be simply better to remove all such commented lines, together with the ones erroneously uncommented that you spotted.

gibertini · 2019-07-12T10:37:40Z

Dear Stepan,

It think you are right. There was initially an issue with the interpolation of band velocity that was solved in #53 .
Unfortunately the solution was not extended to other interpolation routines.

Thanks you for spotting this!

stepan-tsirkin · 2019-07-12T12:58:24Z

Good, How do we proceed?
Should I open a separate pull request to implement what I noted in the previous comment?
Or just the current PR may be updated.?

qiaojunfeng · 2019-07-13T07:12:19Z

Dear Marco Gibertini,

Thanks! I have removed those comments.

qiaojunfeng · 2019-07-13T07:31:09Z

Dear Stepan S. Tsirkin,

Thanks! I think it should be crdist_ws too. If you like, I can help fix the lines you mentioned, or you fix the bugs by creating a new pull request, which do you prefer?

stepan-tsirkin · 2019-07-13T07:38:39Z

Dear Junfeng,
It would be nice if you fix those lines, and I double-check afterwards.
It is a related bug, so it does not deserve a separate PR, I think.

qiaojunfeng · 2019-07-13T09:27:13Z

Dear Stepan,

I have corrected the crdist_ws bugs, could you please double-check my modifications?
I split some long lines to ensure their length < 132, otherwise, the compiler will complain:

../postw90/postw90_common.F90:1001:132:

 1001 |                                         (crdist_ws(a, ideg, i, j, ir) + local_wannier_centres(a, j) - local_wannier_centres(a, i))* &
      |                                                                                                                                    1
Error: Line truncated at (1) [-Werror=line-truncation]

stepan-tsirkin · 2019-07-13T22:03:09Z

Hi Jungeng,

Thank you, the correction look fine.

Just now after a careful look, in lines 1302
and 1312 it seems to me that irvec also should be replaced to irdist. But to be sure it is better to consult the contributors that implemented that ..._TB_conv subroutine

giovannipizzi · 2019-07-16T06:55:19Z

If I am not mistaken, those lines are introduced here by @Julen-ia but they have comments with [lp] who I guess is @paulatz (maybe copy-paste from elsewhere?).

Could @Julen-ia and @paulatz comment on these points/possible bugs? (We are having a coding day on the 24th and if we figure out everything before then we can review and merge). Thanks!

Julen-ia · 2019-07-18T11:05:45Z

Dear all,

Thank you for spotting the bugs, and also these two lines that I introduced and need to be changed. I think that the safest solution is to specify that both irdist_ws and irvec need to be zero in the if statement Stepan spotted.

In principle, the if statement should only apply to one R vector, namely \vec{R}=0 (the origin). In ws_distance.f90, line 136, I see

irdist_ws(:, ideg, iw, jw, ir) = irvec(:, ir) + shifts(:, ideg)

if we switch irvec --> irdist_ws in the if statement Stepan spotted, irdist_ws = 0 could be triggered even if irvec(:,ir) \neq 0 provided irvec = - shifts , and this is not what we want. I dont have much experience with irdist_ws, so what I just said may never take place. Unfortunately I am on travel at the moment and I have limited availability for doing numerical checks.

Any feedback is appreciated, thank you all

Best regards

Julen

paulatz · 2019-07-18T11:38:03Z

It's been a while, and I'm not sure I remember everything correctly. From what I see, the "old" subroutine pw90common_fourier_R_to_k should be correct, except for the exchange of i with j in wdist_ndeg. I think this is not really a bug since wdist_ndeg(i,j,ir) is symmetric in i,j (I cannot think of any case where it could be non-symmetric, but I may be wrong).

On the other hand, the "new" subroutine pw90common_fourier_R_to_k_new was actually still using the "old" interpolation, without using the minimum distance. All the changes look correct to me and should fix the problem.

In pw90common_fourier_R_to_k_vec (line ~1087) the re-definition of the phase factor also looks quite clearly like a bug to me and I think the new code fixes it.

cheers

qiaojunfeng · 2019-07-19T05:45:59Z

Dear Julen,

Thanks for the response!
I spent some time reading your PRB paper, and I will try to convey my thoughts about this if statement.

From Eq.(21) in your paper, the TB convention is

In the Wigner Seitz interpolation, we replace the ordinary interpolation

by

i.e. the original R is replaced by R+T to ensure minimal distance between Wannier centers r_m and r_n + R + T, where T is the supercell translations. (These equations can be found in the v3 review paper Eq.(47,48) ).

Since R+T is identical to R, the TB convention should also hold for any 0 + T, not only R = 0.
For ir in the range of 1 to nrpts, there exists exactly 1 ir which satisfy irvec(:,ir) = 0, that is this ir corresponds to R = 0. For this ir, the TB convention is satisfied, and its translations R + T should also satisfy TB convention, i.e. all the irdist_ws(:, ideg, i, i, ir) with ideg ranges from 1 to wdist_ndeg(i, i, ir). So, if we specify both irdist_ws and irvec be 0, then we may lose some irdist_ws which are the degenerate of R = 0.
In reverse, is it possible that for some ir, irvec(:,ir) != 0 but irdist_ws(:,ideg,i,i,ir) = 0 + T? The answer is no, because irdis_ws is the supercell translations of irvec. The condition irdist_ws(:,ideg,i,i,ir) = 0 + T requires that the irvec(:,ir), which locates in the the Wigner Seitz cell centered at the origin, to be 0.

Thus, if irvec = 0, then all the irdist_ws should also satisfy TB convention. If any irdist_ws = 0, then irvec must be 0.
In conclusion, the current if statement, that only test irvec = 0, is enough.

Please point out my mistakes if I misunderstand anything.

Thanks,
Junfeng

paulatz · 2019-07-19T09:14:49Z

Hello Junfeg,
I think you are making it too complicated. Note that if i==j and ir==ir0 (i.e. R=0) than the degeneracy can only be 1, because an atom cannot be on the border of the WS cell centered on itself. In other words, if you set r_m == r_n and R==0 than the minimal distance is for T==0, and it is zero. Also, note that R+T is always different from zero, except when R==0 and T==0, because {R} are always inside the supercell, while {T} are always outside, and if R'+T'==R+T then R'=R and T'=T. I think you can prove this formally using the fact that the matrix of the (super-)cell vectors is invertible

qiaojunfeng · 2019-07-19T14:01:11Z

Dear Lorenzo,

Thanks for the comment. I forgot the condition i == j so things became complicated. When i == j and R == 0, the degeneracy is 1, so irvec is identical to irdist_ws. In conclusion, it's safe to use either irvec or irdist_ws in the if statements.

In terms of your previous comment on the symmetry of wdist_ndeg, I printed it out and found at some ir the wdist_ndeg(:, :, ir) were not symmetric. I will try to find out why.

Julen-ia · 2019-07-20T06:54:07Z

Dear Junfeng and Lorenzo

Thank you for your replies. You obviously have more experience than me with this interpolation procedure.

From your discussion, I agree that the right thing to do is leaving the if statement as it is.

Thanks again

paulatz · 2019-07-21T18:26:12Z

I've been thinking about wdist_ndeg being symmetric in i,j and I'm no more convinced that it is.

I would say that if i_m+R is on the face (or edge, or vertex) of the super-WS cell of i_n, that i_n must be on the face (or edge, or vertex) of the super-WS of i_m+R.

But we are not doing this, we are exchanging i_n-(i_m+R) with i_m-(i_n+R), which is not the same. There must however exist on R' that it is super-periodically equivalent to -R, and wdist_nedg(i,j,R) = wdist_ndeg(j,i,R'), which is a much weaker condition

qiaojunfeng · 2019-07-22T02:19:24Z

Dear Lorenzo,

Thanks! I agree with your arguments so wdist_ndeg is not necessarily symmetric.
By the way, a little question about the code comment:

wannier90/src/ws_distance.F90

Lines 123 to 134 in 6f61f72

    
           ! function IW translated in the Wigner-Size around function JW 
        
           ! and also find its degeneracy, and the integer shifts needed 
        
           ! to identify it 
        
           ! Note: the routine outputs R_out, but we don't really need it 
        
           ! This is kept in case in the future we might want to use it 
        
           ! R_out contains the actual vector between the two WFs. We 
        
           ! calculate instead crdist_ws, that is the Bravais lattice vector 
        
           ! between two supercell lattices, that is the only one we need 
        
           ! later for interpolation etc. 
        
           CALL R_wz_sc(-wannier_centres(:, iw) & 
        
                        + (irvec_cart + wannier_centres(:, jw)), (/0._dp, 0._dp, 0._dp/), & 
        
                        wdist_ndeg(iw, jw, ir), R_out, shifts)

Maybe it is better to write function JW translated in the Wigner-Seitz around function IW?
This is more consistent with the code and other comments:

wannier90/src/ws_distance.F90

Lines 33 to 35 in 6f61f72

    
           integer, public, save, allocatable :: irdist_ws(:, :, :, :, :)!(3,ndegenx,num_wann,num_wann,nrpts) 
        
           !! The integer number of unit cells to shift Wannier function j to put its centre 
        
           !! inside the Wigner-Seitz of wannier function i. If several shifts are

wannier90/src/ws_distance.F90

Lines 146 to 148 in 6f61f72

    
           subroutine R_wz_sc(R_in, R0, ndeg, R_out, shifts) 
        
             !! Put R_in in the Wigner-Seitz cell centered around R0, 
        
             !! and find all equivalent vectors to this (i.e., with same distance).

paulatz · 2019-07-22T07:44:18Z

Yes, you are right, also the general grammar and spelling of the comment is worse than my usual, I'll let you fix it if you want

codecov · 2019-07-23T08:22:39Z

Codecov Report

Merging #273 into develop will decrease coverage by 0.02%.
The diff coverage is 15.38%.

@@             Coverage Diff             @@
##           develop     #273      +/-   ##
===========================================
- Coverage    63.33%   63.31%   -0.03%     
===========================================
  Files           29       29              
  Lines        17975    17979       +4     
===========================================
- Hits         11385    11383       -2     
- Misses        6590     6596       +6

Impacted Files	Coverage Δ
src/ws_distance.F90	`62.85% <ø> (ø)`	⬆️
src/plot.F90	`49.51% <0%> (ø)`	⬆️
src/postw90/postw90_common.F90	`68.63% <15.78%> (-0.73%)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 039d7c5...23fe210. Read the comment docs.

qiaojunfeng · 2019-07-23T08:23:40Z

Great, I think all the problems have been solved.

I have regenerated the benchmark in test-suite/tests/testpostw90_si_geninterp_wsdistance and benchmarks in the SHC folders (using gcc 9.1.0, serial run), so the Travis-ci can pass now.
By the way, I increased the tolerance a bit for the SHC tests, so issue #269 may pass as well.

giovannipizzi · 2019-07-24T17:30:55Z

Thanks a lot @qiaojunfeng for spotting this bug and providing a fix, and all for the very useful discussions and bug scavenging :-)

We were discussing earlier today with @mostofi, @jryates and @ivosouza0 and we just had a doubt because of the relatively large change of the eigenvalues for the silicon interpolation with geninterp, so we decided to do additional tests before merging.

I just did them for Si (example 11). Indeed, the code before and after this PR returns the same band structure when interpolated with Wannier90, but the previous code (current develop, before these fixes) returned a different band structure between wannier90 and postw90 (with geninterp).
With the changes discussed here, instead, also the geninterp interpolation goes back to being the same as the one of wannier90.

So, I think that the fix is correct and I'm going to merge it.

…tance Two bugs in pw90common_fourier_R_to_k*

qiaojunfeng added 4 commits May 27, 2019 15:06

correct bug in pw90common_fourier_R_to_k_vec when use_ws_distance

82bbfa1

fix wrong loop index of wdist_ndeg

a715684

revert to old version, will update it later

2665d55

changelog for pw90common_fourier_R_to_k

9bb8f6f

qiaojunfeng mentioned this pull request May 31, 2019

Implementation of OpenMP parallelization of berry module #275

Open

giovannipizzi requested review from giovannipizzi, jryates and mostofi June 26, 2019 14:28

remove unnecessary lines

15ea623

Merge branch 'develop' into use_ws_distance

1ddf81e

use crdist_ws instead of crvec

3a84ba3

qiaojunfeng added 3 commits July 23, 2019 13:24

update code comment

b238b2a

update benchmarks for SHC

3ef420f

update benchmark for wsdistance

e08aca1

Merge branch 'develop' into use_ws_distance

60b1465

jryates approved these changes Jul 24, 2019

View reviewed changes

giovannipizzi approved these changes Jul 24, 2019

View reviewed changes

Merge branch 'develop' into use_ws_distance

23fe210

giovannipizzi merged commit 5dda132 into wannier-developers:develop Jul 24, 2019

giovannipizzi mentioned this pull request Jul 30, 2019

Problem (precision) with the SHC tests #269

Closed

qiaojunfeng deleted the use_ws_distance branch January 23, 2020 10:44

manxkim pushed a commit to manxkim/wannier90 that referenced this pull request Jan 10, 2021

Merge pull request wannier-developers#273 from qiaojunfeng/use_ws_dis…

8e1adc0

…tance Two bugs in pw90common_fourier_R_to_k*

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Two bugs in pw90common_fourier_R_to_k* #273

Two bugs in pw90common_fourier_R_to_k* #273

qiaojunfeng commented May 28, 2019 •

edited by giovannipizzi

Loading

giovannipizzi commented Jul 3, 2019

qiaojunfeng commented Jul 4, 2019 •

edited

Loading

stepan-tsirkin commented Jul 10, 2019

gibertini commented Jul 12, 2019

gibertini commented Jul 12, 2019

stepan-tsirkin commented Jul 12, 2019

qiaojunfeng commented Jul 13, 2019

qiaojunfeng commented Jul 13, 2019

stepan-tsirkin commented Jul 13, 2019

qiaojunfeng commented Jul 13, 2019

stepan-tsirkin commented Jul 13, 2019

giovannipizzi commented Jul 16, 2019

Julen-ia commented Jul 18, 2019

paulatz commented Jul 18, 2019

qiaojunfeng commented Jul 19, 2019

paulatz commented Jul 19, 2019

qiaojunfeng commented Jul 19, 2019

Julen-ia commented Jul 20, 2019

paulatz commented Jul 21, 2019

qiaojunfeng commented Jul 22, 2019 •

edited

Loading

paulatz commented Jul 22, 2019

codecov bot commented Jul 23, 2019 •

edited

Loading

qiaojunfeng commented Jul 23, 2019

giovannipizzi commented Jul 24, 2019

Two bugs in pw90common_fourier_R_to_k* #273

Two bugs in pw90common_fourier_R_to_k* #273

Conversation

qiaojunfeng commented May 28, 2019 • edited by giovannipizzi Loading

giovannipizzi commented Jul 3, 2019

qiaojunfeng commented Jul 4, 2019 • edited Loading

stepan-tsirkin commented Jul 10, 2019

gibertini commented Jul 12, 2019

gibertini commented Jul 12, 2019

stepan-tsirkin commented Jul 12, 2019

qiaojunfeng commented Jul 13, 2019

qiaojunfeng commented Jul 13, 2019

stepan-tsirkin commented Jul 13, 2019

qiaojunfeng commented Jul 13, 2019

stepan-tsirkin commented Jul 13, 2019

giovannipizzi commented Jul 16, 2019

Julen-ia commented Jul 18, 2019

paulatz commented Jul 18, 2019

qiaojunfeng commented Jul 19, 2019

paulatz commented Jul 19, 2019

qiaojunfeng commented Jul 19, 2019

Julen-ia commented Jul 20, 2019

paulatz commented Jul 21, 2019

qiaojunfeng commented Jul 22, 2019 • edited Loading

paulatz commented Jul 22, 2019

codecov bot commented Jul 23, 2019 • edited Loading

Codecov Report

qiaojunfeng commented Jul 23, 2019

giovannipizzi commented Jul 24, 2019

qiaojunfeng commented May 28, 2019 •

edited by giovannipizzi

Loading

qiaojunfeng commented Jul 4, 2019 •

edited

Loading

qiaojunfeng commented Jul 22, 2019 •

edited

Loading

codecov bot commented Jul 23, 2019 •

edited

Loading