Variance minimization failure between v3.6.0 and v3.7.0 #2496

Closed
jtkrogel opened this issue May 28, 2020 · 16 comments · Fixed by #2683
@jtkrogel
Contributor

jtkrogel commented May 28, 2020

I noticed that QMCPACK currently struggles badly to optimize a Jastrow by variance minimization for a simple oxygen atom with a fairly short ranged two-body Jastrow (J1 rcut=3.5, J2 rcut=6.0). Even when handed a good Jastrow optimized via an older build, the optimizer still drifts away to extremely poor wavefunctions.

Below is a sequence of optimizations performed for QMCPACK versions spanning 3.5.0 to 3.9.2. The behavior of the code is reproducible before and after the breakage, which was introduced somewhere between v3.6.0 and v3.7.0:

                          LocalEnergy              Variance                ratio 
v350_soa_real  series 0  -15.728765 +/- 0.009463   1.492333 +/- 0.028086   0.0949 
v350_soa_real  series 1  -15.867034 +/- 0.004702   0.401681 +/- 0.021359   0.0253 
v350_soa_real  series 2  -15.867630 +/- 0.005569   0.355559 +/- 0.007041   0.0224 
v350_soa_real  series 3  -15.870056 +/- 0.004769   0.403680 +/- 0.041663   0.0254 
v350_soa_real  series 4  -15.855346 +/- 0.004068   0.376875 +/- 0.015588   0.0238 
v350_soa_real  series 5  -15.872111 +/- 0.004198   0.383714 +/- 0.013272   0.0242 
v350_soa_real  series 6  -15.864396 +/- 0.003779   0.381579 +/- 0.013733   0.0241 
v350_soa_real  series 7  -15.862345 +/- 0.004467   0.365200 +/- 0.012512   0.0230 
 
v360_soa_real  series 0  -15.729211 +/- 0.008024   1.530882 +/- 0.023830   0.0973 
v360_soa_real  series 1  -15.868733 +/- 0.004633   0.392810 +/- 0.029925   0.0248 
v360_soa_real  series 2  -15.870868 +/- 0.004801   0.358298 +/- 0.009480   0.0226 
v360_soa_real  series 3  -15.867750 +/- 0.003984   0.343698 +/- 0.009605   0.0217 
v360_soa_real  series 4  -15.868260 +/- 0.004139   0.364282 +/- 0.012437   0.0230 
v360_soa_real  series 5  -15.871399 +/- 0.004174   0.385131 +/- 0.019431   0.0243 
v360_soa_real  series 6  -15.860257 +/- 0.004646   0.351132 +/- 0.011940   0.0221 
v360_soa_real  series 7  -15.847381 +/- 0.004128   0.401248 +/- 0.008341   0.0253 
 
v370_soa_real  series 0  -15.724448 +/- 0.007897   1.544877 +/- 0.023274   0.0982 
v370_soa_real  series 1  -15.749744 +/- 0.010039   1.989610 +/- 0.067119   0.1263 
v370_soa_real  series 2  -15.774872 +/- 0.015697   2.582455 +/- 0.177060   0.1637 
v370_soa_real  series 3  -15.737155 +/- 0.010289   2.424819 +/- 0.080883   0.1541 
v370_soa_real  series 4  -15.728370 +/- 0.011444   2.359724 +/- 0.049627   0.1500 
v370_soa_real  series 5  -15.705689 +/- 0.010365   2.383542 +/- 0.058148   0.1518 
v370_soa_real  series 6  -15.728391 +/- 0.013717   2.453381 +/- 0.070631   0.1560 
v370_soa_real  series 7  -15.713733 +/- 0.009296   2.441217 +/- 0.052501   0.1554 
 
v380_soa_real  series 0  -15.728750 +/- 0.007862   1.502734 +/- 0.021760   0.0955 
v380_soa_real  series 1  -15.777613 +/- 0.009606   1.884318 +/- 0.059431   0.1194 
v380_soa_real  series 2  -15.763667 +/- 0.009097   2.276413 +/- 0.070049   0.1444 
v380_soa_real  series 3  -15.746334 +/- 0.009049   2.081680 +/- 0.047240   0.1322 
v380_soa_real  series 4  -15.735534 +/- 0.011223   2.158009 +/- 0.063911   0.1371 
v380_soa_real  series 5  -15.732327 +/- 0.009506   2.243873 +/- 0.046932   0.1426 
v380_soa_real  series 6  -15.720645 +/- 0.012374   2.206366 +/- 0.070763   0.1403 
v380_soa_real  series 7  -15.728993 +/- 0.008947   2.112217 +/- 0.059391   0.1343 
 
v392_soa_real  series 0  -15.721737 +/- 0.011744   1.565547 +/- 0.033137   0.0996 
v392_soa_real  series 1  -15.725116 +/- 0.010556   2.609049 +/- 0.061071   0.1659 
v392_soa_real  series 2  -15.728824 +/- 0.011605   2.423152 +/- 0.049712   0.1541 
v392_soa_real  series 3  -15.738421 +/- 0.009981   2.125916 +/- 0.048893   0.1351 
v392_soa_real  series 4  -15.718103 +/- 0.011568   2.528542 +/- 0.081094   0.1609 
v392_soa_real  series 5  -15.724731 +/- 0.010427   2.767007 +/- 0.086160   0.1760 
v392_soa_real  series 6  -15.717093 +/- 0.010098   2.275466 +/- 0.060844   0.1448 
v392_soa_real  series 7  -15.713159 +/- 0.011786   2.196424 +/- 0.063463   0.1398 
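
The `ratio` column in the tables above appears to be the variance divided by the absolute value of the local energy; the listed numbers are consistent with that reading. A minimal sketch that recomputes it for the first and last series of v3.6.0 and v3.7.0 and flags whether the optimization improved the wavefunction. The function names are illustrative and are not part of QMCPACK or its analysis tools:

```python
def variance_ratio(energy, variance):
    """Variance divided by |energy|, assumed to match the 'ratio' column."""
    return variance / abs(energy)


def optimization_improved(first, last):
    """True if the variance/|energy| ratio dropped between two series."""
    return variance_ratio(*last) < variance_ratio(*first)


# (LocalEnergy, Variance) pairs copied from series 0 and series 7 above.
runs = {
    "v3.6.0": ((-15.729211, 1.530882), (-15.847381, 0.401248)),
    "v3.7.0": ((-15.724448, 1.544877), (-15.713733, 2.441217)),
}

for version, (first, last) in runs.items():
    status = "improved" if optimization_improved(first, last) else "got worse"
    print(version,
          round(variance_ratio(*first), 4),
          round(variance_ratio(*last), 4),
          status)
```

With the data above, v3.6.0 is classified as improved and v3.7.0 is not, which matches the behavior described in the report.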
jtkrogel added the bug label May 28, 2020
@jtkrogel
Contributor Author

Files needed to reproduce:

O_atom_files.zip

@prckent
Contributor

prckent commented May 28, 2020

The reported variance has increased! :-(

Can you check the output of the first cycle of the optimizer? Does v3.7.0 believe it has reached a minimum similar to that of <=v3.6.0, or is a plumbing failure already visible?

@prckent
Contributor

prckent commented May 28, 2020

It is worth noting that this uses a spline wavefunction and the quartic optimizer. There should be no excitement or unexpected results with this combination.

@jtkrogel
Contributor Author

jtkrogel commented May 28, 2020

The first series (series 0) just measures the input Jastrow. Here is what I get when I run VMC instead of OPT with the same input:

                LocalEnergy              Variance                ratio 
vmc  series 0  -15.732169 +/- 0.002387   1.514125 +/- 0.007046   0.0962 

The energy variance reported to the log file appears to be wrong for v3.7.0 and later: it is much larger than the actual variance measured from scalar.dat. The log reports it as decreasing, but the internally computed variance is clearly faulty:


reported in log                     measured from scalar.dat
---------------                     ------------------------
v3.6.0
  VMC Evar = 1.5374e+00             1.530882 +/- 0.023830
  ...                               ...
  VMC Evar = 4.0338e-01             0.401248 +/- 0.008341

v3.7.0
  VMC Evar = 5.1876e+01             1.544877 +/- 0.023274
  ...
  VMC Evar = 3.4870e+01             2.441217 +/- 0.052501
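
The comparison above can be automated. A minimal sketch, assuming the log contains lines of the form `VMC Evar = <number>` as in the excerpt, and that the measured value and its error bar have already been extracted from scalar.dat by other means. The regular expression, the function names, and the 5-sigma tolerance are illustrative choices, not QMCPACK conventions:

```python
import re

# Matches lines such as "VMC Evar = 5.1876e+01" from the excerpt above.
EVAR_PATTERN = re.compile(r"VMC Evar\s*=\s*([-+0-9.eE]+)")


def logged_variances(log_text):
    """Return every 'VMC Evar' value found in the log text, in order."""
    return [float(value) for value in EVAR_PATTERN.findall(log_text)]


def consistent(logged, measured, error, n_sigma=5.0):
    """True if the logged value lies within n_sigma error bars of the measurement."""
    return abs(logged - measured) <= n_sigma * error


# Values copied from the comparison above: (log line, measured, error bar).
checks = {
    "v3.6.0": ("VMC Evar = 1.5374e+00", 1.530882, 0.023830),
    "v3.7.0": ("VMC Evar = 5.1876e+01", 1.544877, 0.023274),
}

for version, (line, measured, error) in checks.items():
    logged = logged_variances(line)[0]
    print(version, logged, measured, consistent(logged, measured, error))
```

For v3.6.0 the logged and measured variances agree within the tolerance; for v3.7.0 they differ by more than an order of magnitude.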

@jtkrogel
Contributor Author

Also confirming this is not an AoS/SoA issue as both have the same behavior.

@camelto2
Contributor

camelto2 commented May 28, 2020

I don't know if anyone else was checking this, but I did a quick git bisect on this to track down the issue. It looks like the first bad commit was 16845f9, which was part of #1404.

Attached are some files showing the data from the bisect
bisect.zip
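
For anyone repeating a bisect like this, `git bisect run` accepts a command whose exit code marks each commit: 0 for good, 1 for bad, 125 for "cannot test, skip". A minimal sketch of the decision step only, assuming the series 0 and final-series variances have already been extracted for the build under test. The helper name and the "variance must decrease" criterion are assumptions for illustration and are not taken from the attached bisect data:

```python
import sys

# Exit codes understood by `git bisect run`.
GOOD, BAD, SKIP = 0, 1, 125


def classify(initial_variance, final_variance):
    """Map one optimization run onto a `git bisect run` exit code.

    A commit counts as good when the optimizer lowered the variance
    relative to series 0, and as bad when it did not. Missing data
    (for example a failed build) yields SKIP.
    """
    if initial_variance is None or final_variance is None:
        return SKIP
    return GOOD if final_variance < initial_variance else BAD


# Hypothetical command-line use: script.py <series0_variance> <final_variance>
if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(classify(float(sys.argv[1]), float(sys.argv[2])))
```

Applied to the tables in the original report, the v3.6.0 run (1.530882 to 0.401248) would be marked good and the v3.7.0 run (1.544877 to 2.441217) bad.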

@ye-luo
Contributor

ye-luo commented May 28, 2020

@camelto2 thanks for cornering the commit. I will have a look.

@prckent
Contributor

prckent commented May 29, 2020

Thanks Cody.

@dubeckym

dubeckym commented Sep 4, 2020

Hi all, any progress here?
We're having a similar problem between versions 3.6.0 and 3.9.2 with the C atom and also with other systems. We use a linear combination of energy and variance (5%) with the oneshiftonly method; the VMC total energy does not decrease properly compared with the old version.
See the following data, thanks!
matus

v3.9.2
C-HF-M1 series 0 -5.388646 ± 0.000115 0.129059 ± 0.000418 0.0240
C-HF-M1 series 1 -5.366564 ± 0.000246 0.494325 ± 0.011003 0.0921
C-HF-M1 series 2 -5.345155 ± 0.000288 0.792722 ± 0.001940 0.1483
C-HF-M1 series 3 -5.336992 ± 0.000331 0.918207 ± 0.001977 0.1720
C-HF-M1 series 4 -5.342867 ± 0.000282 0.868246 ± 0.001905 0.1625
C-HF-M1 series 5 -5.340162 ± 0.000290 0.875595 ± 0.001737 0.1640
C-HF-M1 series 6 -5.339831 ± 0.000301 0.900563 ± 0.004500 0.1687
C-HF-M1 series 7 -5.344731 ± 0.000274 0.843170 ± 0.001549 0.1578
C-HF-M1 series 8 -5.335147 ± 0.000294 0.935005 ± 0.002144 0.1753
C-HF-M1 series 9 -5.338250 ± 0.000264 0.929523 ± 0.003506 0.1741
C-HF-M1 series 10 -5.346972 ± 0.000267 0.818708 ± 0.001812 0.1531

C-HF-M1 series 0 -5.341298 ± 0.000776 0.876985 ± 0.003972 0.1642 FINAL VMC

v3.6.0
C-HF-M1 series 1 -5.398901 ± 0.000075 0.071496 ± 0.000460 0.0132
C-HF-M1 series 2 -5.399298 ± 0.000073 0.080216 ± 0.003216 0.0149
C-HF-M1 series 3 -5.399447 ± 0.000080 0.080519 ± 0.001267 0.0149
C-HF-M1 series 4 -5.399592 ± 0.000084 0.077717 ± 0.000412 0.0144
C-HF-M1 series 5 -5.399378 ± 0.000090 0.077722 ± 0.000521 0.0144
C-HF-M1 series 6 -5.399768 ± 0.000089 0.078710 ± 0.000487 0.0146
C-HF-M1 series 7 -5.399638 ± 0.000095 0.079317 ± 0.000731 0.0147
C-HF-M1 series 8 -5.399620 ± 0.000092 0.077605 ± 0.000532 0.0144
C-HF-M1 series 9 -5.399560 ± 0.000085 0.077972 ± 0.000447 0.0144
C-HF-M1 series 10 -5.399574 ± 0.000080 0.078284 ± 0.000666 0.0145

C-HF-M1 series 0 -5.399805 ± 0.000171 0.078521 ± 0.001085 0.0145 FINAL VMC
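
One way to read "linear combination of energy and variance (5%)" is a cost with a 0.05 weight on the variance and 0.95 on the energy. A minimal sketch under that assumption, comparing the two FINAL VMC results above. The function name and the requirement that the weights sum to 1 are assumptions; the thread does not give QMCPACK's actual cost definition or normalization:

```python
def mixed_cost(energy, variance, variance_weight=0.05):
    """Weighted sum of energy and variance; the two weights are assumed to add to 1."""
    return (1.0 - variance_weight) * energy + variance_weight * variance


# (LocalEnergy, Variance) from the FINAL VMC lines above.
final_vmc = {
    "v3.9.2": (-5.341298, 0.876985),
    "v3.6.0": (-5.399805, 0.078521),
}

for version, (energy, variance) in final_vmc.items():
    print(version, mixed_cost(energy, variance))
```

Under this reading the v3.6.0 result has the lower cost, consistent with the report that the older version optimizes properly.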

@prckent
Contributor

prckent commented Sep 4, 2020

We'll get on it. This has to be a simple plumbing failure.

@ye-luo
Contributor

ye-luo commented Sep 4, 2020

@dubeckym oneshiftonly performs only energy minimization, so I need to take a closer look. Could you attach a reproducer?

@jtkrogel
Contributor Author

jtkrogel commented Sep 4, 2020

Hopefully the same change accounts for the problems both with oneshift and variance minimization. Agreed that examples are needed for both.

@dubeckym

dubeckym commented Sep 4, 2020

hi, thanks! this example shows the problem:
rep.zip

@ye-luo
Contributor

ye-luo commented Sep 4, 2020

@dubeckym Thank you. I will investigate both problems together.

@dubeckym

dubeckym commented Sep 7, 2020

Dear Ye Luo,
it works now! Thank you very much for the prompt and excellent work!
matus

@prckent
Contributor

prckent commented Sep 8, 2020

Thanks for the quick fix, Ye, and thanks for letting us know it works, @dubeckym. As of a few moments ago, the fix has been merged to develop.
