Unusual behavior of "OneShiftOnly" and comparison of other optimizers #3968

Closed
aannabe opened this issue Apr 22, 2022 · 13 comments · Fixed by #3969
Comments

@aannabe
Contributor

aannabe commented Apr 22, 2022

Describe the bug
Unusual behavior was observed for the Tb atom when using the OneShiftOnly optimizer. See the comparison of optimizers for energy and variance.

[Plots: energy, variance, and ratio for each optimizer]

To Reproduce

  1. Built from a recent commit 0ac62ea
  2. Compile script is attached.

System:

  • system name: NERSC/Cori-KNL
  • modules loaded: See attached file

Additional context
All optimizers use the same parameters:

                linopt1 = linear(
                    energy               = 0.9,
                    unreweightedvariance = 0.0,
                    reweightedvariance   = 0.1,
                    samples              = int(1e6),
                    substeps             = 3,
                    warmupSteps          = 20,
                    blocks               = 50,
                    nonlocalpp           = True,
                    usedrift             = True,
                    minmethod            = optimizer,
                    minwalkers           = 0.1,
                    timestep             = 0.5,
                    )

Other parameters are default ones.

Using the same resources, the execution times are given below:

Optimizer      Execution Time (secs)
quartic        1641
adaptive       895.4
OneShiftOnly   359
descent        269.5

optimizers_compare.zip

@ye-luo
Contributor

ye-luo commented Apr 22, 2022

Could you enable use_nonlocalpp_deriv?
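
A minimal sketch of what enabling this option could look like for the settings quoted in the issue. The settings are held in a plain dict so the snippet runs without Nexus; the helper name `with_nlpp_derivatives` is hypothetical, and passing the result to `linear(**...)` assumes the Nexus `linear` generator accepts a `use_nonlocalpp_deriv` keyword.

```python
# Settings copied from the issue; "OneShiftOnly" stands in for the `optimizer` variable.
linopt_settings = dict(
    energy=0.9,
    unreweightedvariance=0.0,
    reweightedvariance=0.1,
    samples=int(1e6),
    substeps=3,
    warmupSteps=20,
    blocks=50,
    nonlocalpp=True,
    usedrift=True,
    minmethod="OneShiftOnly",
    minwalkers=0.1,
    timestep=0.5,
)


def with_nlpp_derivatives(settings):
    """Return a copy of the settings with use_nonlocalpp_deriv switched on (hypothetical helper)."""
    updated = dict(settings)
    updated["use_nonlocalpp_deriv"] = True
    return updated


linopt_nlpp = with_nlpp_derivatives(linopt_settings)
# Assumption: with Nexus installed, this could be used as linear(**linopt_nlpp).
print(linopt_nlpp["use_nonlocalpp_deriv"])
```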

@prckent
Contributor

prckent commented Apr 22, 2022

Isn't this on by default now? (I am amazed that the optimizer was as successful as it was without them.)

@ye-luo
Contributor

ye-luo commented Apr 22, 2022

This system has NonLocalECP/LocalEnergy ~ 27% in VMC. That is why I feel use_nonlocalpp_deriv can be critical.
It is still not the default. The WF derivative support is still sparse; see 2b) in #3789.

@ye-luo
Contributor

ye-luo commented Apr 22, 2022

NLPP derivatives seem to be the key. On my workstation:

optJ12_OneShiftOnly_nlpp$ qmca -q ev *.scalar.dat
                            LocalEnergy               Variance           ratio 
optJ12  series 0  -126.302186 +/- 0.005828   13.094399 +/- 0.021925   0.1037 
optJ12  series 1  -126.847775 +/- 0.002220   3.525128 +/- 0.012072   0.0278 
optJ12  series 2  -126.873861 +/- 0.002854   4.278754 +/- 0.014905   0.0337 
optJ12  series 3  -126.885645 +/- 0.003060   4.437665 +/- 0.024933   0.0350 
optJ12  series 4  -126.886982 +/- 0.002496   4.461602 +/- 0.016027   0.0352 
optJ12  series 5  -126.884445 +/- 0.003074   4.459609 +/- 0.016952   0.0351 
optJ12  series 6  -126.884534 +/- 0.003664   4.484875 +/- 0.019814   0.0353 
optJ12  series 7  -126.886260 +/- 0.003808   4.563254 +/- 0.035779   0.0360 
optJ12  series 8  -126.887999 +/- 0.003236   4.482411 +/- 0.015329   0.0353 
optJ12  series 9  -126.886232 +/- 0.002063   4.547143 +/- 0.023041   0.0358 
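
The ratio column in this output is consistent with variance divided by the absolute value of the energy. A short check on three of the rows, assuming that definition (the function name `variance_ratio` is illustrative only):

```python
# (series, LocalEnergy, Variance, reported ratio) copied from the qmca output above.
rows = [
    (0, -126.302186, 13.094399, 0.1037),
    (1, -126.847775, 3.525128, 0.0278),
    (9, -126.886232, 4.547143, 0.0358),
]


def variance_ratio(energy, variance):
    """Variance-to-energy ratio, assumed here to be Variance / |LocalEnergy|."""
    return variance / abs(energy)


computed = [round(variance_ratio(e, v), 4) for _, e, v, _ in rows]
for (series, _, _, reported), value in zip(rows, computed):
    print(f"series {series}: computed {value:.4f}, reported {reported:.4f}")
```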

@aannabe
Contributor Author

aannabe commented Apr 22, 2022

Below are the results with derivatives enabled. Indeed, this fixes the issue.

[Plots: energy, variance, and ratio for each optimizer, with derivatives enabled]

The old/new execution times:

Optimizer      Time_No_Deriv (secs)   Time_with_Deriv (secs)   Increase (%)
quartic        1641                   1653                     0.7
adaptive       895.4                  991.9                    10.7
OneShiftOnly   359                    510.7                    42.3
descent        269.5                  359.1                    33.2
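
The percentage column can be recomputed from the two timing columns as 100 * (new - old) / old. A small sketch using the numbers from the table; the recomputed values agree with the reported ones to within 0.1, with the small differences presumably due to rounding versus truncation.

```python
# (time without derivatives, time with derivatives) in seconds, from the table above.
timings = {
    "quartic": (1641.0, 1653.0),
    "adaptive": (895.4, 991.9),
    "OneShiftOnly": (359.0, 510.7),
    "descent": (269.5, 359.1),
}


def percent_increase(old, new):
    """Relative increase of `new` over `old`, in percent."""
    return 100.0 * (new - old) / old


increases = {name: percent_increase(old, new) for name, (old, new) in timings.items()}
for name, value in increases.items():
    print(f"{name:<13s} {value:6.2f}")
```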

I agree with having use_nonlocalpp_deriv on by default.

@ye-luo
Contributor

ye-luo commented Apr 22, 2022

Your curves confirm my expectation.

  1. quartic uses a cost function that mixes energy and variance. That is why it reaches the lowest variance but not the lowest energy.
  2. oneshift/adaptive/descent do energy minimization only. Thus they converge to the same energy and variance. Their energy is lower than quartic's, but their variance is higher.
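
The two points above can be illustrated with a toy weighted cost. This is a sketch only: `mixed_cost` is a hypothetical function, the reported VMC variance is used as a stand-in for the reweighted variance, and the actual QMCPACK cost function may be normalized differently. The weights 0.9 and 0.1 are the ones from the issue.

```python
def mixed_cost(energy, variance, w_energy, w_variance):
    """Weighted sum of energy and variance (illustrative, not QMCPACK's exact definition)."""
    return w_energy * energy + w_variance * variance


# (LocalEnergy, Variance) pairs copied from the qmca output earlier in this thread.
series_1 = (-126.847775, 3.525128)
series_9 = (-126.886232, 4.547143)

# Pure energy minimization: the variance weight is zero.
energy_only = [mixed_cost(e, v, 1.0, 0.0) for e, v in (series_1, series_9)]
# Mixed cost with the weights from the issue.
mixed = [mixed_cost(e, v, 0.9, 0.1) for e, v in (series_1, series_9)]

print("energy only prefers series", 1 if energy_only[0] < energy_only[1] else 9)
print("0.9/0.1 mix prefers series", 1 if mixed[0] < mixed[1] else 9)
```

Under these assumptions the pure-energy objective ranks series 9 lower, while the 0.9/0.1 mix ranks series 1 lower, which is the kind of trade-off described above.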

@aannabe
Contributor Author

aannabe commented Apr 22, 2022

I wasn't aware that only quartic supports the mixing of energy and variance. I think this is not mentioned in the documentation.
For instance, this block in the documentation gives the wrong impression that OneShiftOnly will do mixed-cost optimization:

https://github.com/QMCPACK/qmcpack/blob/develop/docs/methods.rst#:~:text=%3Cloop%20max%3D%2210,parameter%3E%0A%20%20%20...%0A%20%3C/qmc%3E%0A%3C/loop%3E

@prckent
Contributor

prckent commented Apr 22, 2022

Hmm. I know there are some gaps, but even going back to the 2014 and 2016 workshop it was assumed that arbitrary mixes of energy and variance were supported by all, or nearly all, of the optimizers.

I think we need to get more serious on this topic and should at least issue a warning if the cost function is not pure energy for optimizers that don't support this. (Similarly, #3969 should have warnings where there are gaps in implementation.)

@ye-luo
Contributor

ye-luo commented Apr 22, 2022

Users rarely read QMCPACK warnings unless a run breaks down. They are more likely to read the documentation.
I doubt that issuing a warning really helps, and implementing one costs developer cycles.

@jtkrogel
Contributor

IMO optimizers that do not use <cost/> should abort if it is provided. This will prevent false conclusions from being made about the relationship between the inputs and outputs.
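
A rough sketch of such a check, written in Python for illustration rather than as QMCPACK code. The set of energy-only methods follows the statement earlier in this thread that oneshift/adaptive/descent minimize energy only; the function name and the choice to abort only on nonzero variance weights are assumptions.

```python
# Methods assumed to ignore variance weights, per the discussion in this thread.
ENERGY_ONLY_METHODS = {"oneshiftonly", "adaptive", "descent"}


def check_cost_weights(minmethod, unreweightedvariance=0.0, reweightedvariance=0.0):
    """Raise if variance weights are requested for a method assumed to be energy-only."""
    wants_variance = unreweightedvariance != 0.0 or reweightedvariance != 0.0
    if minmethod.lower() in ENERGY_ONLY_METHODS and wants_variance:
        raise ValueError(
            f"minmethod '{minmethod}' minimizes energy only; "
            "variance weights in the cost function would be ignored"
        )


check_cost_weights("quartic", reweightedvariance=0.1)  # accepted
try:
    check_cost_weights("OneShiftOnly", reweightedvariance=0.1)
except ValueError as err:
    print("aborted:", err)
```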

@jtkrogel
Contributor

@aannabe thanks for reporting the unusual behavior (both w.r.t. optimization performance and input inconsistencies). This kind of information is quite valuable and not enough people take the time to report.

@jtkrogel
Contributor

A follow-on question: why is OneShiftOnly so much more sensitive to the exclusion of the nlpp derivative data than the other optimizers? Reduced robustness in one context raises questions about others.

@ye-luo
Contributor

ye-luo commented Apr 25, 2022

@jtkrogel

  1. If cost doesn't apply to an optimization method, the code should abort. See "OneShiftOnly should abort when cost function is provided" #1494.
  2. Timing: I need to see the full details, including the full mpirun line and the input and output files. The timing data needs scrutiny before we discuss why.
