-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RooFit::MultiProcess & TestStatistics part 1: mathcore & Minuit2 changes #8369
RooFit::MultiProcess & TestStatistics part 1: mathcore & Minuit2 changes #8369
Conversation
Used in Minuit2 ExternalInternalGradientCalculator (introduced in a later commit) and in RooFit (also later commits) to enable calculation of all derivative and related numbers outside of Minuit2. Specifically, having second derivative and step size calculated outside of Minuit2 allows exact replication of the gradients as calculated in Minuit2 itself, which in turn enables external optimization and parallelization of this process without having to modify Minuit2.
Starting build on |
…daptors Like the changes to IFunction in the previous commit, this is also necessary for Minuit2-external gradient calculation.
Starting build on |
… and increase their precision This commit changes two (functionally speaking strictly separate, but unified in their ultimate purpose) things. First, to further accomodate the added ability of external gradient calculators to pass in second derivative and step sizes, the internal to external parameter transformers in Minuit2 need to have added D2Int2Ext and GStepInt2Ext functions that perform these transformations for those added parts. Second, we increase the precision of the transformers to long double. This is necessary to make sure that trigonometric transformations don't lose a few bits of precision on back-and-forth parameter transformations, which they do when using the double versions. Both changes are necessary for the external gradient calculation in Minuit2 which will be introduced in a next commit in ExternalInternalGradientCalculator.
Starting build on |
…lyticalGradientCalculator
Starting build on |
Starting build on |
Derived class from AnalyticalGradientCalculator. Allows Minuit2 using codes to calculate gradients themselves, but in Minuit2 "internal" parameter space, in order to preserve exact value correspondence with the Minuit2 implementation.
Starting build on |
Thank you so much for investing the time to make new commits! |
Build failed on windows10/cxx14. Errors:
|
@bellenot thanks for the tip. Should I fork to the same org where this ROOT dev branch is at (github.com/roofit-dev) or to my personal account? |
Just a fork to your personal account is sufficient. The scrips are looking for |
@phsft-bot build just on windows10/cxx14 |
Starting build on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @egpbos for this PR.
I have some doubts about adding in the Math/IFunction
interfaces the second derivatives and step sizes. I think, when supplying an external gradient to Minuit2, the G2 is used only when computing the Seed state and eventually used in NegativeG2LineSearch if G2<0.
Recently I have changed the entry point of the minimisation in ModularFunctionMinimizer, see
https://github.com/root-project/root/blob/master/math/minuit2/src/ModularFunctionMinimizer.cxx#L142
when using a FCN computing the gradient.
There the Numerical MInuit2 gradient is used for the seed, so there is really no needs to provide to Minuit G2 and step size's at the cost of using for the first iteration the Minuit2 gradient calculator. The cost of doing this I think is minimal given the fact that a large number of iteration at the end will be done ( at least 10 but often something like 100).
Furthermore the MnHesse is using still the Numerical gradient, so I would prefer if for the time being we make the smallest changes and add in a later time full support for providing second derivatives, that can be used also in MnHesse.
Co-authored-by: Axel Naumann <Axel.Naumann@cern.ch>
Starting build on |
As highlighted by @lmoneta in PR review comments (root-project#8369 (review)), the second derivative and step size information is not necessary at this point. The floating point exact minuit-external (roofit-internal) derivative duplication is currently done using ExternalInternalGradientCalculator and further made possible by the long double parameter transformation functions. This is all we need for now.
As highlighted by @lmoneta in PR review comments (root-project#8369 (review)), the second derivative and step size information is not necessary at this point. The floating point exact minuit-external (roofit-internal) derivative duplication is currently done using ExternalInternalGradientCalculator and further made possible by the long double parameter transformation functions. This is all we need for now.
Starting build on |
Build failed on ROOT-ubuntu16/nortcxxmod. Errors:
And 11 more |
Build failed on mac1014/python3. Errors:
And 11 more |
Fixes: some merge mistakes + some remaining layout diffs. The additional reverted thing is the block from line 119 in MnSeedGenerator.cxx.
Build failed on ROOT-performance-centos8-multicore/default. Errors:
And 10 more |
Starting build on |
1 similar comment
Starting build on |
Just trying to keep the PR diff minimal.
5477d93
to
025f0ed
Compare
Starting build on |
@lmoneta I have removed all the second derivative and step size code. As you suspected, it was all unnecessary. Looking back at my logs, it seems I just forgot to clean them up after introducing ExternalInternalGradientCalculator. This, together with the I also took care of the copyright lines and fixed some additional layout issues. Ready for a final review! |
Build failed on ROOT-performance-centos8-multicore/default. Failing tests:
|
The JupyROOT failure is a master branch failure unrelated to this PR. Other than that, CI passes now. I think it should be ready to merge if you have no further change requests @lmoneta. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for applying the changes I requested earlier.
I have requested a change in Function, but I can do it later if you prefer.
For the ExternalInternalGradientCalculator class name, can we find a better name ?
Otherwise we can discuss it later..
@@ -360,7 +360,7 @@ namespace ROOT { | |||
Gradient(x, df); | |||
} | |||
|
|||
|
|||
virtual bool returnsInMinuit2ParameterSpace() const { return false; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this function really needed ? It should not be at the gradient calculator level to know if the internal parameter space is used
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the code, I would add maybe a new flag to the
ROOT::Math::Minimiser::SetFunction( gradFunc & f, bool= false)
In this way the Minuit2Minimizer can set this flag in the FCNGradientBase class when creating it from the function interface passed by the clients (e.g. RooMinimizer).
If you prefer, we can merge with this and I can do this change later, please let me know
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your suggestion makes sense, indeed it feels weird that a Minuit detail is in mathcore. In light of time constraints, I would prefer to keep it like this for now, if you don't mind too much. The problem is that returnsInMinuit2ParameterSpace is also used in other places right now, in particular in the TestStatistics::MinuitFcnGrad
class that will be introduced in the next PR I will make.
/********************************************************************** | ||
* * | ||
* Copyright (c) 2005 LCG ROOT Math team, CERN/PH-SFT * | ||
* Copyright (c) 2017 Patrick Bos, Netherlands eScience Center * |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you really want to keep this class name instead of calling something else ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm open to suggestions, I don't love it either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe InternalSpaceGradientCalculator
? Or MinuitSpaceGradientCalculator
? Or TransformedSpaceGradientCalculator
? I find it hard to make it more compact, while at the same time clearly indicating the relation to the other gradient calculators.
Perhaps as an alternative it would make sense to make a namespace GradientCalculator
in which the different types could be gathered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To at least make the use case clearer (and perhaps also the name), I added some doxygen documentation for the class. Hope this helps.
bool CheckGradient() const { return false; } | ||
bool CheckGradient() const override { return false; } | ||
|
||
GradientParameterSpace gradParameterSpace() const override { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned above, I would add in this class the flag to distinguish external from internal gradient parameter space
Starting build on |
Starting build on |
Build failed on ROOT-debian10-i386/cxx14. Failing tests: |
long double Int2ext(long double Value, long double Upper, long double Lower) const; | ||
long double Ext2int(long double Value, long double Upper, long double Lower, const MnMachinePrecision &) const; | ||
long double DInt2Ext(long double Value, long double Upper, long double Lower) const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should these be additional overloads, leaving the perf of the double
version? Or have them as templates, only offering double
and long double
specializations (and possibly float
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed this, and concluded that it's unlikely to cause issues. The two potential problems would be:
- On some more exotic platform the long doubles may not be implemented or differently and the (bit-wise exact) tests will not pass. However, it is unclear what platforms this would be. GPU computation, for instance, would not be affected, because the parameter transformations are not part of the main calculation that would be offloaded to the GPU.
- This also means performance is not expected to be a big issue. With big fits, the parameter transformations are a minute part of the Minuit run time. Most time by far is spent in the function calls. The only situation in which it could be an issue is if you have a very cheap function with a huge amount of parameters. This seems quite unlikely for realistic fits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One could also argue that a pro of using long double
is that it clearly signals that these trigonometric transformations can lose some bits. This happens when you use the double
versions and transform back and forth. In some situations, you can then lose 1 or 2 bits, just because of the trigonometric implementations. An alternative solution would be more precise sine etc functions. Though, actually, I'm not sure if that helps, because the needed information may be in the extra bits that will not be stored for any double
version...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
and the (bit-wise exact) tests will not pass
That's curious - how do you "survive" on Windows? https://docs.microsoft.com/en-us/cpp/cpp/fundamental-types-cpp?view=msvc-160#floating-point-types
Which tests, by the way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Windows will use the double version and it will then use few bits when using the external calculator with the multiprocess. Will the multiprocess be available on Windows ? Probably not
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK - but if there's only a long double
version then which "double version" are you referring to? "Bit-wise exact" still depends on implementation / architecture. I trust this is handled, just making sure you're aware. (And I still don't know where these tests are.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tests in #8596 for instance @Axel-Naumann :) Btw, they don't pass currently either, due to changes in master that I hadn't incorporated yet. On the todo list.
But good question about Windows. I haven't tested at all there yet. So probably, at least for the time being, it will have to be disabled on Windows, as @lmoneta says. In principle, ZeroMQ works on all platforms, so that is not the bottleneck. However, I have no idea how/whether forking works on Windows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just so you don't have to dig into the code: the test compares the results of fitting with RooMinimizerFcn vs fitting the same function with RooGradMinimizerFcn. We put a lot of effort into making those two fits run exactly the same way. Small bit-wise differences quickly accumulate in the algorithm to end up with different results (past the desired precision, so e.g. in the 6th decimal if you wanted the fit to converge to 5).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we have clarified all issues and we are ready to merge this one.
Thank you Patrick for all this !
…ges (root-project#8369) * add virtuals to IFunction to calculate second derivative and stepsize Used in Minuit2 ExternalInternalGradientCalculator (introduced in a later commit) and in RooFit (also later commits) to enable calculation of all derivative and related numbers outside of Minuit2. Specifically, having second derivative and step size calculated outside of Minuit2 allows exact replication of the gradients as calculated in Minuit2 itself, which in turn enables external optimization and parallelization of this process without having to modify Minuit2. * add second derivative and step size calculation to Minuit2 function adaptors Like the changes to IFunction in the previous commit, this is also necessary for Minuit2-external gradient calculation. * add second derivative and step size to Minuit2 parameter transformers and increase their precision This commit changes two (functionally speaking strictly separate, but unified in their ultimate purpose) things. First, to further accomodate the added ability of external gradient calculators to pass in second derivative and step sizes, the internal to external parameter transformers in Minuit2 need to have added D2Int2Ext and GStepInt2Ext functions that perform these transformations for those added parts. Second, we increase the precision of the transformers to long double. This is necessary to make sure that trigonometric transformations don't lose a few bits of precision on back-and-forth parameter transformations, which they do when using the double versions. Both changes are necessary for the external gradient calculation in Minuit2 which will be introduced in a next commit in ExternalInternalGradientCalculator. * implement optional second derivative and step size calculation in AnalyticalGradientCalculator * remove unnecessary temporary objects * add ExternalInternalGradientCalculator to minuit2 Derived class from AnalyticalGradientCalculator. Allows Minuit2 using codes to calculate gradients themselves, but in Minuit2 "internal" parameter space, in order to preserve exact value correspondence with the Minuit2 implementation. * Update math/mathcore/inc/Math/IFunction.h Co-authored-by: Axel Naumann <Axel.Naumann@cern.ch> * revert unnecessary math/minuit2 changes As highlighted by @lmoneta in PR review comments (root-project#8369 (review)), the second derivative and step size information is not necessary at this point. The floating point exact minuit-external (roofit-internal) derivative duplication is currently done using ExternalInternalGradientCalculator and further made possible by the long double parameter transformation functions. This is all we need for now. * fix previous revert commit + revert one additional thing Fixes: some merge mistakes + some remaining layout diffs. The additional reverted thing is the block from line 119 in MnSeedGenerator.cxx. * fix additional layout Just trying to keep the PR diff minimal. * remove copyright line [skip ci] * add ExternalInternalGradientCalculator doxygen Co-authored-by: Axel Naumann <Axel.Naumann@cern.ch>
This PR is the first part of a split and clean-up of #8294, as suggested by @guitargeek.
In this PR, only the changes to mathcore and Minuit2 are taken into account. These changes will be necessary for the later RooFit parts which will come in separate PRs.
Specifically, this PR introduces the ExternalInternalGradientCalculator in Minuit2. This subclass of AnalyticalGradientCalculator enables calculation of the derivative outside of Minuit2, but in "Minuit-internal" parameter space. This allows exact replication of the gradients as calculated in Minuit2 itself, which in turn enables external optimization and parallelization of this process without having to modify Minuit2, without having to worry about differences in outcome. In other words, the resulting external gradient calculation can be easily unit tested against the existing Minuit2 gradient calculation.
Note: still working on "cherry picking" all changes and organizing them into functionally sensible commits. Will un-draft the PR once this is done.Ready for review.