Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove OrderedDicts etc except where empirically necessary #940

Merged
merged 4 commits into from
Apr 26, 2024

Conversation

kcormi
Copy link
Collaborator

@kcormi kcormi commented Apr 12, 2024

It would be good to do a few more checks on this and take a closer look to make sure its what we want before merging, but this seems to preserve the creation order of all workspace objects, as discussed in #936.

@nucleosynthesis
Copy link
Contributor

@kcormi - a very concrete check would be to run text2workspace.py 125.5/comb_hbb.txt -m 125.5 (from the cms Higgs observation area) twice with -v 3 and check the output diff. If they are the same with this PR, then I think you've narrowed down the changes that were really crucial and we can merge this one

@kcormi
Copy link
Collaborator Author

kcormi commented Apr 18, 2024

@nucleosynthesis, yes good suggestion. I tested this now, and there seems to be a difference in a RooProdPdf print out which I don't understand. I seem to get a same/similar one with the current main branch though. So I'm not sure if its actually related, or if I've just made a silly mistake in trying to test it.

I'll need to investigate a little later, but I'll try to wrap this up soon so we can merge it.

@kcormi
Copy link
Collaborator Author

kcormi commented Apr 18, 2024

It seems in my tests that I still get a few lines of differences, even when using the fully ordered collections.

It's always just a few lines with the RooProdPdfs, and the difference doesn't seem to be in the pdf itself, but that in one of the two lines printing the pdf out, there is also an error of the style

[#0] ERROR:InputArguments -- RooArgSet::checkForDup: ERROR argument with name CMS_vhbb_stats_TT_ZmmLoose7TeV is already in this set

I'm not sure if it's really a problem, I suspect its not, but I'll try to see if I can track it down regardless.

@guitargeek
Copy link
Contributor

guitargeek commented Apr 19, 2024

Here is the stack trace for where the ERROR comes from:

Thread 1 "python" hit Breakpoint 1, RooArgSet::checkForDup (this=0x5555690d37e8, var=..., silent=<optimized out>)
    at /home/rembserj/code/root/roofit/roofitcore/src/RooArgSet.cxx:285
285	   coutE(InputArguments) << "RooArgSet::checkForDup: ERROR argument with name " << var.GetName() << " is already in this set" << endl;
(gdb) bt
#0  RooArgSet::checkForDup (this=0x5555690d37e8, var=..., silent=<optimized out>)
    at /home/rembserj/code/root/roofit/roofitcore/src/RooArgSet.cxx:285
#1  0x00007fffdda81b4c in RooArgSet::canBeAdded (this=<optimized out>, arg=..., silent=<optimized out>)
    at /home/rembserj/code/root/roofit/roofitcore/inc/RooArgSet.h:190
#2  0x00007fffddac9102 in RooAbsCollection::add (this=this@entry=0x5555690d37e8, var=...,
    silent=silent@entry=false) at /home/rembserj/code/root/roofit/roofitcore/src/RooAbsCollection.cxx:436
#3  0x00007fffd780ada4 in RooCollectionProxy<RooArgSet>::add (silent=<optimized out>,
    shapeServer=<optimized out>, valueServer=<optimized out>, var=..., this=<optimized out>)
    at /home/rembserj/spaces/master/install/include/root/RooCollectionProxy.h:192
#4  RooCollectionProxy<RooArgSet>::add (silent=<optimized out>, var=..., this=<optimized out>)
    at /home/rembserj/spaces/master/install/include/root/RooCollectionProxy.h:114
#5  SimpleCacheSentry::addVars (this=this@entry=0x5555690d34f0, vars=...)
    at /home/rembserj/HiggsAnalysis/CombinedLimit/src/SimpleCacheSentry.cc:43
#6  0x00007fffd782afce in FastVerticalInterpHistPdf2Base::initBase (this=0x5555690d3020)
    at /home/rembserj/HiggsAnalysis/CombinedLimit/src/VerticalInterpHistPdf.cc:826
#7  FastVerticalInterpHistPdf2Base::initBase (this=0x5555690d3020)
    at /home/rembserj/HiggsAnalysis/CombinedLimit/src/VerticalInterpHistPdf.cc:813
#8  0x00007fffd782b415 in FastVerticalInterpHistPdf2::evaluate (this=0x5555690d3020)
    at /home/rembserj/HiggsAnalysis/CombinedLimit/src/VerticalInterpHistPdf.cc:895

Indeed, the SimpleCacheSentry (whatever that does) uses a RooArgSet as a container, and is therefore not able to deal with different RooAbsArg instances that have the same name:
https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit/blob/main/src/SimpleCacheSentry.cc#L32

There is a check if (_deps.containsInstance(*a)) continue; to make sure the same instance is not added twice, but it seems this FastVerticalInterpHistPdf2 uses a list of coefficient, where some coefficients have the same name but are different RooAbsArgs. This might or might not be a bigger problem: RooFit is inconsistent in treating different instances with the same name. For example in a RooWorkspace, there can't be different variables with the same name, everything is unique. But when objects are then copied when doing the NLL fit, there might be independent copies with the same name, depending on what the framework does. This situation should be avoided, because in the worst case these independent instances might get out of sync in value, even though they were mathematically intended to be the same.

This is the reason why RooFit models are often magically fixed by writing the workspace to a file and reading it back: like this, all variables with the same name are de-duplicated, which is usually the intent of the model builder.

So in this case, the ERROR is probably not an issue because you are importing the model to workspace anyway. While doing that, all variables are deduplicated.

guitargeek added a commit to guitargeek/HiggsAnalysis-CombinedLimit that referenced this pull request Apr 19, 2024
The `_sentry` is already filled when calling `evaluate()`, so no need to
do that in the copy constructor. Doing so can actually leading to
redundant calls to `_sentry.addVars(_coefList)`, resulting in the
problem reported here:
cms-analysis#940 (comment)
```
[#0] ERROR:InputArguments -- RooArgSet::checkForDup: ERROR argument with name CMS_vhbb_stats_TT_ZmmLoose7TeV is already in this set
```
@guitargeek
Copy link
Contributor

Confirmed, the errors were harmless: we just tried to fill a collection with the same set of objects twice, and the second time you get an error instead of adding more elements. Can be avoided: #952

guitargeek added a commit to guitargeek/HiggsAnalysis-CombinedLimit that referenced this pull request Apr 23, 2024
The `_sentry` is already filled when calling `evaluate()`, so no need to
do that in the copy constructor. Doing so can actually leading to
redundant calls to `_sentry.addVars(_coefList)`, resulting in the
problem reported here:
cms-analysis#940 (comment)
```
[#0] ERROR:InputArguments -- RooArgSet::checkForDup: ERROR argument with name CMS_vhbb_stats_TT_ZmmLoose7TeV is already in this set
```
@kcormi kcormi changed the title Draft: Remove OrderedDicts etc except where empirically necessary Remove OrderedDicts etc except where empirically necessary Apr 26, 2024
@kcormi
Copy link
Collaborator Author

kcormi commented Apr 26, 2024

Thanks @guitargeek -- I will merge this one. I need to double check #952 a little before merging it, but looks fine.

@kcormi kcormi merged commit f23b469 into cms-analysis:main Apr 26, 2024
7 checks passed
guitargeek added a commit to guitargeek/HiggsAnalysis-CombinedLimit that referenced this pull request Jun 4, 2024
The `_sentry` is already filled when calling `evaluate()`, so no need to
do that in the copy constructor. Doing so can actually leading to
redundant calls to `_sentry.addVars(_coefList)`, resulting in the
problem reported here:
cms-analysis#940 (comment)
```
[#0] ERROR:InputArguments -- RooArgSet::checkForDup: ERROR argument with name CMS_vhbb_stats_TT_ZmmLoose7TeV is already in this set
```
kcormi pushed a commit to kcormi/HiggsAnalysis-CombinedLimit that referenced this pull request Jun 24, 2024
The `_sentry` is already filled when calling `evaluate()`, so no need to
do that in the copy constructor. Doing so can actually leading to
redundant calls to `_sentry.addVars(_coefList)`, resulting in the
problem reported here:
cms-analysis#940 (comment)
```
[#0] ERROR:InputArguments -- RooArgSet::checkForDup: ERROR argument with name CMS_vhbb_stats_TT_ZmmLoose7TeV is already in this set
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants