Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding parameters needed for PUjetID 94X and 102X #28827

Closed
wants to merge 4 commits into from

Conversation

alefisico
Copy link
Contributor

@alefisico alefisico commented Jan 30, 2020

PR description:

PR validation:

Successfully run the runTheMatrix tests and here is the output: https://cernbox.cern.ch/index.php/s/XNTEPDN5VU01O84

Before submitting your pull requests, make sure you followed this checklist:

@cmsbuild
Copy link
Contributor

The code-checks are being triggered in jenkins.

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-28827/13548

  • This PR adds an extra 16KB to repository

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @alefisico (Alejandro Gomez Espinosa) for master.

It involves the following packages:

RecoJets/JetProducers

@perrotta, @cmsbuild, @slava77 can you please review it and eventually sign? Thanks.
@rappoccio, @jdolen, @ahinzmann, @jdamgov, @yslai, @nhanvtran, @gkasieczka, @clelange, @schoef, @mariadalfonso, @seemasharmafnal this is something you requested to watch as well.
@davidlange6, @silviodonato, @fabiocos you are the release manager for this.

cms-bot commands are listed here

@cmsbuild
Copy link
Contributor

The code-checks are being triggered in jenkins.

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-28827/13551

  • This PR adds an extra 16KB to repository

@cmsbuild
Copy link
Contributor

Pull request #28827 was updated. @perrotta, @cmsbuild, @slava77 can you please check and sign again.

@slava77
Copy link
Contributor

slava77 commented Jan 30, 2020

@cmsbuild please test

@cmsbuild
Copy link
Contributor

cmsbuild commented Jan 30, 2020

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-run-pr-tests/4431/console Started: 2020/01/30 14:57

@cmsbuild
Copy link
Contributor

+1
Tested at: 086826c
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-567459/4431/summary.html
CMSSW: CMSSW_11_1_X_2020-01-30-1100
SCRAM_ARCH: slc7_amd64_gcc820

@cmsbuild
Copy link
Contributor

Comparison job queued.

@cmsbuild
Copy link
Contributor

cmsbuild commented Feb 5, 2020

The code-checks are being triggered in jenkins.

@cmsbuild
Copy link
Contributor

cmsbuild commented Feb 5, 2020

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-28827/13636

  • This PR adds an extra 16KB to repository

@cmsbuild
Copy link
Contributor

cmsbuild commented Feb 5, 2020

Pull request #28827 was updated. @perrotta, @cmsbuild, @slava77 can you please check and sign again.

@perrotta
Copy link
Contributor

perrotta commented Feb 5, 2020

@cmsbuild please test

@cmsbuild
Copy link
Contributor

cmsbuild commented Feb 5, 2020

The tests are being triggered in jenkins.
Tested with other pull request(s) cms-data/RecoJets-JetProducers#10
https://cmssdt.cern.ch/jenkins/job/ib-run-pr-tests/4508/console Started: 2020/02/05 10:54

@cmsbuild
Copy link
Contributor

cmsbuild commented Feb 5, 2020

+1
Tested at: 6af43a3
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-35ca6d/4508/summary.html
CMSSW: CMSSW_11_1_X_2020-02-04-2300
SCRAM_ARCH: slc7_amd64_gcc820

@cmsbuild
Copy link
Contributor

cmsbuild commented Feb 5, 2020

Comparison job queued.

@cmsbuild
Copy link
Contributor

cmsbuild commented Feb 5, 2020

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-35ca6d/4508/summary.html

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 45 differences found in the comparisons
  • DQMHistoTests: Total files compared: 34
  • DQMHistoTests: Total histograms compared: 2698043
  • DQMHistoTests: Total failures: 1261
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2696436
  • DQMHistoTests: Total skipped: 346
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 33 files compared)
  • Checked 147 log files, 16 edm output root files, 34 DQM output files

@ahinzmann
Copy link
Contributor

@alefisico In these tests, the discriminator seems to always peak at -1, regardless of how much pileup is in the sample, e.g.
https://cmssdt.cern.ch/SDT/jenkins-artifacts/baseLineComparisons/CMSSW_11_1_X_2020-02-04-2300+35ca6d/34963/validateJR/all_mini_OldVSNew_RunEGamma2018Cwf136p874/all_mini_OldVSNew_RunEGamma2018Cwf136p874c_patJets_slimmedJets__reRECO_obj___userFloats__3_.png

Was this also the case in the tests done in 10_2_X? It looks fishy to me.

@camclean
Copy link
Contributor

camclean commented Feb 7, 2020

Did the code change so that jets with pt > 50 GeV return -1?

@perrotta
Copy link
Contributor

Is there any update on the explanation of the outputs?

#4 Eta Categories 0-2.5 2.5-2.75 2.75-3.0 3.0-5.0

#Tight Id
Pt010_Tight = cms.vdouble( 0.69, -0.35, -0.26, -0.21),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the values appear to be the same.
Do we really need to copy-paste this fairly large number of values?
Simply full_94x_chs_wp = full_102x_chs_wp.clone() would work.
Same comment applies to the other points.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These numbers are preliminary the same, but it will potentially change in the near future. (and could not be the same for all the years like now) Perhaps is better to include the structure now?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking back, indeed the values of the working points are rather different for different training cases.
What are the targets in training?

If the development strategy is to derive the working points from some fixed numerical target (like x% efficiency or rejection), then there would be very likely no accidental agreement. This, however doesn't seem to be the case this time.

If the WPs are weakly defined [this time?] with a replacement based on judgement calls,
it would be OK to copy-paste if the older values can update independently.

However it sounds like the development strategy this time is to first introduce placeholders to be replaced after some later processing.
[I'm judging from the history of the past updates and guessing from the symptoms of potentially broken WPs visible in the validation results].
If you feel like a shorter new_wp = old_wp.clone() #TEMPORARY is less acceptable,
please at least add an explicit comment inline that these are to be replaced soon.

tmvaWeights = cms.FileInPath("RecoJets/JetProducers/data/pileupJetId_102X_Eta2p5To2p75_chs_BDT.weights.xml.gz"),
tmvaVariables = cms.vstring(
"nvtx" ,
"dR2Mean" ,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks like the same variable name vector repeats for the first 3 ranges. These can be initialized from a separately defined (temporary) vector

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, I will change it


###################################################################################################################
full_94x_chs = cms.PSet(
impactParTkThreshold = cms.double(1.),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is anything changing here other than the training file names and the working point PSet?
If not, a more compact and perhaps more maintainable way would be to use

full_94x_chs = full_102x_chs.clone(JetIdParams = full_94x_chs_wp)
full_94x_chs.trainings[0].tmvaWeights = "RecoJets/JetProducers/data/pileupJetId_94X_Eta0p0To2p5_chs_BDT.weights.xml.gz"
...

Having the older IDs listed first may also be more maintainable for cases of future additions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as previous comment. We are currently using the same values for WP in these years, but they will potentially change in the near future. What is your suggestion?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

full_94x_chs = full_102x_chs.clone(JetIdParams = full_94x_chs_wp) is applying different working points for 94X vs 102X. The suggestion in the earlier comment simply allows not to repeat what's known to be the same.

@alefisico
Copy link
Contributor Author

Hi all,
we needed to change a couple of things after the bug that @ahinzmann found. Since now several things changed in CMSSW, I think it is better to close this PR and start clean.
New PR is #28984

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants