Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify jet constituent access in MiniAOD #22914

Merged
merged 8 commits into from
Apr 21, 2018

Conversation

ahinzmann
Copy link
Contributor

The slimmedJetsAK8 in MiniAOD are produced with the JetSubstructurePacker.cc. To save space the jet daughters are used to store both subjets and jet constituents. In order to compute e.g. jet ID or substructure observables, the jet constituents are needed, which can currently be obtained by collecting the constituents of all subjet daughters of the jet plus the non-subjet daughters of the jet.
To make this much simpler for the user, propose to adapt the pat jet interface to perform this operation automatically.
The user can the consistently obtain subjets via the member function subjets() and constituent particles via the member function daughter().
This does not change any content of RECO or MiniAOD, it only changes the interface to jets with subjets.
@arizzi @gpetruc @rappoccio Please have a look.
Suggest to backport this to all analysis releases if agreed.

@cmsbuild
Copy link
Contributor

The code-checks are being triggered in jenkins.

@cmsbuild
Copy link
Contributor

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @ahinzmann for master.

It involves the following packages:

DataFormats/PatCandidates

@perrotta, @monttj, @cmsbuild, @slava77, @gpetruc, @arizzi can you please review it and eventually sign? Thanks.
@gpetruc, @gouskos, @rovere, @cbernet this is something you requested to watch as well.
@davidlange6, @slava77, @fabiocos you are the release manager for this.

cms-bot commands are listed here

@arizzi
Copy link
Contributor

arizzi commented Apr 10, 2018

while this makes sense I wonder if we have some troubles anywhere on user code relying on current behavior.

For example something like this:
https://github.com/cms-sw/cmssw/blob/master/PhysicsTools/NanoAOD/python/jets_cff.py#L259-L262
could be broken if done on daughter(i) instead of sourceCandidatePtr(i)

const std::vector<reco::CandidatePtr> & jdaus = daughterPtrVector();
for (const reco::CandidatePtr & dau : jdaus) {
if (dau->isJet()) {
const std::vector<reco::CandidatePtr> & sjdaus = edm::Ptr<pat::Jet>(dau)->daughterPtrVector();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer

  const pat::Jet *subjet = dynamic_cast<const pat::Jet *>(&*dau);
  if (subjet) { ... 

or perhaps with reco::Jet instead of pat::Jet?
there's no guarantee that isJet() will return true only a pat::Jet object (and there may be overheads in doing the cast using edm::Ptr, which is unnecessary since you're not writing it to a file)

@ahinzmann
Copy link
Contributor Author

@arizzi This should only affect code which is trying to find subjets via daughter() rather than subjets(). Since multiple subjet collections are stored, access via daughter() is anyways unsafe. That's why it may be acceptable to change behavior for this particular case.

@cmsbuild
Copy link
Contributor

The code-checks are being triggered in jenkins.

@cmsbuild
Copy link
Contributor

@cmsbuild
Copy link
Contributor

Pull request #22914 was updated. @perrotta, @monttj, @cmsbuild, @slava77, @gpetruc, @arizzi can you please check and sign again.

@slava77
Copy link
Contributor

slava77 commented Apr 10, 2018

@cmsbuild please test

@cmsbuild
Copy link
Contributor

cmsbuild commented Apr 10, 2018

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/27420/console Started: 2018/04/10 17:43

@cmsbuild
Copy link
Contributor

@cmsbuild
Copy link
Contributor

Pull request #22914 was updated. @perrotta, @monttj, @cmsbuild, @slava77, @gpetruc, @arizzi can you please check and sign again.

@perrotta
Copy link
Contributor

please test

@cmsbuild
Copy link
Contributor

cmsbuild commented Apr 19, 2018

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/27562/console Started: 2018/04/19 09:28

@cmsbuild
Copy link
Contributor

@cmsbuild
Copy link
Contributor

Comparison job queued.

@cmsbuild
Copy link
Contributor

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-22914/27562/summary.html

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 4 differences found in the comparisons
  • DQMHistoTests: Total files compared: 29
  • DQMHistoTests: Total histograms compared: 2492830
  • DQMHistoTests: Total failures: 27
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2492627
  • DQMHistoTests: Total skipped: 176
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 23 files compared)
  • Checked 119 log files, 9 edm output root files, 29 DQM output files

@slava77
Copy link
Contributor

slava77 commented Apr 19, 2018

+1

for #22914 9b04b8b

  • changes are in line with the PR description and the follow up review
    • Note that the persisted data is the same, which means that all workflows (like nanoAOD) which analyze stored jets should be showing their final expected behavior (there is no need to produce new files to test on top of them).
  • jenkins tests pass and comparisons with the baseline show differences only in several AK8 plots expected to behave differently
    e.g. workflow 136.8311

wf136 8311_ak8_constituents

the number of constituents is now returning the count of underlying constituents

wf136 8311_ak8_neutfrac

this plot is now fixed: previously it contained values above 1 due to previously incorrect in this context normalization.
  • in local tests, the behavior is as expected
    • direct scan Events->Scan("patJets_slimmedJetsAK8__PAT.obj.numberOfDaughters():patJets_slimmedJetsAK8__PAT.obj[].dau@.size()", "patJets_slimmedJetsAK8__PAT.obj.pt()>=170") shows an increase in numberOfDaughters counts, which now can be different from the dau vector size
    • Events->Scan("patJets_slimmedJetsAK8__PAT.obj[].print().c_str()", "patJets_slimmedJetsAK8__PAT.obj.pt()>=170", "colsize=1024") now prints the expected underlying constituents. Given the implementation of reco::Jet::print , which uses daughterPtr, this is an easy check to see that loops restricted by the numberOfDaughters should behave OK. Perhaps the reco::Jet::print method should actually be overloaded in pat::Jet to print a more detailed info.

@slava77
Copy link
Contributor

slava77 commented Apr 19, 2018

@ahinzmann
related to backports.
Can we survive without a similar update for 10_1_X?

@slava77
Copy link
Contributor

slava77 commented Apr 20, 2018

@fabiocos
please check this PR.
Thank you.

@fabiocos
Copy link
Contributor

@slava77 I understand from the review thread that the result is now backward compatible with the previous behaviour, and for 10_2_X this new way of accessing constituents should be ok.

@fabiocos
Copy link
Contributor

+1

@perrotta
Copy link
Contributor

@fabiocos : an explict "merge" is needed

@fabiocos
Copy link
Contributor

merge

as usual I often forget it...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants