Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add PV sumPT2 of pf charged particles associated to PV in NanoAOD #43487

Merged
merged 5 commits into from
Jan 12, 2024

Conversation

JunquanTao
Copy link
Contributor

PR description:

This PR is to add the PV sumPT2 of pf charged particles associated to PV in NanoAOD.

The motivations and related studies ae summarized in: https://twiki.cern.ch/twiki/pub/CMS/JunquanTao/LMHgamgam_CustomizedNanoAOD_AddingSumPT2.pdf

"SumPT2" of the pf charged candidates associated to the chosen PV was used in Run2 low-mass Hgamgam analysis (HIG-20-002) to suppress the DY bkg efficiently. In current nanoAOD, “PV_score” which is the sum pt2 of clustered objects, shows worse performance (~7% relatively more DY events will surviving the selection) than “SumPT2”. So, we propose to add the PV “SumPT2” in the nanoAOD.

Since only 1 variable (float type) for each event is proposed to add, the expected additional size in nanoAOD will be 1 to 2 bytes per event, depending on the precision to store this variable (8 or 10 bits). By default we propose to store it in the precision with 10 bits.

PR validation:

runTheMatrix tests have been successfully run for the following workflow :
• runTheMatrix.py -l 12434.0

Backport:

We target this to go in Nano V13 for now. Backport might be required if the Nano production happens in the earlier CMSSW release.

Hgg convenors @lfinco , @fcouderc

@cmsbuild
Copy link
Contributor

cmsbuild commented Dec 4, 2023

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-43487/38060

Code check has found code style and quality issues which could be resolved by applying following patch(s)

@cmsbuild
Copy link
Contributor

cmsbuild commented Dec 4, 2023

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-43487/38061

Code check has found code style and quality issues which could be resolved by applying following patch(s)

@Prasant1993
Copy link
Contributor

Hi @JunquanTao,

You need to adapt the code format as cmsbuild suggested in the above. Try to run this,

Code check has found code style and quality issues which could be resolved by applying following patch(s)

@cmsbuild
Copy link
Contributor

cmsbuild commented Dec 4, 2023

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-43487/38062

@cmsbuild
Copy link
Contributor

cmsbuild commented Dec 4, 2023

A new Pull Request was created by @JunquanTao (JunquanTao) for master.

It involves the following packages:

  • PhysicsTools/NanoAOD (xpog)

@vlimant, @cmsbuild, @simonepigazzini can you please review it and eventually sign? Thanks.
@gpetruc, @AnnikaStein this is something you requested to watch as well.
@rappoccio, @sextonkennedy, @antoniovilela you are the release manager for this.

cms-bot commands are listed here

@vlimant
Copy link
Contributor

vlimant commented Dec 8, 2023

enable nano

@vlimant
Copy link
Contributor

vlimant commented Dec 8, 2023

please test

@mbluj
Copy link
Contributor

mbluj commented Dec 11, 2023

@JunquanTao I wonder why you associate candidates to PV using dz instead of vertexRef() (e.g. like this vertexRef().key() == 0 which works w/o constructing actual ref for PV)? It should be more precise. To tighten selection you might use pvAssociationQuality(). Those are documented at https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookMiniAOD2017#PV_Assignment

@JunquanTao
Copy link
Contributor Author

@JunquanTao I wonder why you associate candidates to PV using dz instead of vertexRef() (e.g. like this vertexRef().key() == 0 which works w/o constructing actual ref for PV)? It should be more precise. To tighten selection you might use pvAssociationQuality(). Those are documented at https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookMiniAOD2017#PV_Assignment

@mbluj As mentioned in the slides [1] (S3) when submitting the PR, we just employed the same method as what we used in Run2 analyses (for both 125 GeV Hgg and low-mass Hgg analyses), with the detailed codes in [2]. From the test, we could reproduce the log values of SumPT2 as what we used in Run2 analysis with flashgg framework. As shown in the bottom plots on S5 [1], with the variable stored in customized nanoAOD with a precision of 10 bits, the difference compared to the ones in flashgg, is less than ~0.01% relatively. We can test the performance with “vertexRef()” in the future. Do you have any suggestion on the quality requirement of PV-candidate association “pvAssociationQuality()”? Thanks!

[1] https://twiki.cern.ch/twiki/pub/CMS/JunquanTao/LMHgamgam_CustomizedNanoAOD_AddingSumPT2.pdf

[2] https://github.com/cms-analysis/flashgg/blob/dev_legacy_runII/MicroAOD/plugins/DzVertexMapProducer.cc#L54-L75 with “maxAllowedDz_=0.2” as specified in https://github.com/cms-analysis/flashgg/blob/dev_legacy_runII/MicroAOD/python/flashggTkVtxMap_cfi.py#L6

@mbluj
Copy link
Contributor

mbluj commented Dec 11, 2023

@JunquanTao OK, I see. Concerning selecting by association quality you can try two options:

  • "tight" with quality >= CompatibilityDz, i.e. using candidates with tracks used to fit the PV or with |dz|<0.1cm, significance of dz < 5 and dz error < 0.05cm,
  • "loose" with quality >= CompatibilityBTag, as above or within a jet compatible with the PV.
    Finally, you can require that VertexRef is PV and |dz|< 0.2 which should reproduce (or be very close to) the current selection without need of looping over vertices.

@JunquanTao
Copy link
Contributor Author

@JunquanTao OK, I see. Concerning selecting by association quality you can try two options:

  • "tight" with quality >= CompatibilityDz, i.e. using candidates with tracks used to fit the PV or with |dz|<0.1cm, significance of dz < 5 and dz error < 0.05cm,
  • "loose" with quality >= CompatibilityBTag, as above or within a jet compatible with the PV.
    Finally, you can require that VertexRef is PV and |dz|< 0.2 which should reproduce (or be very close to) the current selection without need of looping over vertices.

@mbluj There should be some misunderstanding here.

We do use the PV 0 directly to calculate dz [1]. We loop the rest PVs to see if this pf charged candidate belongs to other PVs (loop starting from 1 [2]), by checking its distance to any other PV “newdz” is less than “dz” or not. So, we need looping the rest vertices. “vertexRef()” with some quality requirement will not save the time.

“VertexRef is PV and |dz|< 0.2” is not enough. As mentioned above, we need to make sure this pf charged candidate does not belong to other PVs, then we count it in the sum PT2.

So, the current codes is good enough to calculate the variable “SumPT2’. Right?

[1]

double dz = fabs(obj.dz((*pvsIn)[0].position()));

[2]

for (size_t j = 1; j < (*pvsIn).size(); j++) {

@mbluj
Copy link
Contributor

mbluj commented Dec 11, 2023

@JunquanTao OK, I see. Concerning selecting by association quality you can try two options:

  • "tight" with quality >= CompatibilityDz, i.e. using candidates with tracks used to fit the PV or with |dz|<0.1cm, significance of dz < 5 and dz error < 0.05cm,
  • "loose" with quality >= CompatibilityBTag, as above or within a jet compatible with the PV.
    Finally, you can require that VertexRef is PV and |dz|< 0.2 which should reproduce (or be very close to) the current selection without need of looping over vertices.

@mbluj There should be some misunderstanding here.

We do use the PV 0 directly to calculate dz [1]. We loop the rest PVs to see if this pf charged candidate belongs to other PVs (loop starting from 1 [2]), by checking its distance to any other PV “newdz” is less than “dz” or not. So, we need looping the rest vertices. “vertexRef()” with some quality requirement will not save the time.

“VertexRef is PV and |dz|< 0.2” is not enough. As mentioned above, we need to make sure this pf charged candidate does not belong to other PVs, then we count it in the sum PT2.

So, the current codes is good enough to calculate the variable “SumPT2’. Right?

[1]

double dz = fabs(obj.dz((*pvsIn)[0].position()));

[2]

for (size_t j = 1; j < (*pvsIn).size(); j++) {

I understand logic of your code and I agree that is enough to reach your goal of reproducing sumPt2 score. But, iterating over candidates and summing pt2 of those which have vertexRef().key == 0 (i.e. are associated to PV=vertices[0]) with requirement that |dz(vertices[0].position())|<0.2 (to assure that dz wrt PV is always smaller than 0.2) will give very similar result. It is because of the way in which packedPFcandidates are associated to vertices. I do not force you to change your logic now, instead I want to advocate for checking (and using in future) the vertex association present in packedPFCandidates. I believe it is more robust than what is present in current code, e.g. because one can use quality flags and do not need loop over vertices to reject candidates closer in dz to other vertex than PV.
You can try something like this:

for (const auto& obj : *pfcIn) {
    if (obj.charge() != 0 && obj.vertexRef().key ==0 && fabs(obj.dz((*pvsIn)[0].position())) < 0.2)
      pv_sumpt2 +=  std::pow(obj.pt(),2);
}

@JunquanTao
Copy link
Contributor Author

@JunquanTao OK, I see. Concerning selecting by association quality you can try two options:

  • "tight" with quality >= CompatibilityDz, i.e. using candidates with tracks used to fit the PV or with |dz|<0.1cm, significance of dz < 5 and dz error < 0.05cm,
  • "loose" with quality >= CompatibilityBTag, as above or within a jet compatible with the PV.
    Finally, you can require that VertexRef is PV and |dz|< 0.2 which should reproduce (or be very close to) the current selection without need of looping over vertices.

@mbluj There should be some misunderstanding here.
We do use the PV 0 directly to calculate dz [1]. We loop the rest PVs to see if this pf charged candidate belongs to other PVs (loop starting from 1 [2]), by checking its distance to any other PV “newdz” is less than “dz” or not. So, we need looping the rest vertices. “vertexRef()” with some quality requirement will not save the time.
“VertexRef is PV and |dz|< 0.2” is not enough. As mentioned above, we need to make sure this pf charged candidate does not belong to other PVs, then we count it in the sum PT2.
So, the current codes is good enough to calculate the variable “SumPT2’. Right?
[1]

double dz = fabs(obj.dz((*pvsIn)[0].position()));

[2]

for (size_t j = 1; j < (*pvsIn).size(); j++) {

I understand logic of your code and I agree that is enough to reach your goal of reproducing sumPt2 score. But, iterating over candidates and summing pt2 of those which have vertexRef().key == 0 (i.e. are associated to PV=vertices[0]) with requirement that |dz(vertices[0].position())|<0.2 (to assure that dz wrt PV is always smaller than 0.2) will give very similar result. It is because of the way in which packedPFcandidates are associated to vertices. I do not force you to change your logic now, instead I want to advocate for checking (and using in future) the vertex association present in packedPFCandidates. I believe it is more robust than what is present in current code, e.g. because one can use quality flags and do not need loop over vertices to reject candidates closer in dz to other vertex than PV. You can try something like this:

for (const auto& obj : *pfcIn) {
    if (obj.charge() != 0 && obj.vertexRef().key ==0 && fabs(obj.dz((*pvsIn)[0].position())) < 0.2)
      pv_sumpt2 +=  std::pow(obj.pt(),2);
}

Hi @mbluj , I tested the codes as you proposed. As summarized on the last slide (S7) of [1], I saw different SumPT2 values from the codes you proposed compared to the codes we proposed and used in Run2 analysis, in ~2.5k events with ~5.5 total signal events tested. Based on your codes, most of the events in these ~2.5k events have large SumPT2. So I guess, the requirement in the codes as you proposed, do not guarantee that each charged pf candidate belongs to at most 1 PV. The charged pf candidates could belong to 2 or more PVs if its dz less than 0.2 wrt these PVs. So we may count the charged pf candidate, which has smaller dz to other PV compared to the dz to PV0, in this sumPT2. Then we get larger SumPT2, from the codes you proposed. I am not sure if my understanding is correct or not. Can you please clarify if each charged pf candidate belongs to at most 1 PV, or not? If not, we need to loop the rest PVs, in order to not account the charged pf candidate if its dz to other PV smaller than its dz to PV 0. This is what we did in Run2 analysis, associating a charged pf candidate to the closest vertex only. Thanks, Junquan

[1] https://twiki.cern.ch/twiki/pub/CMS/JunquanTao/LMHgamgam_CustomizedNanoAOD_AddingSumPT2.pdf

@JunquanTao
Copy link
Contributor Author

@mbluj Have you checked my previous comment and the extra studies on slide 7 [1]? Any comments or proposal? Thanks, Junquan

[1] https://twiki.cern.ch/twiki/pub/CMS/JunquanTao/LMHgamgam_CustomizedNanoAOD_AddingSumPT2.pdf

@mbluj
Copy link
Contributor

mbluj commented Dec 20, 2023

@mbluj Have you checked my previous comment and the extra studies on slide 7 [1]? Any comments or proposal? Thanks, Junquan

[1] https://twiki.cern.ch/twiki/pub/CMS/JunquanTao/LMHgamgam_CustomizedNanoAOD_AddingSumPT2.pdf

Sorry for belated answer. Thank you for the tests.
The way in which charged particles are associated to PV should be unique, so there should not be double counting. But, it could be that in some cases the track is associated to other PV than to the one giving smallest dz.
If you can please try also:

float pv_sumpt2 = 0.0;
for (const auto& obj : *pfcIn) {
    if (obj.charge() != 0 && obj.vertexRef().key == 0 && obj.pvAssociationQuality() >= pat::PackedCandidate::CompatibilityDz && fabs(obj.dz((*pvsIn)[0].position())) < 0.2)
      pv_sumpt2 +=  std::pow(obj.pt(),2);
}

If is does not help please use your algorithm - it is not granted that you will recover old pt2 using different (in my opinion better) track to PV association. You can also consider to break backward compatibility and use new definition of pt2... Finally, you can also consider to switch from pt2 to PV score which is already stored in nanoAOD...

@JunquanTao
Copy link
Contributor Author

@mbluj Have you checked my previous comment and the extra studies on slide 7 [1]? Any comments or proposal? Thanks, Junquan
[1] https://twiki.cern.ch/twiki/pub/CMS/JunquanTao/LMHgamgam_CustomizedNanoAOD_AddingSumPT2.pdf

Sorry for belated answer. Thank you for the tests. The way in which charged particles are associated to PV should be unique, so there should not be double counting. But, it could be that in some cases the track is associated to other PV than to the one giving smallest dz. If you can please try also:

float pv_sumpt2 = 0.0;
for (const auto& obj : *pfcIn) {
    if (obj.charge() != 0 && obj.vertexRef().key == 0 && obj.pvAssociationQuality() >= pat::PackedCandidate::CompatibilityDz && fabs(obj.dz((*pvsIn)[0].position())) < 0.2)
      pv_sumpt2 +=  std::pow(obj.pt(),2);
}

If is does not help please use your algorithm - it is not granted that you will recover old pt2 using different (in my opinion better) track to PV association. You can also consider to break backward compatibility and use new definition of pt2... Finally, you can also consider to switch from pt2 to PV score which is already stored in nanoAOD...

Hi @mbluj , I did the check with the additional quality requirement “obj.pvAssociationQuality() >= pat::PackedCandidate::CompatibilityDz” as you suggested. As summarized on slide 8 [1], ~99% of 5.5k events have different ΣpT2, from the new codes (so-called version2) with the additional quality requirement compared to the old codes (as we used in Run2). For most of the events, we get smaller SumPT2 from the new codes (v2). So, we think it’s better to use the algorithm as in the PR.

Finally, you can also consider to switch from pt2 to PV score which is already stored in nanoAOD

We obtained worse performance with PV score (~7% relatively more DY events will surviving the selection) instead of SumPT2. This is the motivation or reason why we propose to add the SumPT2 in the next nanoAOD production. You can see it on slide 2 [1] or in the PR description.

[1] https://twiki.cern.ch/twiki/pub/CMS/JunquanTao/LMHgamgam_CustomizedNanoAOD_AddingSumPT2.pdf

@mbluj
Copy link
Contributor

mbluj commented Dec 22, 2023

@JunquanTao Thank you for the check. I have not more comments.

@JunquanTao
Copy link
Contributor Author

@mbluj @vlimant Happy New Year! We should move forward. Please issue the test command, in order to finish all the checks. Thanks and best, Junquan

@mbluj
Copy link
Contributor

mbluj commented Jan 9, 2024

No more comments or requests from my side, thanks!

@vlimant
Copy link
Contributor

vlimant commented Jan 10, 2024

please test

@cmsbuild
Copy link
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b92165/36771/summary.html
COMMIT: 1c4c038
CMSSW: CMSSW_14_0_X_2024-01-09-2300/el8_amd64_gcc12
Additional Tests: NANO
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/43487/36771/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially added 101 lines to the logs
  • ROOTFileChecks: Some differences in event products or their sizes found
  • Reco comparison results: 22 differences found in the comparisons
  • DQMHistoTests: Total files compared: 48
  • DQMHistoTests: Total histograms compared: 3247277
  • DQMHistoTests: Total failures: 1219
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3246036
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 6.3900000000000015 KiB( 47 files compared)
  • DQMHistoSizes: changed ( 11634.0,... ): 0.426 KiB Physics/NanoAODDQM
  • Checked 200 log files, 161 edm output root files, 48 DQM output files
  • TriggerResults: no differences found

NANO Comparison Summary

Summary:

  • You potentially added 2 lines to the logs
  • ROOTFileChecks: Some differences in event products or their sizes found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 15
  • DQMHistoTests: Total histograms compared: 16405
  • DQMHistoTests: Total failures: 0
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 16405
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 5.964000000000001 KiB( 14 files compared)
  • DQMHistoSizes: changed ( 2500.001,... ): 0.426 KiB Physics/NanoAODDQM
  • Checked 38 log files, 18 edm output root files, 15 DQM output files

Nano size comparison Summary:

Sample kb/ev ref kb/ev diff kb/ev ev/s/thd ref ev/s/thd diff rate mem/thd ref mem/thd
2500.0 2.544 2.542 0.002 ( +0.1% ) 5.23 5.29 -1.0% 2.164 2.217
2500.001 2.691 2.688 0.003 ( +0.1% ) 4.73 4.73 -0.0% 2.582 2.650
2500.002 2.629 2.625 0.004 ( +0.2% ) 4.90 4.96 -1.3% 2.588 2.627
2500.01 1.314 1.312 0.002 ( +0.1% ) 9.42 9.74 -3.2% 2.249 2.332
2500.011 1.730 1.729 0.002 ( +0.1% ) 5.14 5.33 -3.5% 2.394 2.530
2500.012 1.576 1.575 0.002 ( +0.1% ) 7.39 7.63 -3.1% 2.355 2.445
2500.1 2.192 2.190 0.002 ( +0.1% ) 5.32 5.32 +0.1% 2.074 2.078
2500.2 2.306 2.304 0.002 ( +0.1% ) 6.06 6.11 -0.8% 1.991 1.998
2500.21 1.182 1.180 0.002 ( +0.1% ) 4.31 4.37 -1.5% 2.277 2.285
2500.211 1.544 1.542 0.002 ( +0.1% ) 3.78 3.84 -1.4% 2.244 2.371
2500.3 2.061 2.059 0.002 ( +0.1% ) 12.67 12.91 -1.9% 1.905 1.976
2500.31 1.256 1.255 0.001 ( +0.1% ) 20.36 20.67 -1.5% 2.349 2.374
2500.311 1.644 1.642 0.002 ( +0.1% ) 13.10 13.80 -5.1% 2.339 2.454
2500.312 7.025 7.025 0.000 ( +0.0% ) 1.34 1.32 +1.5% 1.714 1.712
2500.313 1.471 1.471 0.000 ( +0.0% ) 6.81 6.73 +1.1% 1.044 1.040
2500.4 2.061 2.059 0.002 ( +0.1% ) 12.71 12.91 -1.6% 1.981 1.982
2500.5 19.575 19.575 0.000 ( +0.0% ) 1.08 1.19 -9.4% 1.128 1.355

@vlimant
Copy link
Contributor

vlimant commented Jan 10, 2024

+1

@cmsbuild
Copy link
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @sextonkennedy, @antoniovilela, @rappoccio (and backports should be raised in the release meeting by the corresponding L2)

@rappoccio
Copy link
Contributor

+1

@cmsbuild cmsbuild merged commit 737da2e into cms-sw:master Jan 12, 2024
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants