Fix Rare Undefined Behavior in PixelThresholdClusterizer #35275

OzAmram · 2021-09-14T19:44:51Z

This is a small fix to address the undefined behavior reported in issue #35036. The change checks that the range is sensible before initializing an array.

No changes in output are expected.

@mmusich @ferencek @tsusa @tvami @czangela

cmsbuild · 2021-09-14T19:51:50Z

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-35275/25270

This PR adds an extra 20KB to repository

cmsbuild · 2021-09-14T19:52:09Z

A new Pull Request was created by @OzAmram (Oz Amram) for master.

It involves the following packages:

RecoLocalTracker/SiPixelClusterizer (reconstruction)

@jpata, @cmsbuild, @slava77 can you please review it and eventually sign? Thanks.
@mtosi, @felicepantaleo, @GiacomoSguazzoni, @JanFSchulte, @rovere, @VinInn, @OzAmram, @ferencek, @dkotlins, @gpetruc, @mmusich, @threus, @tvami this is something you requested to watch as well.
@perrotta, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here

tvami · 2021-09-14T20:01:36Z

@cmsbuild , please test

tvami · 2021-09-14T20:01:44Z

type bug-fix

cmsbuild · 2021-09-14T23:04:51Z

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-110411/18609/summary.html
COMMIT: ffdea77
CMSSW: CMSSW_12_1_X_2021-09-14-1100/slc7_amd64_gcc900
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/35275/18609/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

No significant changes to the logs found
Reco comparison results: 2 differences found in the comparisons
DQMHistoTests: Total files compared: 39
DQMHistoTests: Total histograms compared: 3000833
DQMHistoTests: Total failures: 6
DQMHistoTests: Total nulls: 0
DQMHistoTests: Total successes: 3000805
DQMHistoTests: Total skipped: 22
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 38 files compared)
Checked 165 log files, 37 edm output root files, 39 DQM output files
TriggerResults: no differences found

mmusich · 2021-09-15T06:56:27Z

please test workflow 134.706 for CMSSW_12_1_UBSAN_X

cmsbuild · 2021-09-15T15:15:14Z

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-110411/18617/summary.html
COMMIT: ffdea77
CMSSW: CMSSW_12_1_UBSAN_X_2021-09-13-1100/slc7_amd64_gcc10
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/35275/18617/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

The workflows 140.53 have different files in step1_dasquery.log than the ones found in the baseline. You may want to check and retrigger the tests if necessary. You can check it in the "files" directory in the results of the comparisons

@slava77 comparisons for the following workflows were not done due to missing matrix map:

/data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-110411/134.706_RunMuonEG2015B+RunMuonEG2015B+HLTDR2_50ns+RECODR2_50nsreHLT_HIPM+HARVESTDR2

Summary:

You potentially added 21172 lines to the logs
ROOTFileChecks: Some differences in event products or their sizes found
Reco comparison results: 63006 differences found in the comparisons
DQMHistoTests: Total files compared: 39
DQMHistoTests: Total histograms compared: 3001001
DQMHistoTests: Total failures: 396848
DQMHistoTests: Total nulls: 38
DQMHistoTests: Total successes: 2604093
DQMHistoTests: Total skipped: 22
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: -45.455 KiB( 38 files compared)
DQMHistoSizes: changed ( 136.731,... ): 0.004 KiB JetMET/SUSYDQM
DQMHistoSizes: changed ( 140.53 ): -44.531 KiB Hcal/DigiRunHarvesting
DQMHistoSizes: changed ( 140.53 ): -1.172 KiB RPC/DCSInfo
DQMHistoSizes: changed ( 250202.181 ): -0.064 KiB SiStrip/MechanicalView
DQMHistoSizes: changed ( 25202.0 ): 0.308 KiB SiStrip/MechanicalView
DQMHistoSizes: changed ( 312.0 ): -0.004 KiB MessageLogger/Warnings
Checked 165 log files, 37 edm output root files, 39 DQM output files
TriggerResults: found differences in 13 / 38 workflows

jpata · 2021-09-16T13:54:49Z

RecoLocalTracker/SiPixelClusterizer/plugins/PixelThresholdClusterizer.cc

@@ -222,6 +222,11 @@ void PixelThresholdClusterizer::copy_to_buffer(DigiIterator begin, DigiIterator
    // std::cout << (doMissCalibrate ? "VI from db" : "VI linear") << std::endl;
  }
 #endif
+
+  //avoid undefined behavior
+  if (end <= begin)


wouldn't it be better to essentially assert/crash here, and ensure that copy_to_buffer is not called with incorrect inputs (by fixing the calling code)?

OzAmram · 2021-09-16

I don't personally have a strong preference on this. However, I am not an expert in the clusterizer code, so someone else would have to work out why copy_to_buffer is being called with these incorrect inputs and come up with a fix (maybe @ferencek or @czangela?)

ferencek · 2021-09-16

No strong preference here either, but calling copy_to_buffer in https://github.com/cms-sw/cmssw/blob/CMSSW_12_1_0_pre3/RecoLocalTracker/SiPixelClusterizer/plugins/PixelThresholdClusterizer.cc#L151 only if end > begin would perhaps be a more elegant solution.

On the other hand, I think it makes sense for a function itself to act in case it encounters an undefined behavior rather than to leave the checking to the caller. So in the end I think I like the current fix more.
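The two placements discussed so far (a guard inside the function, or a check in the calling code combined with an assert on the contract) can be compared in a minimal sketch. Everything here is hypothetical and only illustrates the trade-off: `Digi`, `copyGuarded`, `copyUnchecked` and `clusterizeModule` are stand-in names, not the CMSSW classes or the actual `copy_to_buffer` implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a pixel digi; not the CMSSW PixelDigi class.
struct Digi {
  int row;
  int col;
  int adc;
};

using DigiIterator = std::vector<Digi>::const_iterator;

// Callee-side guard: the function itself tolerates an empty range.
// Returns the number of ADC values appended to `buffer`.
std::size_t copyGuarded(DigiIterator begin, DigiIterator end, std::vector<int>& buffer) {
  if (end <= begin)
    return 0;  // nothing to copy, so skip all further work
  const std::size_t before = buffer.size();
  for (DigiIterator it = begin; it != end; ++it)
    buffer.push_back(it->adc);
  return buffer.size() - before;
}

// Callee with a precondition: a non-empty range is required, and the
// assert documents (and, in debug builds, enforces) that contract.
std::size_t copyUnchecked(DigiIterator begin, DigiIterator end, std::vector<int>& buffer) {
  assert(begin < end && "copyUnchecked requires a non-empty range");
  const std::size_t before = buffer.size();
  for (DigiIterator it = begin; it != end; ++it)
    buffer.push_back(it->adc);
  return buffer.size() - before;
}

// Caller-side check: the caller filters empty inputs before the call.
std::size_t clusterizeModule(const std::vector<Digi>& digis, std::vector<int>& buffer) {
  if (digis.empty())
    return 0;
  return copyUnchecked(digis.begin(), digis.end(), buffer);
}
```

With the guard inside the callee, every caller is protected automatically. With the caller-side check, the contract is explicit, but each call site has to remember to honor it.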

ferencek · 2021-09-16

In general, I am not a big fan of asserts in production code. For anything unexpected or undefined, isn't it better to deal with it gracefully and report a LogError? In this particular case it looks like we are encountering a situation where a pixel module has zero digis produced and an empty vector (DetSet) is passed on to the clustering routine. I would say this is nothing particularly alarming and probably does not even require issuing a LogError. The following commented-out line https://github.com/cms-sw/cmssw/blob/CMSSW_12_1_0_pre3/RecoLocalTracker/SiPixelClusterizer/plugins/PixelThresholdClusterizer.cc#L135 seems to suggest that this scenario can indeed occur. However, what I am a bit confused about is why these empty DetSets are not simply dropped from the digi collection. Either way, the clusterizer code should be able to handle such cases without any trouble.

slava77 · 2021-09-16

Does this empty case correspond to (end == begin)?
Or is this a case of end before begin?

ferencek · 2021-09-17

@slava77, end == begin

@OzAmram, I had a quick chat with @tsusa today precisely about this issue of end == begin which suggests that in the digi collection we have an empty vector of digis stored for a particular detId. The question then is why is this empty vector not dropped from the collection in the first place. It would therefore be good to check how that happens. On the other hand, even if there is some issue in the digi producer which can lead to such situations, the clusterizer should be immune against such cases which is what this PR achieves.

OzAmram · 2021-09-17

Yeah, I tend to agree that it would be good if the clusterizer does not fail in such a case. Maybe we add a LogWarning message to this PR and follow up later to try to track down the upstream issue?

jpata · 2021-09-17

In my view, at least the code comments should make it reasonably clear why a function is expected to sometimes get incorrect inputs. Something like "avoid undefined behaviour" may be quite mysterious for a later reader.

So if fixing this at the source is out of scope, how about:

  //In rare cases, this function gets called with an empty DetSet.
  //This is not expected to be a problem because of XYZ
  if (end <= begin)
    return;

slava77 · 2021-09-17

begin == end is a reasonable case for a generic caller, and it makes sense to me to skip any preamble computation in the method when it is clearly not needed.
However, the part with end < begin looks bad and had better be resolved at the caller.

ferencek · 2021-09-19

I agree with @slava77. The end < begin case looks bad, but by construction it is not possible in this particular case. In the caller code, begin and end are iterators from the same vector, so in the worst case end == begin. For such cases the method could issue a LogWarning and return.
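The "warn and return" behaviour for an empty range can be sketched as follows. This is a minimal illustration under stated assumptions, not the merged code: `RangeResult` and `sumRange` are hypothetical names, the range holds plain `int` values rather than digis, and the warning is returned as a string instead of going through the framework's LogWarning.

```cpp
#include <cassert>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical result type for this sketch.
struct RangeResult {
  bool processed;       // false when the range was skipped
  long adcSum;          // sum of the values in the range
  std::string warning;  // empty unless the range was skipped
};

using AdcIterator = std::vector<int>::const_iterator;

RangeResult sumRange(AdcIterator begin, AdcIterator end, unsigned int detId) {
  // Both iterators are expected to come from one vector, so the distance
  // cannot be negative; this is a contract on the caller.
  assert(std::distance(begin, end) >= 0);
  if (begin == end)
    return {false, 0, "empty digi range for detId " + std::to_string(detId)};
  long sum = 0;
  for (AdcIterator it = begin; it != end; ++it)
    sum += *it;
  return {true, sum, ""};
}
```

For an empty vector, `begin()` and `end()` compare equal, so the function returns early with a warning and never touches the elements.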

cmsbuild · 2021-09-20T15:18:37Z

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-35275/25399

This PR adds an extra 20KB to repository

smuzaffar · 2021-09-28T06:09:12Z

please test

smuzaffar · 2021-09-28T06:12:26Z

@OzAmram, sorry, I force-pushed a change here in order to get a new commit. The previous commit had reached the maximum commit-status limit of 1000, which was causing the bot to fail with an error message like

Validation Failed
This SHA and context has reached the maximum number of statuses.

cmsbuild · 2021-09-28T06:17:35Z

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-35275/25574

This PR adds an extra 20KB to repository

cmsbuild · 2021-09-28T06:18:05Z

Pull request #35275 was updated. @jpata, @slava77 can you please check and sign again.

cmsbuild · 2021-09-28T12:15:07Z

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-110411/19166/summary.html
COMMIT: 866cfe7
CMSSW: CMSSW_12_1_X_2021-09-27-2300/slc7_amd64_gcc900
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/35275/19166/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

@slava77 comparisons for the following workflows were not done due to missing matrix map:

/data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-110411/134.706_RunMuonEG2015B+RunMuonEG2015B+HLTDR2_50ns+RECODR2_50nsreHLT_HIPM+HARVESTDR2

Summary:

No significant changes to the logs found
Reco comparison results: 4 differences found in the comparisons
DQMHistoTests: Total files compared: 40
DQMHistoTests: Total histograms compared: 3211080
DQMHistoTests: Total failures: 5
DQMHistoTests: Total nulls: 1
DQMHistoTests: Total successes: 3211052
DQMHistoTests: Total skipped: 22
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: -0.004 KiB( 39 files compared)
DQMHistoSizes: changed ( 312.0 ): -0.004 KiB MessageLogger/Warnings
Checked 169 log files, 37 edm output root files, 40 DQM output files
TriggerResults: no differences found

jpata · 2021-09-28T12:23:29Z

+reconstruction

retest and resign since Fix Rare Undefined Behavior in PixelThresholdClusterizer #35275 (comment)
technical, fixes an issue reported in [UBSAN] Undefined behavior in Reco* and TrackingTools reco packages #35036 coming from empty DetSets
reco tests passed and showed nothing relevant in Fix Rare Undefined Behavior in PixelThresholdClusterizer #35275 (comment)
UBSAN was not rerun here

cmsbuild · 2021-09-28T12:23:50Z

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)

smuzaffar · 2021-09-28T13:54:54Z

please test workflow 134.706 for CMSSW_12_1_UBSAN_X

let's test based on the latest UBSAN IB

cmsbuild · 2021-09-29T03:48:55Z

-1

Failed Tests: UnitTests
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-110411/19198/summary.html
COMMIT: 866cfe7
CMSSW: CMSSW_12_1_UBSAN_X_2021-09-27-2300/slc7_amd64_gcc10
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/35275/19198/install.sh to create a dev area with all the needed externals and cmssw changes.

Found compilation warnings

Unit Tests

I found errors in the following unit tests:

---> test EcnaCalculationsExample had ERRORS
---> test testUCTUnpacker had ERRORS

Comparison Summary

@slava77 comparisons for the following workflows were not done due to missing matrix map:

/data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-110411/134.706_RunMuonEG2015B+RunMuonEG2015B+HLTDR2_50ns+RECODR2_50nsreHLT_HIPM+HARVESTDR2

Summary:

You potentially added 21448 lines to the logs
Reco comparison results: 66379 differences found in the comparisons
DQMHistoTests: Total files compared: 40
DQMHistoTests: Total histograms compared: 3211080
DQMHistoTests: Total failures: 415441
DQMHistoTests: Total nulls: 14
DQMHistoTests: Total successes: 2795603
DQMHistoTests: Total skipped: 22
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: -0.162 KiB( 39 files compared)
DQMHistoSizes: changed ( 10224.0 ): 0.117 KiB SiStrip/MechanicalView
DQMHistoSizes: changed ( 136.731,... ): 0.004 KiB JetMET/SUSYDQM
DQMHistoSizes: changed ( 250202.181 ): -0.533 KiB SiStrip/MechanicalView
DQMHistoSizes: changed ( 25202.0 ): 0.246 KiB SiStrip/MechanicalView
Checked 169 log files, 37 edm output root files, 40 DQM output files
TriggerResults: found differences in 14 / 39 workflows

perrotta · 2021-09-29T07:06:10Z

+1

Bug fix for some rare but not impossible case
Still failing unit tests in UBSAN are not related

copy_to_buffer return right away if range empty

ffdea77

cmsbuild added this to the CMSSW_12_1_X milestone Sep 14, 2021

cmsbuild added code-checks-pending orp-pending pending-signatures reconstruction-pending tests-pending labels Sep 14, 2021

cmsbuild added code-checks-approved and removed code-checks-pending labels Sep 14, 2021

cmsbuild added bug-fix tests-started and removed tests-pending labels Sep 14, 2021

cmsbuild added tests-approved and removed tests-started labels Sep 14, 2021

jpata reviewed Sep 16, 2021

slava77 mentioned this pull request Sep 16, 2021

[UBSAN] Undefined behavior in Reco* and TrackingTools reco packages #35036

Closed

Check range before calling copy_to_buffer, add LogWarning

f8f2afc

cmsbuild added code-checks-pending tests-pending and removed tests-approved code-checks-approved labels Sep 20, 2021

cmsbuild removed the code-checks-pending label Sep 20, 2021

cmsbuild added code-checks-pending pending-signatures reconstruction-pending tests-pending labels Sep 28, 2021

cmsbuild added tests-started and removed tests-pending labels Sep 28, 2021

cmsbuild added code-checks-approved and removed code-checks-pending labels Sep 28, 2021

cmsbuild mentioned this pull request Sep 28, 2021

Tracker phase2 bricked pixel localreco cmssw 12 1 x #35441

Merged

cmsbuild added tests-approved and removed tests-started labels Sep 28, 2021

cmsbuild added fully-signed reconstruction-approved and removed reconstruction-pending pending-signatures labels Sep 28, 2021

cmsbuild added orp-approved and removed orp-pending labels Sep 29, 2021

cmsbuild merged commit f63ce42 into cms-sw:master Sep 29, 2021

This was referenced Sep 29, 2021

Added esConsumes to ConversionSeedFinder #35475

Merged

More diagnostic plots for CSC trigger primitives in validation step #35329

Merged
