Empty Trajectory exception in HLT step in many workflows #35488

Closed
makortel opened this issue Sep 30, 2021 · 16 comments

@makortel
Contributor

At least 5 workflows fail in the HLT step with an exception like the one below:

----- Begin Fatal Exception 30-Sep-2021 12:39:12 CEST-----------------------
An exception of category 'TrackingTools/PatternTools' occurred while
   [0] Processing  Event run: 1 lumi: 60 event: 2951 stream: 1
   [1] Running path 'HLT_DoubleTrkMu_16_6_NoFiltersNoVtx_v1'
   [2] Calling method for module CkfTrackCandidateMaker/'hltIterL3OITrackCandidatesNoVtx'
Exception Message:
Trajectory::check() - information requested from empty Trajectory 
----- End Fatal Exception -------------------------------------------------
  • 11630.0 path HLT_DoubleTrkMu_16_6_NoFiltersNoVtx_v1 module HLT_DoubleTrkMu_16_6_NoFiltersNoVtx_v1 (log)
  • 11650.911 path HLT_DoubleTrkMu_16_6_NoFiltersNoVtx_v1 module hltIterL3OITrackCandidatesNoVtx (log)
  • 11723.17 path HLT_DoubleTrkMu_16_6_NoFiltersNoVtx_v1 module hltIterL3OITrackCandidatesNoVtx (log)
  • 13034.0 path HLT_TrkMu6NoFiltersNoVtx_v1 module hltIterL3OITrackCandidatesNoVtx (log)
  • 13034.99 path HLT_OldMu100_v3 module hltL3TrackCandidateFromL2OIState (log)
@makortel
Contributor Author

assign hlt, reconstruction

@cmsbuild
Contributor

New categories assigned: hlt,reconstruction

@slava77,@Martin-Grunewald,@jpata,@missirol you have been requested to review this Pull request/Issue and eventually sign? Thanks

@cmsbuild
Contributor

A new Issue was created by @makortel Matti Kortelainen.

@Dr15Jones, @perrotta, @dpiparo, @makortel, @smuzaffar, @qliphy can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@makortel
Contributor Author

#35309 looks like a likely cause

@missirol
Contributor

FYI @swagata87 (in case she has a guess on what the right fix is)

@swagata87
Contributor

I am running runTheMatrix.py -l 11630.0 to see if I can reproduce the issue.

I think I should have put a check like if (!t.empty()) in
TrackingTools/TrajectoryFiltering/interface/MinHitsTrajectoryFilter.h before accessing t.lastMeasurement()

This is what I suspect. Checking now.
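
The guard suggested above can be illustrated with a minimal, self-contained C++ sketch. It is not the CMSSW code: `ToyMeasurement`, `ToyTrajectory`, `ToyMinHitsFilter` and the `valid` flag are hypothetical stand-ins, and rejecting an empty trajectory outright is an assumption about what the guard should do. The sketch only shows why reading `lastMeasurement()` without first checking `empty()` fails on an empty trajectory.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for one trajectory measurement (not a CMSSW type).
struct ToyMeasurement {
  bool valid;  // placeholder for whatever the filter reads from the last measurement
};

// Hypothetical stand-in for a trajectory (not the CMSSW Trajectory class).
class ToyTrajectory {
public:
  bool empty() const { return data_.empty(); }

  // Number of measurements flagged as valid.
  std::size_t foundHits() const {
    std::size_t n = 0;
    for (const auto& m : data_)
      if (m.valid)
        ++n;
    return n;
  }

  void push(ToyMeasurement m) { data_.push_back(m); }

  // Mimics the reported failure mode: asking an empty trajectory for data throws.
  const ToyMeasurement& lastMeasurement() const {
    if (data_.empty())
      throw std::logic_error("information requested from empty Trajectory");
    return data_.back();
  }

private:
  std::vector<ToyMeasurement> data_;
};

// Toy filter: accept a trajectory that ends on a valid measurement and has enough hits.
class ToyMinHitsFilter {
public:
  explicit ToyMinHitsFilter(int minHits) : minHits_(minHits) { assert(minHits >= 0); }

  // Reads lastMeasurement() unconditionally, so it throws on an empty trajectory.
  bool unguarded(const ToyTrajectory& t) const {
    return t.lastMeasurement().valid && t.foundHits() >= static_cast<std::size_t>(minHits_);
  }

  // Checks empty() first; an empty trajectory is rejected (an assumption of this sketch).
  bool guarded(const ToyTrajectory& t) const {
    if (t.empty())
      return false;
    return unguarded(t);
  }

private:
  int minHits_;
};
```

With these toy types, `guarded` returns false for a default-constructed `ToyTrajectory`, while `unguarded` throws `std::logic_error` on the same input.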

@swagata87
Contributor

I could reproduce the issue in 11630.0,
applied a fix in TrackingTools/TrajectoryFiltering/interface/MinHitsTrajectoryFilter.h,
and reran 11630.0 without any problem this time.

I'll also check the other workflows reported here (11650.911, 11723.17, 13034.0, 13034.99), and if all of them run fine I will submit a bugfix PR.

The WFs are taking a long time to run, but I think I will manage to submit the PR by tonight..

Sorry about the problems caused by my earlier PR

@slava77
Contributor

slava77 commented Sep 30, 2021

@vmariani @mmusich
this may be of interest to you

IIUC, @swagata87 is available to check the fit. Thank you.

@missirol
Contributor

> The WFs are taking a long time to run, but I think I will manage to submit the PR by tonight..

@swagata87, in the interest of time, since the fix seems well understood, you could open the PR already, have it reviewed, and have the tests run automatically (incl. the additional wfs), which will be done anyway later.

(But unless any of the experts agrees, please proceed with your current plan.)

@makortel
Contributor Author

> (But unless any of the experts agrees, please proceed with your current plan.)

I could agree with that.

@VinInn
Contributor

VinInn commented Oct 3, 2021

How could this ever have happened?
HLT is essentially broken. This kind of issue should be caught before merging!

@jpata
Contributor

jpata commented Oct 4, 2021

Is there a test we can add in the short matrix to catch this early? As I understand, some of the short matrix wfs run HLT, so it's not clear to me why the tests didn't fail.

@jpata
Contributor

jpata commented Oct 4, 2021

Noting here that the culprit PR was signed by HLT. I'm happy to hear suggestions from experts on how we can improve test coverage to catch this early.

@Martin-Grunewald
Contributor

PR tests did not catch this bug, plain and simple. Some bugs are only uncovered by IB tests as they involve many more workflows/statistics.

@makortel
Contributor Author

makortel commented Oct 4, 2021

The list of failing workflows over IBs also suggests that there is some randomness there, possibly caused by running the DIGI step multithreaded (the DIGI step random number sequence for a given SIM event can change according to the EDM stream the SIM event gets processed by in the DIGI step, and that "stream assignment" can be affected e.g. by machine load).
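
The stream dependence described above can be pictured with a toy C++ model. This is a sketch under stated assumptions, not the framework's random number service: the function `digitize`, the seeding rule "base seed plus stream index", and the use of `std::mt19937` are all hypothetical choices made for illustration. Each stream owns its own engine, so the number an event receives depends on which stream happens to process it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Toy model (hypothetical): streamOfEvent[i] is the stream that processes event i.
// Each stream has its own engine, seeded here as baseSeed + stream index (an assumption).
// Returns one random number per event, drawn from the engine of its assigned stream.
std::vector<std::uint32_t> digitize(const std::vector<int>& streamOfEvent,
                                    std::uint32_t baseSeed,
                                    int nStreams) {
  assert(nStreams > 0);

  std::vector<std::mt19937> engines;
  for (int s = 0; s < nStreams; ++s)
    engines.emplace_back(baseSeed + static_cast<std::uint32_t>(s));

  std::vector<std::uint32_t> drawn;
  drawn.reserve(streamOfEvent.size());
  for (int s : streamOfEvent) {
    // at() throws std::out_of_range if the stream index is invalid.
    drawn.push_back(static_cast<std::uint32_t>(engines.at(static_cast<std::size_t>(s))()));
  }
  return drawn;
}
```

In this model, calling `digitize` twice with the same stream assignment gives identical numbers, whereas changing which stream handles the first events (as scheduling under different machine load might) changes the numbers those events receive, even though the seeds are unchanged.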

@makortel
Contributor Author

makortel commented Oct 4, 2021

Anyway, the fix has been merged, so closing the issue.
