DeepBoostedJetTagInfoProducer failure in PromptReco_Run381443_ParkingSingleMuon4 (CMSSW_14_0_7 on AMD arch) #45190
cms-bot internal usage |
A new Issue was created by @gpetruc. @rappoccio, @smuzaffar, @makortel, @Dr15Jones, @antoniovilela, @sextonkennedy can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here |
assign RecoBTag/FeatureTools |
New categories assigned: reconstruction @jfernan2,@mandrenguyen you have been requested to review this Pull request/Issue and eventually sign? Thanks |
type pf |
type btv |
I would like to reproduce this. Does anyone have a pointer for finding an AMD machine I can use interactively? Either one running EL8 or one on which I can run Singularity. |
e.g on
I tested the reproducer above (#45190 (comment)); it fails with:
|
The code is crashing at this line:
This appears to be fixed by conditioning that line with:
No clue why this only shows up on AMD, though. |
Adding a cout before the call, I get this on Intel (lxplus806):
Each time there is a Tk there is a PV as well (and vice versa). |
On AMD (lxplus800):
So WHO is this 16th (actually 17th) candidate? |
I printed the size of the vector, and indeed it is 16 on Intel and 17 on AMD... |
The input jet seems different |
In the event there are 123 jets (sic). Jet 2 has 16 constituents on Intel and 17 on AMD. All others have the same number. |
Anyhow, this is the protection I suggest adding:
Of course, there is plenty of room for optimization more or less everywhere. |
The input file is no longer there:
|
Is there a way to recover the input file? I would really like to better understand the origin of the difference between AMD and Intel. |
@germanfgv @LinaresToine please comment. |
Should now be available at
|
On AMD, the
|
In principle, we have removed all "raw" Ofast flags that could produce a difference. |
As I recall the evidence was that there are fewer differences between AMD and Intel; there was no evidence that the results become identical. |
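For context on why such flags matter: in GCC, -Ofast implies -ffast-math, which allows the compiler to reassociate floating-point operations. The sketch below is only a generic illustration that regrouping a sum can change the result. It is not a diagnosis of this issue, and the function names are hypothetical.

```cpp
#include <cassert>

// Sum three doubles grouped as (a + b) + c.
inline double sumLeftToRight(double a, double b, double c) { return (a + b) + c; }

// Same operands, grouped as a + (b + c). Under strict IEEE-754 arithmetic
// this can round differently from the grouping above.
inline double sumRightToLeft(double a, double b, double c) { return a + (b + c); }
```

With a = 1e17, b = -1e17 and c = 1.0, the first grouping keeps the 1.0, while in the second it is lost when b + c is rounded. Small differences of this kind can be enough to move a particle across a selection threshold.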
Is |
It is negative. Patch in [*] and output below.
[*] diff --git a/CommonTools/RecoAlgos/src/PrimaryVertexAssignment.cc b/CommonTools/RecoAlgos/src/PrimaryVertexAssignment.cc
index fad6b30333b..05042d01cca 100644
--- a/CommonTools/RecoAlgos/src/PrimaryVertexAssignment.cc
+++ b/CommonTools/RecoAlgos/src/PrimaryVertexAssignment.cc
@@ -5,6 +5,7 @@
#include "DataFormats/Math/interface/deltaR.h"
#include "TrackingTools/IPTools/interface/IPTools.h"
#include "FWCore/Utilities/interface/isFinite.h"
+#include "FWCore/MessageLogger/interface/MessageLogger.h"
std::pair<int, PrimaryVertexAssignment::Quality> PrimaryVertexAssignment::chargedHadronVertex(
const reco::VertexCollection& vertices,
@@ -184,6 +185,10 @@ std::pair<int, PrimaryVertexAssignment::Quality> PrimaryVertexAssignment::charge
// all other tracks could be non-B secondaries and we just attach them with closest Z
if (vtxIdMinSignif >= 0)
return {vtxIdMinSignif, PrimaryVertexAssignment::OtherDz};
+
+edm::LogPrint("AAAA") << "XXX pt=" << track->pt() << " eta=" << track->eta() << " phi=" << track->phi() << " dzError=" << track->dzError() << " vtxIdMinSignif=" << vtxIdMinSignif
+<< " covariance(4, 4)=" << track->covariance(4, 4);
+
//If for some reason even the dz failed (when?) we consider the track not assigned
return {-1, PrimaryVertexAssignment::Unassigned};
} |
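A minimal sketch of why a negative variance can leave a track unassigned, assuming the negative quantity reported above is the covariance(4, 4) printed by the patch and that dzError is derived from its square root (both are assumptions, not stated in the thread). The helper names and the simplified significance search are hypothetical and are not the actual CMSSW code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an error derived from a variance.
// std::sqrt of a negative value returns NaN.
inline double errorFromVariance(double variance) { return std::sqrt(variance); }

// Hypothetical, simplified search for the vertex closest in z by significance.
// Returns the index of the best vertex, or -1 if none passes the cut.
inline int closestVertexBySignificance(double trackZ,
                                       double dzError,
                                       const std::vector<double>& vertexZ,
                                       double maxSignif) {
  int best = -1;
  double bestSignif = maxSignif;
  for (std::size_t i = 0; i < vertexZ.size(); ++i) {
    const double signif = std::abs(trackZ - vertexZ[i]) / dzError;
    // Any comparison with NaN is false, so a NaN dzError never updates `best`.
    if (signif < bestSignif) {
      bestSignif = signif;
      best = static_cast<int>(i);
    }
  }
  return best;
}

// Guard to apply before using the returned index to access a vertex.
inline bool hasValidVertex(int vertexIndex, std::size_t nVertices) {
  return vertexIndex >= 0 && static_cast<std::size_t>(vertexIndex) < nVertices;
}
```

In this sketch a track with a negative variance gets index -1 even though vertices exist, so any downstream code that uses the index without a check like hasValidVertex would read out of bounds.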
Why only on AMD? |
type tracking |
It seems we also now have a different failure that only occurs on AMD: |
Hello,
There's another PromptReco failure that, like #45189, seems to be reproducible on AMD but not on Intel.
CMS-talk thread: https://cms-talk.web.cern.ch/t/paused-job-for-promptreco-run381443-parkingsinglemuon4-deepboostedjettaginfoproducer/42164
Exception:
Recipe to reproduce it on an AMD EL8 machine: