-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DQM instability in 12434.0, pixelLessStep_quickAssociatorByHits #37970
Comments
assign reconstruction |
New categories assigned: reconstruction @jpata,@slava77,@clacaputo you have been requested to review this Pull request/Issue and eventually sign? Thanks |
A new Issue was created by @jpata Joosep Pata. @Dr15Jones, @perrotta, @dpiparo, @makortel, @smuzaffar, @qliphy can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here |
As I noted here, it does not seem immediately related to TF giving different results on different CPUs.
|
(speaking from distant past experience) Looking at #37918 (comment), there are differences in folder These plots are "purely DQM", i.e. tracks (well, |
is it clear that the differences started showing up after #37878 ? |
there is indeed an execution path that could lead to uninitialized values (with an assumption that a default-constructed |
this should indeed be due to https://github.com/cms-sw/cmssw/blob/master/RecoTracker/MkFit/plugins/MkFitOutputConverter.cc#L637 which leaves the input undetermined in some cases. A quick fix would be adding the initialization to 0, e.g. :
I can open the PR. I imagine it needs a backport as well. |
does a 0 input return an output dnn value that would pass or fail the cuts? |
Unfortunately, the all 0 tensor passes the selection (though not significant). |
type tracking |
I understand this would be something urgent to address before the final release of 12_4_0 (closed to physics changes, but bugfixes should be fine). Is this something you can address in the next week or so, @leonardogiannini? |
Would something like this work? diff --git a/RecoTracker/MkFit/plugins/MkFitOutputConverter.cc b/RecoTracker/MkFit/plugins/MkFitOutputConverter.cc
index 16fdfad9ec1..d27575aaad2 100644
--- a/RecoTracker/MkFit/plugins/MkFitOutputConverter.cc
+++ b/RecoTracker/MkFit/plugins/MkFitOutputConverter.cc
@@ -616,6 +616,8 @@ std::vector<float> MkFitOutputConverter::computeDNNs(TrackCandidateCollection co
tensorflow::Tensor input1(tensorflow::DT_FLOAT, {bsize_, 29});
tensorflow::Tensor input2(tensorflow::DT_FLOAT, {bsize_, 1});
+ std::vector<int> toFail;
+
for (auto nb = 0; nb < nbatches + 1; nb++) {
for (auto nt = 0; nt < bsize_; nt++) {
int itrack = nt + bsize_ * nb;
@@ -634,6 +636,7 @@ std::vector<float> MkFitOutputConverter::computeDNNs(TrackCandidateCollection co
if (!(tsAtClosestApproachTrackCand.isValid())) {
edm::LogVerbatim("TrackBuilding") << "TrajectoryStateClosestToBeamLine not valid";
+ toFail.push_back(itrack);
continue;
}
@@ -720,7 +723,12 @@ std::vector<float> MkFitOutputConverter::computeDNNs(TrackCandidateCollection co
continue;
float out0 = 2.0 * outputs[0].matrix<float>()(nt, 0) - 1.0;
- output[itrack] = out0;
+
+ if (std::find(toFail.begin(), toFail.end(), itrack) != toFail.end()) {
+ output[itrack] = -1.f;
+ } else {
+ output[itrack] = out0;
+ }
}
} |
yes, that's it more or less. I am testing almost the same. You or I can open the PR. Let me know |
go ahead! |
Looks like having been actually fixed by #38004, see also #37954 (comment) |
+reconstruction |
This issue is fully signed and ready to be closed. |
[12.4.X] address undef output of dnn - for issue #37970
In recent PRs it seems the workflow 12434.0 has DQM changes in Tracking/TrackParameters/generalTracks/SeedMon/pixelLessStep/TrackBuilding, pixelLessStep_quickAssociatorByHits
cms-sw/cmsdist#7862
#37918
#37952
The change is not always in the same direction:
@cms-sw/tracking-pog-l2
The text was updated successfully, but these errors were encountered: