-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
reproducibility of GPU vs CPU results at HLT #35376
Comments
assign hlt, heterogeneous |
New categories assigned: heterogeneous,hlt @fwyzard,@Martin-Grunewald,@makortel,@missirol you have been requested to review this Pull request/Issue and eventually sign? Thanks |
A new Issue was created by @fwyzard Andrea Bocci. @Dr15Jones, @perrotta, @dpiparo, @makortel, @smuzaffar, @qliphy can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here |
@silviodonato now there is a GitHuib issue... |
The HCAL discrepancies are partially addressed by @mariadalfonso in #35355 / #35357 |
assign trk-dpg |
assign hcal-dpg |
New categories assigned: hcal-dpg,trk-dpg @mmusich,@georgia14,@tsusa,@mseidel42 you have been requested to review this Pull request/Issue and eventually sign? Thanks |
thanks |
what does |
|
Than it would be better to run with 1K or even 10K events just to understand which are seriously systematic affected. |
The difference between HLT_PFHT350_v19 and HLT_PFHT370_v17 is only in the L1seeds, caloHT cut (320 vs 300), and PFHT cut (370 vs 350). All the filters cut on the same objects (hltHtMhtJet30 and hltPFHTJet30). |
@fwyzard is there a similar comparison between gpu and cpu (non-legacy wf) ? |
Hi @tsusa, the instructions above under Pixel local reconstruction changes should compare
while those under Pixel track reconstruction changes should compare
Are you looking for something else ? |
Hi @fwyzard,
|
You can make this comparison by creating the configuration for the Pixel local reconstruction changes, running it on the CPU (e.g. setting |
legacy pixel local reconstruction is NOT supposed to be identical to "new pixel local reconstruction", (and not required to) |
Well, this is news to me. |
e.g. treatment of the generic CPE errors? |
ah, yes - still, how much different do we expect it to be ? (and anyway this is outside the scope of the discrepancy discussed in this issue, which should use the new reconstruction for both CPU and GPU workflows) |
There are plots in the talk linked above. I let you judged by yourself. |
run twice on gpu. Got:
|
About scouting, 17 events, out of 1268, have a gpu - cpu met difference larger than 5 GeV.
|
On the HN Danek wrote that it is hard to know the reason without doing a more detailed study, and that he would try to monitor it to see if there are any systematic effects.
Unless some way to distinguish the good and bad pixel hits can be found, I would suggest one of the following:
Does anybody has a better suggestion ? |
while keeping either one of the duplicates has some (50%?) chances of doing the right thing, this is clearly wrong in all cases. Moreover if the duplicate pixels end up being one at the edge of the cluster, might screw-up the determination of the position in the CPE code. |
is It possible that the problem of pixels not belonging to the event is not limited to duplicate pixels? |
I agree with @mmusich, summing up charges or taking their average is almost certainly wrong so taking only one at least has a good chance of being correct. Regarding Vincenzo's comment above, indeed, based on what Danek wrote in the HN thread, there could be spurious hits present in the event caused by the same mechanism which we would never detect as problematic if they are not duplicate. In fact, if the underlying cause is a bit flip in the pixel address as opposed to a stale content in the data buffer, we possibly end up having two problems at the same time, a missing and a fake hit in the same event. |
would "a bit flip" make duplicate more likely? (a pixel in the same cluster with a bit-flipped may generate a duplicate?). |
In the presence of something completely unexpected ("duplicate" pixels), how can you be certain that taking one or the other is correct ? Also, the "right" one could be
If it is always the first one or always the last one, and we implement the other case in the unpacker, we are effectively always making the wrong choice -- which is not any different from using the sum or the average. |
I am not certain, but without knowing for sure (that requires more studies) summing pixels would be exchanging a possible mistake for a certain mistake. |
OK, I guess I see things differently, but in the end it doesn't matter: as long as we don't know, any algorithm that is fully reproducible should be an improvement over the present situation, and I'm fine with that. The problem is that, as @VinInn pointed out "first" and "last" are not easy to implement in a parallel algorithm. |
The most significant effect is when we do not reconstruct a pixel clusters because of duplicates.
My 2 cents: we should take the maximum of the N duplicated pixels. Our top priority is to reconstruct correctly the pixel clusters 1) in signal events 2) close to the signal objects (jets or leptons). |
On the other hand duplicates may be due to bit-flip: most probably the original pixel was closeby even in the same cluster. I think a detailed study is needed. We do not even know if duplicates pixels are contiguous in the rawdata or not. (And CMS ignored all this for 20 years) |
@VinInn how does the clustering code behave in the presence of "duplicate" pixels ? |
I posted the relevant line above: |
@VinInn is there a trivial way to sum the pixels also on the GPU side? Btw. We finally found a bug in the L2 tau reco! Now the trigger with the largest difference should is
Ganesh will update the study including the L2 tau NN workaround and using the sum of the duplicated pixels in the next days. |
On 15 Mar, 2022, at 5:34 PM, Silvio Donato ***@***.***> wrote:
@VinInn is there a trivial way to sum the pixels also on the GPU side?
Just to avoid that having two pixels might create some differences (second order effect).
Not trival on GPU (maybe adding a N^2 algo kernel that detects duplicate and overwrites one with the sum the other with 0.
(the 0 one can be removed when converting to legacy)
Easier somehow when converting to legacy. (always N^2 algo)
v.
|
In case people active in this thread want to contribute to the discussion, the issue about duplicate pixels is going to be tackled today at the Joint TRK DPG/POG meeting |
Following Danek presentation I made a small test using this code
Run 5000 events in Run 323775, Event 33310106, LumiSection 53 So they are there. about 1.5 per events. many in the same location. (ok I had to write the module id as well) |
so in event Run 323775, Event 32123778, LumiSection 53 notice how many times 159 208 appears...
|
Isn't the CPU code keeping the last occurrence https://github.com/cms-sw/cmssw/blob/CMSSW_12_2_0/RecoLocalTracker/SiPixelClusterizer/plugins/PixelThresholdClusterizer.cc#L292 (assuming the charge crosses the threshold which is always the case for the Based on Danek's investigation that duplicate pixels seems to be coming from noisy ROCs, perhaps the best strategy would be to drop such pixels. |
yes @ferencek , you are right. |
So I implemented also the N^2 duplicate finding at module level (in the clusterizer) (summary 789 Hz instead of 845)
|
To those who wish to test the above please |
@VinInn thank you very much for the detailed test ! As you suggested, removing only the nearby duplicates has a negligible impact, from And I understand that the cost would have to be paid in any case, not only in the hopefully rare runs and modules which are affected. |
I spent last few days in trying to understand why the Fishbone was not reproducible on GPU (even after doing full combinatorial) at the end was this
I hope everybody appreciates the level of sophistication required to obtain bit-wise reproducibility in presence of not ordered collections |
Many thanks Vincenzo! |
I would like to point out that a similar combinatorial algorithm running on CPU using GPU output will NOT reproduce either (unless the collection is somehow sorted). |
here the PR on fishbone |
Tests done in multiple recent releases have shown that the HLT results are not consistent when running on GPU vs on CPU.
Here are the instruction to reproduce the issue using
CMSSW_12_1_0_pre3
/dev/CMSSW_12_1_0/GRun/V1
/RelValTTbar_14TeV/CMSSW_12_0_0_pre6-PU_120X_mcRun3_2021_realistic_v4_JIRA_129-v1/GEN-SIM-DIGI-RAW
/store/relval/CMSSW_12_1_0_pre3/RelValTTbar_14TeV/GEN-SIM-DIGI-RAW/PU_121X_mcRun3_2021_realistic_v2-v1/10000/0eb14c4a-e363-424a-9c0c-2688c7d32c74.root
auto:phase1_2021_realistic
; running on a previous release using the same global tag as the sample itself (120X_mcRun3_2021_realistic_v4
) shows a similar behaviour.setup a CMSSW working area
extract the HLT configuration for running on GPU using the Run3 era
hltGetConfiguration /dev/CMSSW_12_1_0/GRun/V1 \ --eras Run3 \ --globaltag auto:phase1_2021_realistic \ --mc \ --unprescale \ --output minimal \ --customise HLTrigger/Configuration/customizeHLTforPatatrack.customizeHLTforPatatrack \ --input /store/relval/CMSSW_12_0_0_pre6/RelValTTbar_14TeV/GEN-SIM-DIGI-RAW/PU_120X_mcRun3_2021_realistic_v4_JIRA_129-v1/00000/79c06ed5-929b-4a57-a4f2-1ae90e6b38c5.root \ > hlt.py
run the HLT menu on a GPU-equipped machine
compare the results
To disentangle the various effects, one can use different customisations on top of the HLT menu, running each resulting configuration with a GPU and without a GPU (that is, fully on the CPU).
Replace the customisation at the bottom of the
hlt.py
filewith a more fine-grained one, described below.
legacy configuration
Run the HLT menu unchanged, adding only the
Status_OnGPU
andStatus_OnCPU
paths, without actually offloading any reconstruction to GPU:ECAL-only changes
To check the impact of running the ECAL reconstruction on GPU vs CPU, apply only the ECAL changes:
HCAL-only changes
To check the impact of running the HCAL reconstruction on GPU vs CPU, apply only the HCAL changes:
Pixel local reconstruction changes
To check the impact of running the Pixel local reconstruction on GPU vs CPU, apply only the Pixel changes:
Pixel track reconstruction changes
To check the impact of running the Pixel local reconstruction on GPU vs CPU, apply only the Pixel and Tracking changes.
Clearly, for this comparison to be meaningful, the previous one needs to be understood first.
The ECAL-only comparison did not reveal significant differences.
The HCAL-only comparison showed significant differences in a few % of the events (order of 10% of the accepted events).
The Pixel local reconstruction comparison showed significant differences in a few % of the events (order of 10% of the accepted events), while affecting less paths than the HCAL one.
I think that looking at the Pixel track comparison makes sense only after fixing the local reconstruction one.
Updates
for running with recent IBs, please use Fix the HLT GPU customisation for running on the CPU #35497 .
the 12.1.0-pre3 relvals can also be used, for example
/store/relval/CMSSW_12_1_0_pre3/RelValTTbar_14TeV/GEN-SIM-DIGI-RAW/PU_121X_mcRun3_2021_realistic_v2-v1/10000/0eb14c4a-e363-424a-9c0c-2688c7d32c74.root
.The text was updated successfully, but these errors were encountered: