Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bugfix - change treatment of GPU tracks not on CPU #284

Conversation

felicepantaleo
Copy link

In very rare cases a GPU track, which does not exist on the CPU, might be created.
This is due to the fact that the clusterizer is not exactly the same and a single hit difference in very rare cases could lead to a single track difference.
Currently the treatment for this track, which is not found on the CPU is to throw an assert.
This PR changes this assert into a skip of the track, fix by @VinInn

@fwyzard
Copy link

fwyzard commented Mar 13, 2019

IIUC the check is only on the number of tracks, not on the individual tracks themselves ?

@fwyzard
Copy link

fwyzard commented Mar 13, 2019

What leads to the inconsistency ?
What I mean is, how do we end up comparing the number of tracks produced on the CPU and on the GPU ?

@felicepantaleo
Copy link
Author

no, the id of the track is set to be a high number, higher than the number of tracks

@felicepantaleo
Copy link
Author

felicepantaleo commented Mar 13, 2019

Validation summary

Reference release CMSSW_10_5_0_pre2 at 4fd4ff5
Development branch CMSSW_10_5_X_Patatrack at 96828db
Testing PRs:

makeTrackValidationPlots.py plots

/RelValTTbar_13/CMSSW_10_4_0_pre3-PU25ns_103X_upgrade2018_realistic_v8-v1/GEN-SIM-DIGI-RAW

  • tracking validation plots and summary for workflow 10824.5
  • tracking validation plots and summary for workflow 10824.52
  • tracking validation plots and summary for workflow 10824.51
  • tracking validation plots and summary for workflow 10824.53

/RelValZMM_13/CMSSW_10_4_0_pre3-103X_upgrade2018_realistic_v8-v1/GEN-SIM-DIGI-RAW

  • tracking validation plots and summary for workflow 10824.5
  • tracking validation plots and summary for workflow 10824.52
  • tracking validation plots and summary for workflow 10824.51
  • tracking validation plots and summary for workflow 10824.53

logs and nvprof/nvvp profiles

/RelValTTbar_13/CMSSW_10_4_0_pre3-PU25ns_103X_upgrade2018_realistic_v8-v1/GEN-SIM-DIGI-RAW

/RelValZMM_13/CMSSW_10_4_0_pre3-103X_upgrade2018_realistic_v8-v1/GEN-SIM-DIGI-RAW

Logs

The full log is available at https://fpantale.web.cern.ch/fpantale/patatrack/pulls/61e0fe24b9ce170ae65b1b57c509f4fc23b89b02/log .

@makortel
Copy link

Does the inconsistency originate from here

for (uint32_t it=0; it<tuples_->nTuples; ++it) {
auto q = tuples_->quality[it];
if (q != pixelTuplesHeterogeneousProduct::loose ) continue; // FIXME

?

(there is also another continue in

std::unique_ptr<reco::Track> track(
builder.build(pt, phi, cotTheta, tip, zip, chi2, iCharge, hits,
fieldESH.product(), bs));
if (!track) continue;
// filter???

but inspecting PixelTrackBuilder::build() it does not return nullptr)

@felicepantaleo
Copy link
Author

felicepantaleo commented Mar 15, 2019

GPU tracks are copied to the CPU to fill legacy data formats. For every track we read the hits and the map associates a hit GPU to a converted GPU hit on CPU.
These hits CPU are created from a cluster GPU: the problem is that a cluster CPU can contain at most 1024 digis, while clusters on GPUs don't have this limit.
This means that when a cluster GPU is used to make a CPU cluster, some digis are lost. Matching of hits to cluster is done by matching the xmin,ymin corner (this issue will be solved by associating by id). This corner digi might be lost and if this happens there's no matching.
A GPU track however might have been created with a hit that does not have a ref to a matched cluster, and if this track is used in vertexing (which is the first consumer of pixel tracks) the assertion fails.
This PR mitigates the issue by skipping the track. The solution of the issue would consist in changing how the matching is done.

@VinInn

@makortel
Copy link

Thanks @felicepantaleo for the explanation.

These hits CPU are created from a cluster GPU: the problem is that a cluster GPU can contain at most
1024 digis, while clusters on GPUs don't have this limit.

Should the latter "GPUs" be "CPUs"?

@felicepantaleo
Copy link
Author

no the error is on the previous GPU:
clusters on GPU don't have a limit, on CPU they have.

@makortel
Copy link

no the error is on the previous GPU:
clusters on GPU don't have a limit, on CPU they have.

Ok, thanks for the fix.

@felicepantaleo felicepantaleo merged commit 0fe0b5e into cms-patatrack:CMSSW_10_5_X_Patatrack Mar 15, 2019
@fwyzard fwyzard added this to the CMSSW_10_5_X_Patatrack milestone Mar 26, 2019
fwyzard pushed a commit that referenced this pull request May 15, 2019
Update master-cmsswmaster to CMSSW_10_4_0
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants