
Limit the pixel clusteriser to the nearest neighbours #241

Merged: fwyzard merged 1 commit into CMSSW_10_4_X_Patatrack from VinInn_GPUFastTracksNNClus_part1 on Jan 9, 2019

fwyzard commented Jan 9, 2019

First part of @VinInn's #238.

The clusteriser is now limited to the nearest neighbours; this is faster at high occupancy and when there are many isolated pixels.
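To make the idea concrete, here is a minimal CPU sketch of nearest-neighbour pixel clustering. It is not the Patatrack CUDA kernel from #238/#241: the `Pixel` struct, the function name `clusterNearestNeighbours`, the hash-map lookup and the 8-connectivity rule (pixels touching by a side or a corner belong together) are assumptions made for illustration. Each pixel only looks at its at most eight adjacent positions, so an isolated pixel is settled at once and the work per pixel does not grow with the module occupancy; cluster ids are then settled by iterative min-label propagation, a pattern that maps naturally onto one GPU thread per pixel.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical pixel type: integer column/row within one module.
// Assumes no two pixels share the same (x, y).
struct Pixel {
  int x;
  int y;
};

// Pack (x, y) into one 64-bit key; casting through uint32_t keeps
// negative coordinates (e.g. x - 1 at the module edge) well defined.
inline std::uint64_t pixelKey(int x, int y) {
  return (static_cast<std::uint64_t>(static_cast<std::uint32_t>(x)) << 32) |
         static_cast<std::uint32_t>(y);
}

// Returns one cluster id per pixel: the smallest pixel index in its cluster.
std::vector<int> clusterNearestNeighbours(const std::vector<Pixel>& pixels) {
  const int n = static_cast<int>(pixels.size());
  std::unordered_map<std::uint64_t, int> index;  // (x, y) -> pixel index
  for (int i = 0; i < n; ++i) index[pixelKey(pixels[i].x, pixels[i].y)] = i;

  // Build the nearest-neighbour list once: at most 8 entries per pixel.
  std::vector<std::vector<int>> nn(n);
  for (int i = 0; i < n; ++i) {
    for (int dx = -1; dx <= 1; ++dx) {
      for (int dy = -1; dy <= 1; ++dy) {
        if (dx == 0 && dy == 0) continue;
        auto it = index.find(pixelKey(pixels[i].x + dx, pixels[i].y + dy));
        if (it != index.end()) nn[i].push_back(it->second);
      }
    }
  }

  // Min-label propagation: every pixel repeatedly takes the smallest id
  // among its neighbours until no id changes. Pixels with an empty
  // neighbour list keep their own id and never need revisiting.
  std::vector<int> id(n);
  for (int i = 0; i < n; ++i) id[i] = i;
  bool changed = true;
  while (changed) {
    changed = false;
    for (int i = 0; i < n; ++i) {
      for (int j : nn[i]) {
        if (id[j] < id[i]) {
          id[i] = id[j];
          changed = true;
        }
      }
    }
  }
  return id;
}
```

For example, the pixels `{{0, 0}, {0, 1}, {1, 1}, {10, 10}, {3, 3}, {4, 4}}` give the ids `{0, 0, 0, 3, 4, 4}`: one three-pixel cluster, one isolated pixel, and one diagonal pair.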

fwyzard force-pushed the VinInn_GPUFastTracksNNClus_part1 branch from 1d07d61 to cd8474e on January 9, 2019 17:32
fwyzard force-pushed the VinInn_GPUFastTracksNNClus_part1 branch from cd8474e to e5a4bd8 on January 9, 2019 17:33
fwyzard (author) commented Jan 9, 2019

Validation summary

Reference release CMSSW_10_4_0_pre4 at d74dd18
Development branch CMSSW_10_4_X_Patatrack at db3e6f8
Testing PRs:

makeTrackValidationPlots.py plots

/RelValTTbar_13/CMSSW_10_4_0_pre3-PU25ns_103X_upgrade2018_realistic_v8-v1/GEN-SIM-DIGI-RAW

/RelValZMM_13/CMSSW_10_4_0_pre3-103X_upgrade2018_realistic_v8-v1/GEN-SIM-DIGI-RAW

logs and nvprof/nvvp profiles

/RelValTTbar_13/CMSSW_10_4_0_pre3-PU25ns_103X_upgrade2018_realistic_v8-v1/GEN-SIM-DIGI-RAW

/RelValZMM_13/CMSSW_10_4_0_pre3-103X_upgrade2018_realistic_v8-v1/GEN-SIM-DIGI-RAW

Logs

The full log is available at https://fwyzard.web.cern.ch/fwyzard/patatrack/pulls/216dd603ed3ee0ec849f9d639c6e313e5dbbb4e6/log .

fwyzard (author) commented Jan 9, 2019

As a double check, below are the performance numbers on a P100 and a V100: the throughput increases by 5.8% on the P100 and by 5.2% on the V100, in line with #238 (comment).

P100

System

2 CPUs:
0: Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz (14 cores, 14 threads)
1: Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz (14 cores, 14 threads)

2 NVIDIA GPUs:
0: Tesla P100-PCIE-16GB (UUID: GPU-0fcb5ec4-7bbb-edae-83d2-710bdee8875c)
1: Tesla P100-PCIE-16GB (UUID: GPU-98c248bd-f028-e0b9-1570-78b35a2a3f43)

Warming up

Reference

Running 4 times over 4200 events with 1 jobs, each with 8 threads, 8 streams and 1 GPUs
1127.1 ± 0.8 ev/s (4000 events)
1124.5 ± 0.8 ev/s (4000 events)
1124.6 ± 0.8 ev/s (4000 events)
1118.8 ± 1.2 ev/s (4000 events)

#241

Running 4 times over 4200 events with 1 jobs, each with 8 threads, 8 streams and 1 GPUs
1189.5 ± 0.8 ev/s (4000 events)
1189.6 ± 0.8 ev/s (4000 events)
1192.2 ± 1.0 ev/s (4000 events)
1183.5 ± 1.4 ev/s (4000 events)

V100

System

2 CPUs:
0: Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz (20 cores, 20 threads)
1: Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz (20 cores, 20 threads)

2 NVIDIA GPUs:
0: Tesla V100-PCIE-16GB (UUID: GPU-d1e5ec03-06ae-ba04-9de2-4c744732e8b1)
1: Tesla V100-PCIE-16GB (UUID: GPU-c419539d-9157-92b9-474c-9b9953a262c9)

Warming up

Reference

Running 4 times over 4200 events with 1 jobs, each with 8 threads, 8 streams and 1 GPUs
1786.3 ± 1.4 ev/s (4000 events)
1794.3 ± 1.6 ev/s (4000 events)
1772.4 ± 1.2 ev/s (4000 events)
1771.1 ± 1.6 ev/s (4000 events)

#241

Running 4 times over 4200 events with 1 jobs, each with 8 threads, 8 streams and 1 GPUs
1867.8 ± 1.1 ev/s (4000 events)
1873.8 ± 1.6 ev/s (4000 events)
1879.2 ± 1.6 ev/s (4000 events)
1870.9 ± 1.7 ev/s (4000 events)
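The quoted gains can be checked against the runs listed for each GPU. This sketch assumes they are the ratio of the unweighted means over the four runs per configuration, ignoring the per-run ± uncertainties; the helper names `meanThroughput` and `relativeGainPercent` are hypothetical.

```cpp
#include <numeric>
#include <vector>

// Unweighted mean of the per-run throughputs, in ev/s.
// Assumes a non-empty list of runs.
double meanThroughput(const std::vector<double>& runs) {
  return std::accumulate(runs.begin(), runs.end(), 0.0) / runs.size();
}

// Relative throughput change of `candidate` over `reference`, in percent.
double relativeGainPercent(const std::vector<double>& reference,
                           const std::vector<double>& candidate) {
  return 100.0 * (meanThroughput(candidate) / meanThroughput(reference) - 1.0);
}

// Throughputs copied from the P100 and V100 measurements.
const std::vector<double> p100Reference = {1127.1, 1124.5, 1124.6, 1118.8};
const std::vector<double> p100Pr241 = {1189.5, 1189.6, 1192.2, 1183.5};
const std::vector<double> v100Reference = {1786.3, 1794.3, 1772.4, 1771.1};
const std::vector<double> v100Pr241 = {1867.8, 1873.8, 1879.2, 1870.9};
```

With these inputs the means are about 1123.8 and 1188.7 ev/s on the P100 and 1781.0 and 1872.9 ev/s on the V100, so `relativeGainPercent` returns about 5.78 and 5.16, which round to the 5.8% and 5.2% quoted for this PR.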

fwyzard added this to the CMSSW_10_4_X_Patatrack milestone Jan 9, 2019
fwyzard (author) commented Jan 9, 2019

No impact on physics performance, as expected.

| TTbar | reference-10824.5 | development-10824.5 | development-10824.8 | testing-10824.8 |
|---|---|---|---|---|
| Efficiency | 0.4818 | 0.4824 | 0.5727 | 0.5727 |
| Number of TrackingParticles (after cuts) | 5556 | 5556 | 5556 | 5556 |
| Number of matched TrackingParticles | 2677 | 2680 | 3182 | 3182 |
| Fake rate | 0.0519 | 0.0517 | 0.0344 | 0.0344 |
| Duplicate rate | 0.0168 | 0.0175 | 0.0003 | 0.0003 |
| Number of tracks | 32452 | 32480 | 43907 | 43907 |
| Number of true tracks | 30769 | 30801 | 42395 | 42395 |
| Number of fake tracks | 1683 | 1679 | 1512 | 1512 |
| Number of pileup tracks | 27093 | 27118 | 37689 | 37689 |
| Number of duplicate tracks | 546 | 567 | 12 | 12 |
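The rates in the TTbar table appear to be plain ratios of the counts listed with them: efficiency = matched TrackingParticles / TrackingParticles, fake rate = fake tracks / tracks, and duplicate rate = duplicate tracks / tracks. These definitions are inferred from the numbers, not taken from the validation code; the `ValidationCounts` struct and the helper functions below are hypothetical.

```cpp
// Hypothetical container for the counts in one column of the table.
struct ValidationCounts {
  int trackingParticles;  // Number of TrackingParticles (after cuts)
  int matchedParticles;   // Number of matched TrackingParticles
  int tracks;             // Number of tracks
  int fakeTracks;         // Number of fake tracks
  int duplicateTracks;    // Number of duplicate tracks
};

double efficiency(const ValidationCounts& c) {
  return static_cast<double>(c.matchedParticles) / c.trackingParticles;
}

double fakeRate(const ValidationCounts& c) {
  return static_cast<double>(c.fakeTracks) / c.tracks;
}

double duplicateRate(const ValidationCounts& c) {
  return static_cast<double>(c.duplicateTracks) / c.tracks;
}

// Counts copied from the reference-10824.5 and testing-10824.8 columns.
const ValidationCounts referenceCounts{5556, 2677, 32452, 1683, 546};
const ValidationCounts testingCounts{5556, 3182, 43907, 1512, 12};
```

Rounded to four decimals, this gives 0.4818, 0.0519 and 0.0168 for the reference column and 0.5727, 0.0344 and 0.0003 for the testing column, matching the table. Since the development-10824.8 and testing-10824.8 columns have identical counts, their rates are identical too, which is what "no impact on physics performance" refers to.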

fwyzard merged commit 854b139 into CMSSW_10_4_X_Patatrack Jan 9, 2019
fwyzard deleted the VinInn_GPUFastTracksNNClus_part1 branch January 9, 2019 19:50
fwyzard added a commit that referenced this pull request Oct 8, 2020
fwyzard added a commit that referenced this pull request Oct 19, 2020
fwyzard added a commit that referenced this pull request Oct 20, 2020
fwyzard added a commit that referenced this pull request Oct 23, 2020
fwyzard added a commit that referenced this pull request Nov 6, 2020
fwyzard added a commit that referenced this pull request Nov 16, 2020
fwyzard pushed a commit that referenced this pull request Dec 25, 2020
fwyzard added a commit that referenced this pull request Dec 29, 2020
fwyzard added a commit that referenced this pull request Dec 29, 2020