GPU MODE Lecture 8: CUDA Performance Checklist – Christian Mills
Lecture #8 provides a comprehensive guide to CUDA performance optimization techniques, covering key concepts like memory coalescing, occupancy, control divergence, tiling, privatization, thread coarsening, and algorithm rewriting with better math, illustrated with practical examples and profiling using NCU to improve kernel performance.
I am actually a bit skeptical about the benefits of thread coarsening for kernels as simple as vector addition, or more generally for kernels that lack enough redundant work to trade parallelism for increased memory-access and compute efficiency. I ran the vector addition example on an A100, and although I do get a 2x improvement with thread coarsening:
VecAdd execution time: 0.006144 ms
VecAddCoarsened execution time: 0.003072 ms
the speedup vanishes as the workload N increases.
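For reference, my understanding of the two kernels being compared is sketched below (a coarsening factor of 2 is assumed; the lecture's actual code may differ):

```cuda
// Baseline: one thread per element.
__global__ void VecAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

// Coarsened: each thread handles two consecutive elements,
// so only half as many threads are launched.
__global__ void VecAddCoarsened(const float* a, const float* b, float* c, int n) {
    int i = 2 * (blockIdx.x * blockDim.x + threadIdx.x);
    if (i < n)     c[i]     = a[i]     + b[i];
    if (i + 1 < n) c[i + 1] = a[i + 1] + b[i + 1];
}
```

My guess is that at small N the launch is too small to fill the GPU, so the timing mostly reflects launch and scheduling overhead, which coarsening happens to reduce; once N is large enough that the kernel is memory-bandwidth bound, both versions move the same number of bytes and the difference disappears.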
Hi @mredenti,
The GPU Mode Discord channel would be a better place to discuss your findings from going through the lectures. These are just my personal notes and not part of the official lecture series.
https://christianjmills.com/posts/cuda-mode-notes/lecture-008/