posts/cuda-mode-notes/lecture-008/ #57

Open
utterances-bot opened this issue Dec 1, 2024 · 3 comments

Comments

@utterances-bot

GPU MODE Lecture 8: CUDA Performance Checklist – Christian Mills

Lecture #8 provides a comprehensive guide to CUDA performance optimization techniques, covering key concepts like memory coalescing, occupancy, control divergence, tiling, privatization, thread coarsening, and algorithm rewriting with better math, illustrated with practical examples and profiling using NCU to improve kernel performance.

https://christianjmills.com/posts/cuda-mode-notes/lecture-008/

mredenti commented Dec 1, 2024

I am a bit skeptical about the benefits of thread coarsening for kernels as simple as vector addition, or more generally for kernels that do not have enough redundant work to justify trading parallelism for increased memory access and compute efficiency. I ran the vector addition example on an A100, and although I get a 2x improvement with thread coarsening

VecAdd execution time: 0.006144 ms
VecAddCoarsened execution time: 0.003072 ms

the speedup vanishes as you increase the workload N.
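
One way to make the "not enough redundant work" argument concrete is to count what coarsening actually changes for an element-wise add. The sketch below is a back-of-the-envelope model, not a measurement: the helper `vec_add_work`, the 256-thread block size, and the float32 element size are all assumptions chosen for illustration.

```python
from math import ceil


def vec_add_work(n, coarsen=1, block_size=256, bytes_per_elem=4):
    """Count threads, blocks, and global-memory traffic for C = A + B.

    Hypothetical helper: block_size and bytes_per_elem (float32) are
    assumptions, not values taken from the lecture notes.
    """
    threads = ceil(n / coarsen)  # each thread handles `coarsen` elements
    blocks = ceil(threads / block_size)
    # Every element is read once from A, once from B, and written once to C,
    # so the traffic does not depend on the coarsening factor.
    bytes_moved = 3 * n * bytes_per_elem
    return {"threads": threads, "blocks": blocks, "bytes_moved": bytes_moved}


for n in (1 << 10, 1 << 20, 1 << 26):
    print(n, vec_add_work(n), vec_add_work(n, coarsen=2))
```

In this model, coarsening by 2 halves the thread and block counts but leaves the bytes moved unchanged. That is consistent with a gain that fades once the kernel is limited by memory bandwidth, though the model says nothing about launch overhead or timer resolution at small N.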

mredenti commented Dec 1, 2024

Also, it seems to me that the way you are mapping threads to indices hinders memory coalescing.
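
The coalescing concern can be illustrated with a small model that counts how many 128-byte segments one warp touches per load under two index mappings. This is a sketch under stated assumptions: the kernel from the notes is not shown in this thread, so the "adjacent" mapping (`i = tid * 2 + k`) is only a guess at what is being criticized, and the 128-byte segment size, float32 elements, and `TOTAL_THREADS` value are illustrative choices.

```python
WARP_SIZE = 32
SEGMENT_BYTES = 128  # assumed transaction granularity for this model
ELEM_BYTES = 4       # float32

COARSEN = 2
TOTAL_THREADS = 1024  # hypothetical grid size


def segments_per_load(index_of, k, warp=0):
    """Distinct 128-byte segments one warp touches on its k-th load."""
    first = warp * WARP_SIZE
    addrs = [index_of(t, k) * ELEM_BYTES for t in range(first, first + WARP_SIZE)]
    return len({a // SEGMENT_BYTES for a in addrs})


def adjacent(tid, k):
    # Each thread owns neighbouring elements: i = tid * COARSEN + k.
    return tid * COARSEN + k


def grid_stride(tid, k):
    # Threads in a warp stay contiguous: i = tid + k * TOTAL_THREADS.
    return tid + k * TOTAL_THREADS


for name, mapping in (("adjacent", adjacent), ("grid_stride", grid_stride)):
    print(name, [segments_per_load(mapping, k) for k in range(COARSEN)])
```

With the adjacent mapping, neighbouring threads are two elements apart, so each warp-wide load spans two segments instead of one. The model ignores caches, which on real hardware can absorb much of that cost, so it shows the access pattern rather than predicting a slowdown.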

@cj-mills
Owner

cj-mills commented Dec 1, 2024

Hi @mredenti,
The GPU Mode Discord channel would be a better place to discuss your findings from going through the lectures. These are just my personal notes and not part of the official lecture series.


3 participants