Welcome to CUDA Examples! This project intends to serve as a practical landing page for CUDA C++ development. If you see anything present or missing that might make it more useful, open an issue.
The repository is organized into a few top-level directories by example type:
- SetupAndInitExamples - stuff that happens at the beginning of a CUDA program (this is not installation).
- MemoryAndStructureExamples - examples of ways to allocate memory, launch kernels, or structure code so that it works well with CUDA. These can and will use code and kernel samples, but their theme is more about "good ways to do things" and "considerations" when writing CUDA programs.
- KernelAndLibExamples - kernels, core libraries, Thrust, etc. General examples of how to actually load and process data on the GPU.
- ProfilingExamples - examples to profile or benchmark CUDA code.
- PerformanceChecklist - examples which cover the standard CUDA performance checklist.
I love contributions because I get to learn from you and the project grows to help more people. With that in mind, contributions should meet the following characteristics:
- Novel - no existing example covers the same concept
- Documented - there should be a clear explanation of what your addition is meant to convey
- Correct - self-explanatory
I expect CUDA kernel samples, Thrust samples, and other core library examples under KernelAndLibExamples to fill up the fastest, which means that directory will eventually be the hardest to contribute to. When preparing a contribution, PLEASE make sure it shows something novel. I do not want to reject any PRs for being redundant, since you spent time on them! But alas, I will if they don't show new concepts.
As a sub-section of contributing, here are the conventions to follow:
- Each example should have its own subdirectory under the appropriate section.
- If your example produces output, it should run as a bash script, and the output files should be added to `.gitignore`.
- All executables should be called `main` (this is also for `.gitignore`).
- All examples should work with C++20.
- Each example should include `utils/utils.cuh`, and each CUDA call that returns a `cudaError_t` should be wrapped with `cudaCheckError`, like `cudaCheckError(::cudaDeviceSynchronize());`.
- Each CUDA API call should be prefixed with `::` to inform the reader that it is an external API call. See the example above.
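As a rough illustration of these conventions, here is a minimal sketch of what an example's `main.cu` might look like. The real `cudaCheckError` lives in `utils/utils.cuh` and its definition is not shown here, so the macro below is a hypothetical stand-in that only exists to make the snippet compile on its own; the kernel `addOne` and helper `runAddOne` are likewise illustrative names.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// In the repository you would #include "utils/utils.cuh" instead.
// HYPOTHETICAL stand-in for cudaCheckError; the real macro may differ.
// On failure it reports the error name, location, and message, then exits.
#define cudaCheckError(call)                                               \
    do {                                                                   \
        const cudaError_t err_ = (call);                                   \
        if (err_ != cudaSuccess) {                                         \
            std::fprintf(stderr, "CUDA error %s at %s:%d: %s\n",           \
                         ::cudaGetErrorName(err_), __FILE__, __LINE__,     \
                         ::cudaGetErrorString(err_));                      \
            std::exit(EXIT_FAILURE);                                       \
        }                                                                  \
    } while (0)

// Illustrative kernel: adds one to every element.
__global__ void addOne(int* data, int n) {
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Allocates n ints on the device, zeroes them, runs addOne,
// and returns the first element copied back to the host.
int runAddOne(int n) {
    const size_t bytes = static_cast<size_t>(n) * sizeof(int);
    int* d_data = nullptr;
    cudaCheckError(::cudaMalloc(&d_data, bytes));
    cudaCheckError(::cudaMemset(d_data, 0, bytes));

    addOne<<<(n + 255) / 256, 256>>>(d_data, n);
    // Kernel launches do not return a cudaError_t, so query launch errors explicitly.
    cudaCheckError(::cudaGetLastError());
    cudaCheckError(::cudaDeviceSynchronize());

    int h_first = 0;
    cudaCheckError(::cudaMemcpy(&h_first, d_data, sizeof(int), cudaMemcpyDeviceToHost));
    cudaCheckError(::cudaFree(d_data));
    return h_first;
}

int main() {
    std::printf("data[0] = %d\n", runAddOne(1024));
    return 0;
}
```

On a machine with a CUDA-capable GPU this should print `data[0] = 1`. Note that every runtime call is both wrapped and `::`-prefixed, while the kernel launch is checked afterwards through `::cudaGetLastError()`.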
`CUDA_ARCHITECTURES` defaults to 86 (compute capability 8.6). To build with the default:
mkdir build && cd build
cmake ..
make -j$(nproc)
If you require a single different architecture (and want to build with 8 parallel jobs):
mkdir build && cd build
cmake -DCUDA_ARCHITECTURES="80" ..
make -j8
If you'd like to pass multiple architectures, separate them with semicolons:
mkdir build && cd build
cmake -DCUDA_ARCHITECTURES="80;86;90" ..
make -j8
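If you are unsure which value to pass, a small sketch like the one below can query each visible GPU's compute capability and print it in the integer form used by `CUDA_ARCHITECTURES` (for example, an A100 reports 8.0 and an H100 reports 9.0). It uses plain error checks rather than `cudaCheckError` only so that it does not depend on `utils/utils.cuh`; `toArchNumber` is an illustrative helper, not part of the repository.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative helper: convert a compute capability (major.minor)
// to the integer form used by CUDA_ARCHITECTURES, e.g. 8.6 -> 86.
constexpr int toArchNumber(int major, int minor) { return major * 10 + minor; }

int main() {
    int deviceCount = 0;
    if (::cudaGetDeviceCount(&deviceCount) != cudaSuccess || deviceCount == 0) {
        std::fprintf(stderr, "No CUDA-capable device found.\n");
        return 1;
    }
    for (int dev = 0; dev < deviceCount; ++dev) {
        cudaDeviceProp prop{};
        if (::cudaGetDeviceProperties(&prop, dev) != cudaSuccess) continue;
        std::printf("Device %d: %s -> CUDA_ARCHITECTURES=\"%d\"\n",
                    dev, prop.name, toArchNumber(prop.major, prop.minor));
    }
    return 0;
}
```

Assuming a CUDA toolkit recent enough to accept `-std=c++20`, this could be compiled standalone with something like `nvcc -std=c++20 query_arch.cu -o main`; the file name is arbitrary.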
To build a particular example:
cd <ExampleDir>/<example>
make -j$(nproc)
The per-example Makefiles are less robust than the CMake build and may need to be modified. Feel free to submit a PR to fix them!