This repository has been archived by the owner on Mar 21, 2024. It is now read-only.
You would use cudaMalloc to allocate it in device memory. However, you need to be sure to:
Allocate an array of N TempStorage objects, where N is the number of thread blocks in your grid. Pass the array pointer as a kernel function parameter; each thread block then indexes its own TempStorage and passes it to the BlockRadixSort constructor.
When taking sizeof() of TempStorage on the host, make sure to explicitly parameterize the outer BlockRadixSort class with the PTX architecture (e.g., 350 for Kepler). Different architectures require different amounts of storage. Normally this is all parameterized by __CUDA_ARCH__ when the storage is only ever named in device code, but the compiler uses a separate pass for the host code, where __CUDA_ARCH__ is not defined.
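The two steps above can be sketched roughly as follows. This is only an illustration, not code from the library's examples: the names SortTiles, run_demo, BlockRadixSortT and TempStorageT, the tile sizes, and the int key/value types are all made up for the example. It also assumes a CUB release in which the default template parameters give the same TempStorage layout in the host and device passes; with older releases you would spell out the trailing template parameters up to the PTX architecture as described above, and that exact parameter list depends on your CUB version.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <cub/cub.cuh>

// Hypothetical tile configuration chosen for this sketch.
constexpr int BLOCK_THREADS    = 128;
constexpr int ITEMS_PER_THREAD = 4;
constexpr int TILE             = BLOCK_THREADS * ITEMS_PER_THREAD;

// Assumption: defaults are left in place. On older CUB releases, append the
// remaining template arguments (ending in the PTX architecture) explicitly.
using BlockRadixSortT = cub::BlockRadixSort<int, BLOCK_THREADS, ITEMS_PER_THREAD, int>;
using TempStorageT    = BlockRadixSortT::TempStorage;

// Each block sorts one tile, using its own TempStorage slot in global memory.
__global__ void SortTiles(int *keys, int *values, TempStorageT *temp_storage)
{
    int *tile_keys   = keys   + blockIdx.x * TILE;
    int *tile_values = values + blockIdx.x * TILE;

    int thread_keys[ITEMS_PER_THREAD];
    int thread_values[ITEMS_PER_THREAD];
    for (int i = 0; i < ITEMS_PER_THREAD; ++i) {
        int idx = threadIdx.x * ITEMS_PER_THREAD + i;  // blocked arrangement
        thread_keys[i]   = tile_keys[idx];
        thread_values[i] = tile_values[idx];
    }

    // Index this block's private TempStorage instead of declaring it __shared__.
    BlockRadixSortT(temp_storage[blockIdx.x]).SortBlockedToStriped(thread_keys, thread_values);

    for (int i = 0; i < ITEMS_PER_THREAD; ++i) {
        int idx = i * BLOCK_THREADS + threadIdx.x;     // striped arrangement
        tile_keys[idx]   = thread_keys[i];
        tile_values[idx] = thread_values[i];
    }
}

// Returns the number of out-of-order key pairs found (0 on success, -1 on a CUDA error).
int run_demo()
{
    const int num_blocks = 4;
    const int n = num_blocks * TILE;
    std::vector<int> h_keys(n), h_values(n);
    for (int i = 0; i < n; ++i) { h_keys[i] = n - i; h_values[i] = i; }

    int *d_keys = nullptr, *d_values = nullptr;
    TempStorageT *d_temp = nullptr;
    cudaMalloc(&d_keys,   n * sizeof(int));
    cudaMalloc(&d_values, n * sizeof(int));
    cudaMalloc(&d_temp,   num_blocks * sizeof(TempStorageT));  // one slot per block
    cudaMemcpy(d_keys,   h_keys.data(),   n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_values, h_values.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    SortTiles<<<num_blocks, BLOCK_THREADS>>>(d_keys, d_values, d_temp);
    bool ok = (cudaDeviceSynchronize() == cudaSuccess);
    cudaMemcpy(h_keys.data(), d_keys, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_keys); cudaFree(d_values); cudaFree(d_temp);
    if (!ok) return -1;

    int errors = 0;
    for (int t = 0; t < num_blocks; ++t)
        for (int i = 1; i < TILE; ++i)
            if (h_keys[t * TILE + i - 1] > h_keys[t * TILE + i]) ++errors;
    return errors;
}

int main()
{
    int errors = run_demo();
    std::printf("out-of-order pairs: %d\n", errors);
    return errors == 0 ? 0 : 1;
}
```

Keep in mind that global memory is considerably slower than shared memory, so this layout trades sorting speed for the ability to exceed the shared-memory limit.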
When calling
BlockRadixSort(temp_storage).SortBlockedToStriped(thread_keys, thread_values);
the examples require memory that is visible to all threads in the block, e.g.
__shared__ typename BlockRadixSort::TempStorage temp_storage;
However, how do I use external (global) memory if temp_storage needs more than 48 KiB? How do I allocate this memory from the host?
Thanks in advance,
Christian