[Question] How to use external memory for temp storage? #35

gravitino · 2015-07-21T12:44:53Z

When calling

BlockRadixSort(temp_storage).SortBlockedToStriped(thread_keys, thread_values);

the examples demand memory that can be seen by all threads e.g.

__shared__ typename BlockRadixSort::TempStorage temp_storage;

However, how do I use external memory if temp_storage needs more than 48KiB? How do I allocate this memory from the host?

Thanks in advance,

Christian

dumerrill · 2016-11-23T17:34:11Z

You would use cudaMalloc to allocate it in device memory. However, you need to be sure to:

- Allocate an array of N TempStorages, where N is the number of thread blocks in your grid. Pass the array pointer as a kernel function parameter; each thread block then indexes its own TempStorage and passes that to BlockRadixSort.
- When taking the sizeof() of TempStorage, explicitly parameterize the outer BlockRadixSort class with the PTX architecture (e.g., 350 for Kepler). Different architectures require different amounts of storage. Normally this is all parameterized by __CUDA_ARCH__ when the storage is only ever named in device code, but host code is compiled in a separate pass where __CUDA_ARCH__ is not defined, so the host could otherwise see a different TempStorage size.
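
For anyone landing here later, the two points above could be put together roughly like this. It is only a sketch under some assumptions: int keys and values, 128 threads with 4 items each, a Kepler (350) target, and a CUB version whose BlockRadixSort takes the PTX architecture as its last template parameter (newer releases may ignore or drop that parameter, so check the header you have). The names BlockRadixSortT, TempStorageT and SortKernel are made up for illustration. Every defaulted template parameter is spelled out so the host and device passes name the same type.

```cuda
#include <cstdio>
#include <vector>
#include <cub/cub.cuh>

constexpr int BLOCK_THREADS    = 128;
constexpr int ITEMS_PER_THREAD = 4;
constexpr int PTX_ARCH         = 350;  // assumption: Kepler target

// All defaulted parameters are given explicitly so that sizeof(TempStorageT)
// is the same in the host pass and the device pass.
using BlockRadixSortT = cub::BlockRadixSort<
    int, BLOCK_THREADS, ITEMS_PER_THREAD, int,
    4, true, cub::BLOCK_SCAN_WARP_SCANS, cudaSharedMemBankSizeFourByte,
    1, 1, PTX_ARCH>;
using TempStorageT = BlockRadixSortT::TempStorage;

__global__ void SortKernel(int *keys, int *values, TempStorageT *temp_storages)
{
    // Each thread block uses its own TempStorage slot in global memory.
    TempStorageT &temp_storage = temp_storages[blockIdx.x];
    const int tile_offset = blockIdx.x * BLOCK_THREADS * ITEMS_PER_THREAD;

    int thread_keys[ITEMS_PER_THREAD];
    int thread_values[ITEMS_PER_THREAD];

    // Blocked load: each thread reads ITEMS_PER_THREAD consecutive items.
    for (int i = 0; i < ITEMS_PER_THREAD; ++i) {
        const int idx = tile_offset + threadIdx.x * ITEMS_PER_THREAD + i;
        thread_keys[i]   = keys[idx];
        thread_values[i] = values[idx];
    }

    BlockRadixSortT(temp_storage).SortBlockedToStriped(thread_keys, thread_values);

    // Striped store: item i of thread t goes to position i * BLOCK_THREADS + t.
    for (int i = 0; i < ITEMS_PER_THREAD; ++i) {
        const int idx = tile_offset + i * BLOCK_THREADS + threadIdx.x;
        keys[idx]   = thread_keys[i];
        values[idx] = thread_values[i];
    }
}

int main()
{
    const int num_blocks = 8;
    const int num_items  = num_blocks * BLOCK_THREADS * ITEMS_PER_THREAD;

    std::vector<int> h_keys(num_items), h_values(num_items);
    for (int i = 0; i < num_items; ++i) { h_keys[i] = num_items - i; h_values[i] = i; }

    int *d_keys = nullptr, *d_values = nullptr;
    TempStorageT *d_temp = nullptr;
    cudaMalloc(&d_keys,   num_items * sizeof(int));
    cudaMalloc(&d_values, num_items * sizeof(int));
    cudaMalloc(&d_temp,   num_blocks * sizeof(TempStorageT));  // one slot per block

    cudaMemcpy(d_keys,   h_keys.data(),   num_items * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_values, h_values.data(), num_items * sizeof(int), cudaMemcpyHostToDevice);

    SortKernel<<<num_blocks, BLOCK_THREADS>>>(d_keys, d_values, d_temp);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) printf("kernel failed: %s\n", cudaGetErrorString(err));

    cudaMemcpy(h_keys.data(), d_keys, num_items * sizeof(int), cudaMemcpyDeviceToHost);
    printf("sizeof(TempStorageT) on host: %zu bytes\n", sizeof(TempStorageT));

    cudaFree(d_keys); cudaFree(d_values); cudaFree(d_temp);
    return 0;
}
```

Each block sorts only its own tile here, as in the block-level examples. Keep in mind that TempStorage in global memory will be considerably slower than the usual shared-memory placement, so this is mainly worth it when the storage really does not fit.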

dumerrill closed this as completed Nov 23, 2016
