This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

[Question] How to use external memory for temp storage? #35

Closed
gravitino opened this issue Jul 21, 2015 · 1 comment

Comments

@gravitino

When calling

BlockRadixSort(temp_storage).SortBlockedToStriped(thread_keys, thread_values);

the examples demand memory that can be seen by all threads e.g.

__shared__ typename BlockRadixSort::TempStorage temp_storage;

However, how do I use external memory if temp_storage needs more than 48KiB? How do I allocate this memory from the host?

Thanks in advance,

Christian

@dumerrill
Contributor

You would use cudaMalloc to allocate it in device memory. However, you need to be sure to:

  • Allocate an array of N TempStorages, where N is the number of thread blocks in your grid. Pass the array pointer as a kernel function parameter; each thread block then indexes its own TempStorage and passes that to the BlockRadixSort.
  • When taking sizeof() of TempStorage on the host, make sure to explicitly parameterize the outer BlockRadixSort class with the PTX architecture (e.g., 350 for Kepler). Different architectures require different amounts of storage. Normally this is all parameterized by __CUDA_ARCH__ when the storage is only ever named in device code, but the compiler uses a separate pass for the host code, where __CUDA_ARCH__ is not defined.
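The two points above can be sketched roughly as follows. This is a minimal illustration, not the library's prescribed method: the tile shape (128 threads, 4 items per thread), the kernel names, and the `SortTiles` host helper are all hypothetical. Instead of spelling out the PTX architecture template argument, the sketch sidesteps the host/device size mismatch by asking device code for `sizeof(TempStorage)` in a tiny kernel and passing a raw byte pointer to the sorting kernel. That is a substitute for the explicit parameterization described above, chosen here because it does not depend on the exact template parameter list of a given CUB release.

```cuda
#include <cub/cub.cuh>
#include <cuda_runtime.h>
#include <vector>

// Hypothetical tile shape, chosen for illustration only.
constexpr int BLOCK_THREADS    = 128;
constexpr int ITEMS_PER_THREAD = 4;
constexpr int TILE             = BLOCK_THREADS * ITEMS_PER_THREAD;

using BlockRadixSortT = cub::BlockRadixSort<int, BLOCK_THREADS, ITEMS_PER_THREAD, int>;
using TempStorageT    = BlockRadixSortT::TempStorage;

// Reports sizeof(TempStorage) as seen by the device compilation pass.
__global__ void TempStorageSizeKernel(size_t *d_size)
{
    *d_size = sizeof(TempStorageT);
}

// Each block sorts one TILE of key/value pairs, using its own slice of a
// global-memory scratch array instead of __shared__ memory.
__global__ void SortTilesKernel(int *d_keys, int *d_values,
                                char *d_temp, size_t temp_bytes)
{
    // Index this block's TempStorage inside the externally allocated array.
    TempStorageT &temp_storage =
        *reinterpret_cast<TempStorageT *>(d_temp + blockIdx.x * temp_bytes);

    int *keys   = d_keys   + blockIdx.x * TILE;
    int *values = d_values + blockIdx.x * TILE;

    int thread_keys[ITEMS_PER_THREAD];
    int thread_values[ITEMS_PER_THREAD];
    for (int i = 0; i < ITEMS_PER_THREAD; ++i) {
        int idx = threadIdx.x * ITEMS_PER_THREAD + i;   // blocked arrangement
        thread_keys[i]   = keys[idx];
        thread_values[i] = values[idx];
    }

    BlockRadixSortT(temp_storage).SortBlockedToStriped(thread_keys, thread_values);

    for (int i = 0; i < ITEMS_PER_THREAD; ++i) {
        int idx = i * BLOCK_THREADS + threadIdx.x;      // striped arrangement
        keys[idx]   = thread_keys[i];
        values[idx] = thread_values[i];
    }
}

// Hypothetical host helper: sorts every TILE-sized chunk of keys (carrying
// values along) in place. Assumes keys.size() is a multiple of TILE.
bool SortTiles(std::vector<int> &keys, std::vector<int> &values)
{
    const int    num_blocks = static_cast<int>(keys.size() / TILE);
    const size_t data_bytes = keys.size() * sizeof(int);

    // Ask the device how large one TempStorage is.
    size_t *d_size = nullptr;
    size_t  temp_bytes = 0;
    cudaMalloc(&d_size, sizeof(size_t));
    TempStorageSizeKernel<<<1, 1>>>(d_size);
    cudaMemcpy(&temp_bytes, d_size, sizeof(size_t), cudaMemcpyDeviceToHost);

    // One TempStorage per thread block, allocated with cudaMalloc.
    int  *d_keys = nullptr, *d_values = nullptr;
    char *d_temp = nullptr;
    cudaMalloc(&d_keys, data_bytes);
    cudaMalloc(&d_values, data_bytes);
    cudaMalloc(&d_temp, temp_bytes * num_blocks);
    cudaMemcpy(d_keys, keys.data(), data_bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_values, values.data(), data_bytes, cudaMemcpyHostToDevice);

    SortTilesKernel<<<num_blocks, BLOCK_THREADS>>>(d_keys, d_values, d_temp, temp_bytes);
    bool ok = (cudaGetLastError() == cudaSuccess) &&
              (cudaDeviceSynchronize() == cudaSuccess);

    cudaMemcpy(keys.data(), d_keys, data_bytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(values.data(), d_values, data_bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_size); cudaFree(d_keys); cudaFree(d_values); cudaFree(d_temp);
    return ok;
}
```

Keeping the temporary storage in global memory is likely to be noticeably slower than `__shared__` storage, so this layout probably only makes sense when the storage genuinely does not fit in shared memory.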
