Pool Allocator

Matt Norman edited this page Aug 15, 2022 · 18 revisions
   ,(   ,(   ,(   ,(   ,(   ,(   ,(   ,(
`-'  `-'  `-'  `-'  `-'  `-'  `-'  `-'  `
   _________________________
 / "Don't be a malloc-hater  \
|   Use the pool alligator!"  |
 \     _____________________ / 
  |  /
  |/       .-._   _ _ _ _ _ _ _ _
.-''-.__.-'00  '-' ' ' ' ' ' ' ' '-.
'.___ '    .   .--_'-' '-' '-' _'-' '._
 V: V 'vv-'   '_   '.       .'  _..' '.'.
   '=.____.=_.--'   :_.__.__:_   '.   : :
           (((____.-'        '-.  /   : :
                             (((-'\ .' /
                           _____..'  .'
                          '-._____.-'
   ,(   ,(   ,(   ,(   ,(   ,(   ,(   ,(
`-'  `-'  `-'  `-'  `-'  `-'  `-'  `-'  `

YAKL has a pool allocator, "Gator", that is automatically turned on and used as long as the hardware backend device has a separate memory space. The pool exists because allocation and free calls on accelerator devices are typically very expensive, and scientific codes often allocate and free memory very frequently. To do this efficiently, YAKL allocates a large pool of memory at initialization and hands out chunks of that pool very cheaply at runtime.

The thing about a pool allocator is that once you run out of memory in a given pool, you cannot resize the pool. That would invalidate the pointers you've handed out from the initial pool. Rather, you can only add new pools. Therefore, if the arrays you're allocating are "large" and the size of individual pools is "small", you may find yourself in situations where no additional pool is large enough to hold that array. In those cases, YAKL will inform you that your initial pool size is too small.
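
The grow-by-adding-pools behavior can be illustrated with a toy model. This is only a sketch and not Gator's implementation: `ToyPool` and `ToyPools` are hypothetical names, sizes are plain byte counts rather than real device memory, and frees are left out to keep the point visible.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Toy model (an assumption for illustration, not Gator's code): every pool
// has a fixed capacity and space is handed out by bumping a "used" counter.
struct ToyPool {
  std::size_t capacity;
  std::size_t used;
};

class ToyPools {
 public:
  ToyPools(std::size_t initial_bytes, std::size_t grow_bytes)
      : grow_bytes_(grow_bytes) {
    pools_.push_back(ToyPool{initial_bytes, 0});
  }

  // Returns the index of the pool that served the request.
  std::size_t allocate(std::size_t bytes) {
    assert(bytes > 0);
    // Try the existing pools first; none of them is ever resized.
    for (std::size_t i = 0; i < pools_.size(); ++i) {
      if (pools_[i].capacity - pools_[i].used >= bytes) {
        pools_[i].used += bytes;
        return i;
      }
    }
    // A new pool always has the grow size, so a larger request can never fit.
    if (bytes > grow_bytes_) {
      throw std::runtime_error("request is larger than any pool that can be added");
    }
    pools_.push_back(ToyPool{grow_bytes_, bytes});
    return pools_.size() - 1;
  }

  std::size_t pool_count() const { return pools_.size(); }

 private:
  std::size_t grow_bytes_;
  std::vector<ToyPool> pools_;
};
```

With an initial pool of 100 bytes and a grow size of 50 bytes, a request for 80 bytes lands in pool 0, a following request for 40 bytes forces a second pool, and a request for 60 bytes fails because it exceeds the grow size.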

You control the behavior of Gator's pool management through the following environment variables:

  • GATOR_INITIAL_MB: The initial pool size in MB
  • GATOR_GROW_MB: The size of each new pool in MB once the initial pool is out of memory
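
As a rough sketch of how a program might pick up these two variables, the snippet below reads them with `std::getenv`. The helper names (`env_mb_or`, `read_pool_config`, `PoolConfig`), the fallback behavior, and the choice of 1 MB = 1024 * 1024 bytes are assumptions made for illustration; Gator's actual defaults and parsing are not described on this page.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: parse a positive whole number of MB from an
// environment variable. Unset, empty, non-numeric, or zero values fall back
// to the caller's default (this fallback policy is an assumption).
inline std::size_t env_mb_or(const char *name, std::size_t fallback_mb) {
  assert(name != nullptr);
  const char *raw = std::getenv(name);
  if (raw == nullptr || !std::isdigit(static_cast<unsigned char>(*raw))) {
    return fallback_mb;
  }
  char *end = nullptr;
  unsigned long long value = std::strtoull(raw, &end, 10);
  if (*end != '\0' || value == 0) return fallback_mb;
  return static_cast<std::size_t>(value);
}

struct PoolConfig {
  std::size_t initial_bytes;
  std::size_t grow_bytes;
};

// Defaults are supplied by the caller because the real defaults are not
// stated in this page.
inline PoolConfig read_pool_config(std::size_t default_initial_mb,
                                   std::size_t default_grow_mb) {
  const std::size_t bytes_per_mb = 1024u * 1024u;  // assumed meaning of "MB"
  return PoolConfig{
      env_mb_or("GATOR_INITIAL_MB", default_initial_mb) * bytes_per_mb,
      env_mb_or("GATOR_GROW_MB", default_grow_mb) * bytes_per_mb};
}
```

In practice you would normally just export `GATOR_INITIAL_MB` and `GATOR_GROW_MB` in the job script before launching the executable.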

YAKL's pool allocator is pretty informative and will try to let you know what to do if an issue occurs. Some features of Gator:

  • Fortran bindings for integer, integer(8), real, real(8), and logical
  • Fortran bindings for arrays of one to seven dimensions
  • Able to call cudaMallocManaged under the hood with prefetching and memset
  • Able to support arbitrary lower bounds in the Fortran interface for Fortran pointers
  • Simple pool allocator implementation that automatically grows as needed
  • The pool allocator responds to environment variables to control the initial allocation size, and the size of each additional pool as it grows
  • Minimal internal fragmentation for any pattern of allocations and frees
  • Warns the user if allocations are left allocated after the pool is destroyed
  • Thread safe, so feel free to use the pool inside CPU-threaded regions. Gator uses std::mutex to lock and unlock, so it is thread safe for pthreads, std::thread, and OpenMP CPU threads.
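
The thread-safety point can be pictured with a minimal sketch in which every change to the pool's bookkeeping happens under a `std::mutex`. `LockedBumpPool` is a hypothetical class written for this page, not Gator's code, and it only tracks offsets rather than real memory.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>

// Hypothetical example: a bump-style offset counter whose bookkeeping is
// serialized with std::mutex, so concurrent callers never see the same offset.
class LockedBumpPool {
 public:
  explicit LockedBumpPool(std::size_t capacity) : capacity_(capacity) {}

  // Reserves len bytes and stores the starting offset in *offset.
  // Returns false, leaving *offset untouched, if the pool is too full.
  bool allocate(std::size_t len, std::size_t *offset) {
    assert(offset != nullptr);
    std::lock_guard<std::mutex> guard(mutex_);  // unlocks when guard leaves scope
    if (capacity_ - used_ < len) return false;
    *offset = used_;
    used_ += len;
    return true;
  }

  std::size_t used() const {
    std::lock_guard<std::mutex> guard(mutex_);
    return used_;
  }

 private:
  mutable std::mutex mutex_;
  std::size_t capacity_;
  std::size_t used_ = 0;
};
```

Because the lock is taken inside `allocate`, callers on `std::thread`, pthreads, or OpenMP CPU threads need no extra synchronization of their own around allocation calls.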

The pool search and allocation algorithm is not the fastest, but it is as close to optimal in terms of memory usage and fragmentation as you can get. The cost is typically fine because the cost of allocating data is overlapped with GPU kernel execution in most contexts. Regardless, the cost is still significantly less than most accelerator device calls to malloc and free.
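
This page does not spell out the search algorithm, so the following is only one plausible way to trade speed for low fragmentation: keep the live blocks sorted by offset and scan linearly for the first gap that fits. `GapSearchPool` and `kNoSpace` are hypothetical names, and the sketch tracks offsets only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sentinel returned when no gap is large enough (illustrative name).
constexpr std::size_t kNoSpace = static_cast<std::size_t>(-1);

// Sketch under stated assumptions, not Gator's code: live blocks are kept
// sorted by start offset, and free space is whatever lies between them.
class GapSearchPool {
 public:
  explicit GapSearchPool(std::size_t capacity) : capacity_(capacity) {
    assert(capacity > 0);
  }

  // Linear scan for the first gap of at least len bytes.
  std::size_t allocate(std::size_t len) {
    if (len == 0) return kNoSpace;
    std::size_t cursor = 0;  // end of the previous live block
    for (std::size_t i = 0; i < blocks_.size(); ++i) {
      if (blocks_[i].start - cursor >= len) {
        blocks_.insert(blocks_.begin() + static_cast<std::ptrdiff_t>(i),
                       Block{cursor, len});
        return cursor;
      }
      cursor = blocks_[i].start + blocks_[i].len;
    }
    if (capacity_ - cursor >= len) {
      blocks_.push_back(Block{cursor, len});
      return cursor;
    }
    return kNoSpace;
  }

  // Removing the entry is the whole free: neighboring gaps merge implicitly.
  bool release(std::size_t offset) {
    for (std::size_t i = 0; i < blocks_.size(); ++i) {
      if (blocks_[i].start == offset) {
        blocks_.erase(blocks_.begin() + static_cast<std::ptrdiff_t>(i));
        return true;
      }
    }
    return false;
  }

 private:
  struct Block {
    std::size_t start;
    std::size_t len;
  };
  std::size_t capacity_;
  std::vector<Block> blocks_;
};
```

The scan is linear in the number of live blocks, which is the speed cost; in exchange, two adjacent freed blocks are immediately usable as one larger gap without any explicit coalescing step.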

Common Runtime Error Messages:

Your array is too large to fit in the existing pools or any added pools:

  • There isn't enough room in the existing pool for your variable, and the variable allocation size is larger than GATOR_GROW_MB.
  • Increasing GATOR_INITIAL_MB is your best bet. But, of course, do not set GATOR_INITIAL_MB to more memory than the GPU has available. A GPU that advertises, say, 16 GB often has a bit less than that available to you, so request somewhat less than the advertised memory limit.

You've run out of memory

  • You've requested an allocation that cannot fit in existing pools, but adding a new pool failed because there isn't enough memory for it.
  • It's possible you're using your memory inefficiently because individual allocations are large compared to the size of the pool.
  • Again, your best bet here is to increase GATOR_INITIAL_MB given the caveats about available GPU memory mentioned above.
  • If increasing GATOR_INITIAL_MB does not work, then you should consider increasing your node count to decrease the per-GPU memory requirements.
  • Another option is to process one of your dimensions in smaller "chunks" to reduce the memory required at any given time.
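
The chunking idea in the last bullet can be sketched on the host with plain `std::vector`. Everything here is hypothetical and only stands in for real device arrays: the temporary buffer plays the role of a pool allocation, and `sum_squares_chunked` and `ChunkedResult` are names invented for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct ChunkedResult {
  double total;                 // sum of squares over the whole input
  std::size_t peak_temp_elems;  // largest temporary buffer that was live
};

// Hypothetical example: instead of one temporary as large as the input,
// process the input in slices of at most `chunk` elements.
inline ChunkedResult sum_squares_chunked(const std::vector<double> &in,
                                         std::size_t chunk) {
  assert(chunk > 0);
  ChunkedResult result{0.0, 0};
  for (std::size_t start = 0; start < in.size(); start += chunk) {
    const std::size_t len = std::min(chunk, in.size() - start);
    std::vector<double> temp(len);  // stand-in for a per-chunk pool allocation
    for (std::size_t i = 0; i < len; ++i) {
      temp[i] = in[start + i] * in[start + i];
    }
    for (double v : temp) result.total += v;
    result.peak_temp_elems = std::max(result.peak_temp_elems, len);
    // temp is released here, before the next chunk is allocated
  }
  return result;
}
```

The result is the same for any chunk size, but the peak size of the temporary drops from the full input length to the chunk length, which is the memory saving the bullet describes.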