Add a Resource Aware Pool #901

Merged
merged 153 commits into develop from task/kab163/resource-aware-pool on Dec 6, 2024
153 commits
e099fbc
initial changes for resource aware pool
kab163 Apr 23, 2024
04ade92
initial changes for resource aware pool
kab163 Apr 23, 2024
ced4817
Merge branch 'develop' of github.com:LLNL/Umpire into task/kab163/res…
kab163 Apr 25, 2024
5e6f904
making fixes because i know how to do stuff
kab163 Apr 25, 2024
cda63bf
silencing warnings
kab163 Apr 25, 2024
e9a13f2
rap updates
kab163 Apr 30, 2024
a1b852f
changes to make it almost work
kab163 Apr 30, 2024
1f38604
changes to make resource first, trying to integrate allocate_resource
kab163 May 1, 2024
309475d
fixing errors in example and adding work-around for rm.move deallocate
kab163 May 1, 2024
2ffa9b6
updates so far
kab163 Jun 5, 2024
86d28e5
Merge branch 'develop' of github.com:LLNL/Umpire into task/kab163/res…
kab163 Jun 5, 2024
1b3c4d5
Merge branch 'develop' of github.com:LLNL/Umpire into task/kab163/res…
kab163 Jun 19, 2024
9c9a85d
adding changes to make things work with hip
kab163 Jun 19, 2024
da06bf6
updating RAP, still WIP
kab163 Jul 1, 2024
96462a6
maybe this works...added more event tracking in deallocate
kab163 Jul 9, 2024
2f58838
getting rid of my_test and extra comments
kab163 Jul 10, 2024
3235865
adding new do_deallocate, adding resource to coalesce, etc
kab163 Jul 11, 2024
4c0c5dd
adding a way to reuse pending blocks if appliable
kab163 Jul 11, 2024
b6a7e58
Merge branch 'develop' of github.com:LLNL/Umpire into task/kab163/res…
kab163 Jul 15, 2024
11cb6c2
random fixes and changed map names
kab163 Jul 15, 2024
cbb6e76
fixing indentations, updating pending vector in release and deallocate
kab163 Jul 16, 2024
4dd75d3
Merge branch 'develop' of github.com:LLNL/Umpire into task/kab163/res…
kab163 Jul 17, 2024
9cd4321
updating how pending, free, and used states are updated and used
kab163 Jul 18, 2024
381d601
adding call to do_dealloc from allocate
kab163 Jul 18, 2024
a75c3ed
Merge branch 'develop' of github.com:LLNL/Umpire into task/kab163/res…
kab163 Jul 18, 2024
46f6c46
updates to RAP, trying to add helper functions
kab163 Jul 23, 2024
311f2b5
Merge branch 'develop' of github.com:LLNL/Umpire into task/kab163/res…
kab163 Jul 24, 2024
3aa2ce1
fixing bug in getResource
kab163 Jul 25, 2024
5a09235
Merge branch 'develop' of github.com:LLNL/Umpire into task/kab163/res…
kab163 Aug 1, 2024
cc7e0eb
updating test example
kab163 Aug 1, 2024
8f97b3e
updating checks, adding todo notes
kab163 Aug 1, 2024
39ea9ac
Apply style updates
Aug 1, 2024
d288eea
adding doc page
kab163 Aug 15, 2024
b107c82
edits to docs page
kab163 Aug 15, 2024
fcd2833
name update
kab163 Aug 15, 2024
9ee8716
adding resource check in deallocate and fixing hip errors
kab163 Aug 20, 2024
0443888
Apply style updates
Aug 20, 2024
67c432d
updating the way we handle calling deallocate with no resource
kab163 Aug 27, 2024
830388c
update to docs
kab163 Aug 27, 2024
95d24e6
Merge branch 'task/kab163/resource-aware-pool' of github.com:LLNL/Ump…
kab163 Aug 27, 2024
12114e3
Apply style updates
Aug 27, 2024
b8ce172
adding a to_string method for resources for debugging and logging info
kab163 Aug 27, 2024
be50f1c
fixing merge
kab163 Aug 27, 2024
ed1e6e0
Apply style updates
Aug 27, 2024
cb28ba4
adding test
kab163 Aug 27, 2024
8f30b00
Merge branch 'task/kab163/resource-aware-pool' of github.com:LLNL/Ump…
kab163 Aug 27, 2024
3ebb0c2
Apply style updates
Aug 27, 2024
d69413c
updating test and fixing a few bugs
kab163 Aug 28, 2024
d6053b2
fixing merge
kab163 Aug 28, 2024
0a65307
Apply style updates
Aug 28, 2024
3f5a021
removing cuda dependency not needed
kab163 Aug 28, 2024
bc4972d
Merge branch 'task/kab163/resource-aware-pool' of github.com:LLNL/Ump…
kab163 Aug 28, 2024
238da8a
adding test to make test
kab163 Aug 28, 2024
7efd602
fixing bugs in rap test
kab163 Aug 29, 2024
c128a18
Apply style updates
Aug 29, 2024
b56d2df
updates to example
kab163 Aug 29, 2024
8500d8c
updating merge
kab163 Aug 29, 2024
e50d947
hackathon updates
kab163 Aug 29, 2024
17c2e71
Merge branch 'develop' of github.com:LLNL/Umpire into task/kab163/res…
kab163 Sep 10, 2024
67969a3
fixing merge
kab163 Sep 16, 2024
55721ac
making example simpler and fixing cmake
kab163 Sep 16, 2024
0955ad7
Apply style updates
Sep 16, 2024
3a6f2ee
updating docs
kab163 Sep 16, 2024
cd159b8
Merge branch 'task/kab163/resource-aware-pool' of github.com:LLNL/Ump…
kab163 Sep 16, 2024
0a74bfd
updates to getResource, RAP header, and commenting out events still
kab163 Sep 17, 2024
262e5a5
Apply style updates
Sep 17, 2024
1f28000
Merge branch 'develop' of https://github.com/LLNL/Umpire into task/ka…
kab163 Oct 3, 2024
8c0a53c
updates to cmake
kab163 Oct 3, 2024
3ddd233
trying to get CI to pass
kab163 Oct 3, 2024
a1d379b
adding docs, fixing ci errors
kab163 Oct 3, 2024
60f8f52
more fixes to pass CI
kab163 Oct 3, 2024
380d29e
Apply style updates
Oct 3, 2024
edf478a
fixing docs
kab163 Oct 3, 2024
30c90da
Merge branch 'task/kab163/resource-aware-pool' of https://github.com/…
kab163 Oct 3, 2024
6ed786b
Merge branch 'develop' of github.com:LLNL/Umpire into task/kab163/res…
kab163 Oct 7, 2024
4acbadb
Merge branch 'task/kab163/resource-aware-pool' of github.com:LLNL/Ump…
kab163 Oct 7, 2024
267780a
not using sycl with RAP yet
kab163 Oct 7, 2024
a3a8e26
fixing hip errors
kab163 Oct 7, 2024
b3d2138
fixing mem leak
kab163 Oct 7, 2024
857cf35
trying to resolve hip error
kab163 Oct 7, 2024
f8230f2
trying to resolve clock issue
kab163 Oct 7, 2024
776d7bb
trying to resolve mem leak
kab163 Oct 7, 2024
69a600e
Apply style updates
Oct 7, 2024
912a5c0
Merge branch 'develop' of github.com:LLNL/Umpire into task/kab163/res…
kab163 Oct 8, 2024
bac0c83
Merge branch 'task/kab163/resource-aware-pool' of github.com:LLNL/Ump…
kab163 Oct 8, 2024
50816bd
Merge branch 'develop' of https://github.com/LLNL/Umpire into task/ka…
kab163 Oct 8, 2024
0f0b700
trying to fix mem leak in test
kab163 Oct 8, 2024
6582279
Apply style updates
Oct 8, 2024
110a7f3
temp commit to do debugging
kab163 Oct 9, 2024
96214c5
Merge branch 'develop' of https://github.com/LLNL/Umpire into task/ka…
kab163 Oct 9, 2024
e034448
Merge branch 'task/kab163/resource-aware-pool' of https://github.com/…
kab163 Oct 9, 2024
7e5c173
Apply style updates
Oct 9, 2024
b71e970
moving location of setting chunk->free
kab163 Oct 9, 2024
c8d7aff
Merge branch 'task/kab163/resource-aware-pool' of https://github.com/…
kab163 Oct 9, 2024
812f2d4
trying to resolve mem leaks
kab163 Oct 9, 2024
d01df3a
Merge branch 'task/kab163/resource-aware-pool' of github.com:LLNL/Ump…
kab163 Oct 10, 2024
d2c58d1
fixing mem leak
kab163 Oct 10, 2024
a372ac6
Apply style updates
Oct 10, 2024
6b1cfbf
merging with develop
kab163 Oct 11, 2024
d96e150
Merge branch 'task/kab163/resource-aware-pool' of github.com:LLNL/Ump…
kab163 Oct 11, 2024
244bee4
updates to RAP
kab163 Oct 11, 2024
15168d1
tester benchmark
kab163 Oct 11, 2024
f0cc90e
Apply style updates
Oct 11, 2024
37ad840
updating temp benchmark
kab163 Oct 11, 2024
09cb0cb
temp debug statements
kab163 Oct 11, 2024
7c33d96
Merge branch 'task/kab163/resource-aware-pool' of github.com:LLNL/Ump…
kab163 Oct 11, 2024
d432427
Apply style updates
Oct 11, 2024
afb809f
merging with develop
kab163 Oct 15, 2024
426753d
Merge branch 'task/kab163/resource-aware-pool' of https://github.com/…
kab163 Oct 15, 2024
c1fae83
updates to RAP class
kab163 Oct 16, 2024
0824050
merging with myself
kab163 Oct 16, 2024
d28220f
Apply style updates
Oct 16, 2024
1bb28aa
fixing bug in pending status tracking
kab163 Oct 22, 2024
5e9b122
Apply style updates
Oct 22, 2024
6ede374
fixing another bug
kab163 Oct 23, 2024
1956af8
Apply style updates
Oct 23, 2024
c79b088
fixing benchmark and final fixes to rap class
kab163 Oct 23, 2024
3f4792e
Merge branch 'task/kab163/resource-aware-pool' of github.com:LLNL/Ump…
kab163 Oct 23, 2024
dd88e46
in progress trying to add events for allocate and deallocate with res…
kab163 Oct 23, 2024
7601037
Apply style updates
Oct 23, 2024
9ee6b00
fixing events
kab163 Oct 25, 2024
9643dd8
fixing sqlite events
kab163 Oct 25, 2024
226984f
making sure name for shared memory allocator is string so example com…
kab163 Oct 25, 2024
8748e69
Merge branch 'task/kab163/resource-aware-pool' of https://github.com/…
kab163 Oct 25, 2024
feb758d
Update docs/sphinx/cookbook/resource_aware_pool.rst
kab163 Oct 25, 2024
ac71b4e
Update docs/sphinx/cookbook/resource_aware_pool.rst
kab163 Oct 25, 2024
b52e54f
Fixing another docs typo
kab163 Oct 25, 2024
a028234
Update resource_aware_pool.rst
kab163 Oct 25, 2024
ee56eea
implementing changes from review
kab163 Oct 31, 2024
9141211
Apply style updates
Oct 31, 2024
3c4e4d0
renaming example and cleaning up
kab163 Nov 5, 2024
64939d0
making getResource and getNumPending private
kab163 Nov 6, 2024
4be2f36
Apply style updates
Nov 6, 2024
b47e0c3
adding aligned_size and aligned_highwatermark_size stuff
kab163 Nov 6, 2024
d6798e4
Merge branch 'task/kab163/resource-aware-pool' of github.com:LLNL/Ump…
kab163 Nov 6, 2024
8ae1cbe
making benchmark work for both cuda and hip
kab163 Nov 6, 2024
347fc2d
fixing test and adding to docs
kab163 Nov 6, 2024
98bbbbb
changing m_pending to a list and adding Release test
kab163 Nov 7, 2024
0024944
cleaning up pending list check in allocate
kab163 Nov 7, 2024
7fa13e8
revisions from review
kab163 Nov 25, 2024
e47f80a
Apply style updates
Nov 25, 2024
88558e0
more changes from review
kab163 Nov 25, 2024
eaf2d2a
Merge branch 'task/kab163/resource-aware-pool' of github.com:LLNL/Ump…
kab163 Nov 25, 2024
6aea1fe
changing order of allocate and deallocate functions so resource is last
kab163 Dec 3, 2024
7d4d668
Apply style updates
Dec 3, 2024
8345094
adding default resource when not specified
kab163 Dec 3, 2024
e9eccf5
Apply style updates
Dec 3, 2024
afd4cfd
fixing test
kab163 Dec 3, 2024
262308c
update docs
kab163 Dec 3, 2024
04095da
adding Host as the default resource, adding to docs, release only rel…
kab163 Dec 5, 2024
cc1c824
Apply style updates
Dec 5, 2024
b7dbac4
moving do_deallocate call for finished pending chunks to coalesce
kab163 Dec 5, 2024
ca4c55f
Merge branch 'task/kab163/resource-aware-pool' of github.com:LLNL/Ump…
kab163 Dec 5, 2024
7 changes: 7 additions & 0 deletions benchmarks/CMakeLists.txt
@@ -17,6 +17,13 @@ blt_add_executable(
SOURCES allocator_stress_test.cpp
DEPENDS_ON ${stress_test_depends})

if (UMPIRE_ENABLE_HIP)
blt_add_executable(
NAME resource_aware_pool_stress_test
SOURCES resource_aware_pool_stress_test.cpp
DEPENDS_ON umpire blt::hip blt::hip_runtime)
endif()

if (UMPIRE_ENABLE_DEVICE_ALLOCATOR)
if (UMPIRE_ENABLE_CUDA)
blt_add_executable(
171 changes: 171 additions & 0 deletions benchmarks/resource_aware_pool_stress_test.cpp
@@ -0,0 +1,171 @@
#include <stdio.h>
#include <math.h>
#include <iostream>

#include "hip/hip_runtime.h"

#include "camp/camp.hpp"
#include "umpire/ResourceManager.hpp"
#include "umpire/Umpire.hpp"
#include "umpire/strategy/QuickPool.hpp"
#include "umpire/strategy/ResourceAwarePool.hpp"

using namespace camp::resources;

constexpr int ITER = 5;
constexpr int NUM = 2048;
const int NUM_PER_BLOCK = 256;
const int NUM_BLOCKS = NUM/NUM_PER_BLOCK;

using clock_value_t = long long;

__device__ clock_value_t my_clock()
{
return clock64();
}

__device__ void sleep(clock_value_t sleep_cycles)
{
clock_value_t start = my_clock();
clock_value_t cycles_elapsed;
do {
cycles_elapsed = my_clock() - start;
} while (cycles_elapsed < sleep_cycles);
}

__global__ void do_sleep()
{
// sleep - works still at 1000, so keeping it at 100k
sleep(100000000);
}

__global__ void touch_data(double* data)
{
int id = blockIdx.x * blockDim.x + threadIdx.x;

if (id < NUM) {
data[id] = id;
}
}

__global__ void check_data(double* data)
{
int id = blockIdx.x * blockDim.x + threadIdx.x;

//Then error check that data[id] still == id
if (id < NUM) {
if (data[id] != id)
data[id] = -1;
}
}

__global__ void touch_data_again(double* data)
{
int id = blockIdx.x * blockDim.x + threadIdx.x;

if (id < NUM) {
data[id] = 8.76543210;
}
}

void QuickPool_check(umpire::Allocator quick_pool)
{
auto& rm = umpire::ResourceManager::getInstance();
bool error{false};

// Create hip streams
hipStream_t s1, s2;
hipStreamCreate(&s1); hipStreamCreate(&s2);

double* a = static_cast<double*>(quick_pool.allocate(NUM * sizeof(double)));

hipLaunchKernelGGL(touch_data, dim3(NUM_BLOCKS), dim3(NUM_PER_BLOCK), 0, s1, a);
hipLaunchKernelGGL(do_sleep, dim3(NUM_BLOCKS), dim3(NUM_PER_BLOCK), 0, s1);
hipLaunchKernelGGL(check_data, dim3(NUM_BLOCKS), dim3(NUM_PER_BLOCK), 0, s1, a);

quick_pool.deallocate(a);
a = static_cast<double*>(quick_pool.allocate(NUM * sizeof(double)));

hipLaunchKernelGGL(touch_data_again, dim3(NUM_BLOCKS), dim3(NUM_PER_BLOCK), 0, s2, a);

double* b = static_cast<double*>(quick_pool.allocate(NUM * sizeof(double)));
rm.copy(b, a);
b = static_cast<double*>(rm.move(b, rm.getAllocator("HOST")));

for (int i = 0; i < NUM; i++) {
if(b[i] == (-1)) {
error = true;
break;
}
}

if (error) {
std::cout << "Errors Found!" << std::endl;
} else {
std::cout << "Kernel succeeded! Expected result returned" << std::endl;
}

quick_pool.deallocate(a);
rm.deallocate(b);
hipStreamDestroy(s1); hipStreamDestroy(s2);
}

void ResourceAwarePool_check(umpire::Allocator rap_pool)
{
// Create hip resources
Hip d1, d2;
Resource r1{d1}, r2{d2};

// ResourceAwarePool checks
auto& rm = umpire::ResourceManager::getInstance();
bool error{false};

for(int i = 0; i < ITER; i++) {
double* a = static_cast<double*>(rap_pool.allocate(r1, NUM * sizeof(double)));

hipLaunchKernelGGL(touch_data, dim3(NUM_BLOCKS), dim3(NUM_PER_BLOCK), 0, d1.get_stream(), a);
hipLaunchKernelGGL(do_sleep, dim3(NUM_BLOCKS), dim3(NUM_PER_BLOCK), 0, d1.get_stream());
hipLaunchKernelGGL(check_data, dim3(NUM_BLOCKS), dim3(NUM_PER_BLOCK), 0, d1.get_stream(), a);

rap_pool.deallocate(r1, a);
a = static_cast<double*>(rap_pool.allocate(r2, NUM * sizeof(double)));

hipLaunchKernelGGL(touch_data_again, dim3(NUM_BLOCKS), dim3(NUM_PER_BLOCK), 0, d2.get_stream(), a);

double* b = static_cast<double*>(rap_pool.allocate(r2, NUM * sizeof(double)));
rm.copy(b, a);
b = static_cast<double*>(rm.move(b, rm.getAllocator("HOST")));

for (int i = 0; i < NUM; i++) {
if(b[i] == (-1)) {
error = true;
break;
}
}

if (error) {
std::cout << "Errors Found!" << std::endl;
} else {
std::cout << "Kernel succeeded! Expected result returned" << std::endl;
}

rap_pool.deallocate(r2, a);
rm.deallocate(b);
error = false; // reset to find any new errors in next iter
}
}

int main(int, char**)
{
auto& rm = umpire::ResourceManager::getInstance();
auto quick_pool = rm.makeAllocator<umpire::strategy::QuickPool>("quick-pool", rm.getAllocator("UM"));
auto rap_pool = rm.makeAllocator<umpire::strategy::ResourceAwarePool>("rap-pool", rm.getAllocator("UM"));

std::cout<<"Checking QuickPool ...."<<std::endl;
QuickPool_check(quick_pool);

std::cout<<"Checking ResourceAwarePool ...."<<std::endl;
ResourceAwarePool_check(rap_pool);

std::cout<<"Done!"<<std::endl;
return 0;
}

1 change: 1 addition & 0 deletions docs/sphinx/cookbook.rst
@@ -25,6 +25,7 @@ that have introspection disabled for improved performance, and applying CUDA
cookbook/get_largest_available_block_in_pool.rst
cookbook/coalesce_pool.rst
cookbook/pinned_pool.rst
cookbook/resource_aware_pool.rst
cookbook/plot_allocations.rst
cookbook/mixed_pool.rst
cookbook/thread_safe.rst
Binary file added docs/sphinx/cookbook/multi-mem.png
150 changes: 150 additions & 0 deletions docs/sphinx/cookbook/resource_aware_pool.rst
@@ -0,0 +1,150 @@
.. _resource_aware_pool:

==============================
Creating a Resource Aware Pool
==============================

This recipe describes how to create and use an Umpire ``ResourceAwarePool``. This pool is somewhat advanced
so we also provide a bit of background on Camp resources which are used to track resources and events.

Camp Resources
--------------

Umpire uses `Camp <https://github.com/LLNL/camp>`_ resources to keep track of "streams of execution". A single "stream of execution"
on the device corresponds to a single Camp device resource (e.g. a single cuda stream).
Similarly, when we are executing on the host, this
corresponds to a separate "stream of execution" and therefore a separate Camp host resource.

Typically, we deal with multiple Camp resources. This includes a single resource for the host and
one or more for the device, depending on how many (cuda, hip, etc.) streams we have in use.
While we can have multiple camp resources for the device (e.g. multiple cuda streams),
we can only have one resource for the host because the host only has one stream of execution.
Since we are dealing with Camp resources, we call this pool strategy the ``ResourceAwarePool``.

Throughout the rest of this page, we use the term "camp resource" to refer to a "stream of
execution". If the camp resource is on the device, we are referring to a device stream such
as a cuda stream or hip stream.

Using a Single Resource
-----------------------

Umpire's strategies such as ``QuickPool`` and ``DynamicPoolList`` work very well
on the device when we are dealing with a single camp device resource. In the figure below, we have
the host resource which allocates memory (a\ :sub:`1`\), uses the memory in a kernel (k\ :sub:`1`\), then schedules
a deallocation (d\ :sub:`2`\). Then, the host immediately reuses that memory for a different kernel (k\ :sub:`2`\).

.. image:: ./single_mem.png

In this scenario, there is no potential for a data race, since we are dealing with just one cuda stream
and kernels on a single stream execute sequentially. In other words, this scenario deals with only
one Camp device resource. In this type of scenario, there is no need for a ``ResourceAwarePool`` because
it would behave the same as your typical ``QuickPool``.

.. note:: A ``ResourceAwarePool`` with only one resource will behave the same as a ``QuickPool`` strategy.
   We don't advise using the ``ResourceAwarePool`` with only one resource since it will have the added
   overhead of the pending state (explained below). Instead, just use ``QuickPool`` if possible.

Using Multiple Resources
------------------------

When dealing with multiple camp device resources, there is a possibility for a data race if we allocate, use, and
schedule a deallocation on one stream and then try to reuse that memory immediately on another stream.
The figure below depicts that scenario. Note that the overlap in the kernels corresponds to a potential
data race.

.. image:: ./multi-mem.png

Umpire's ``ResourceAwarePool`` is designed to avoid any potential data races by making the resources
"aware" of the memory used by another resource. If resource ``r2`` needs to allocate memory, but that
memory is potentially still being used by another resource, ``r1``, then ``r2`` will use different
memory instead. To do that, the ``ResourceAwarePool`` introduces a "pending" state. As soon as ``r1``
schedules a deallocation, that memory is marked as ``pending``. Only once that pending memory has
actually been deallocated will the ``pending`` mark be removed. When ``r2`` needs to reallocate that
memory, it will first check whether the memory is still ``pending``. If it is NOT ``pending``, it will
reuse that memory; otherwise it will use a different piece of memory instead.

The figure below illustrates the 3 states of a ``ResourceAwarePool``: free, used, and pending.

.. image:: ./states.png

Note that if you schedule a deallocation, but then try to reuse that memory on the SAME
resource, that memory will NOT be labeled ``pending``. It is only when we have scheduled a deallocation
on one resource and then try to reuse that same memory on a different resource that we have
the potential for a data race and thus the need for the pending state.
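The free/used/pending transitions described above can be sketched as a small toy model. Note that the names here (``ToyPool``, ``Chunk``, ``State``) are hypothetical, chosen for illustration only; they do not reflect Umpire's actual ``ResourceAwarePool`` internals.

.. code-block:: cpp

   #include <cassert>
   #include <iostream>
   #include <unordered_map>

   // Toy model of the free/used/pending chunk states.
   enum class State { Free, Used, Pending };

   struct Chunk {
     State state{State::Free};
     int owner{-1}; // id of the resource that last touched this chunk
   };

   struct ToyPool {
     std::unordered_map<int, Chunk> chunks; // chunk id -> chunk

     // A deallocate only *schedules* the free: the chunk becomes
     // Pending until the owning resource's work is known to be done.
     void deallocate(int id, int r) {
       chunks[id].state = State::Pending;
       chunks[id].owner = r;
     }

     // Called once the owning resource has synchronized.
     void complete_pending(int id) { chunks[id].state = State::Free; }

     // A chunk may be reused by resource r if it is Free, or if it is
     // Pending but owned by the *same* resource (same stream, no race).
     bool can_reuse(int id, int r) const {
       auto it = chunks.find(id);
       if (it == chunks.end()) return false;
       return it->second.state == State::Free ||
              (it->second.state == State::Pending && it->second.owner == r);
     }
   };

   int main() {
     ToyPool pool;
     pool.chunks[0] = Chunk{State::Used, 1};
     pool.deallocate(0, 1);         // r1 schedules a deallocation
     assert(pool.can_reuse(0, 1));  // same resource: safe to reuse
     assert(!pool.can_reuse(0, 2)); // different resource: must wait
     pool.complete_pending(0);      // r1's work has finished
     assert(pool.can_reuse(0, 2));  // now free for any resource
     std::cout << "ok" << std::endl;
     return 0;
   }

The key design point the sketch captures is that a deallocation on the same resource is always safe to reuse, so only cross-resource reuse pays the pending-state check.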

Using a ResourceAwarePool
-------------------------

In this example, we will review how to use the :class:`umpire::strategy::ResourceAwarePool`
strategy. You can create a ``ResourceAwarePool`` with the following code:

.. code-block:: cpp

   auto& rm = umpire::ResourceManager::getInstance();
   auto pool = rm.makeAllocator<umpire::strategy::ResourceAwarePool>("rap-pool", rm.getAllocator("UM"));

Next, you will want to create camp resources. We use these camp resources to track events
on the resource. Below is an example of creating a camp resource for two device streams and the host.

.. code-block:: cpp

   using namespace camp::resources;
   ...
   Cuda d1, d2; // create Cuda resources, d1 for stream1, d2 for stream2
   Host h1;     // create a Host resource
   Resource r1{d1}, r2{d2}, r3{h1}; // initialize the generic Camp resources

Then, to allocate memory with your ``ResourceAwarePool`` you can do the following:

.. code-block:: cpp

   double* a = static_cast<double*>(pool.allocate(r1, NUM_THREADS * sizeof(double)));

Note that there is an extra parameter when using the ``allocate`` function. The first parameter is
the resource (``r1``) we want the allocated memory to be associated with. In other words, ``r1`` is
the device stream we want to launch the kernel on which will use that memory. Next, be sure to launch the kernel using the
correct stream. Since we are using Camp resources, we use ``d1`` that we created above. For example:

.. code-block:: cpp

   my_kernel<<<NUM_BLOCKS, BLOCK_SIZE, 0, d1.get_stream()>>>(a, NUM_THREADS);

.. note:: If you lose track of which resource you need to use for the kernel launch, you can call
   ``getResource(a)``, which will return the resource associated with that pointer. However, be sure
   to launch the kernel with the underlying (cuda/hip/etc.) resource (i.e. ``d1``), not the generic resource
   (i.e. ``r1``), as there is no ``get_stream()`` function on the generic resource.

The kernel launch specifies the stream from the Cuda resource we created above.
To deallocate, use the following code:

.. code-block:: cpp

   pool.deallocate(r1, a);

.. note::
   It can be hard to keep track of which resource corresponds to which pointer. If it is not feasible to keep track
   of that, you can call ``pool.deallocate(ptr)`` as usual. This method will call ``getResource(ptr)``
   on the ``ResourceAwarePool`` instance and then call ``pool.deallocate(r, ptr)``, where ``r`` is the resource
   returned from the ``getResource`` call.

Assuming you need to reallocate memory on ``a`` with ``r2``, you could then launch a second kernel with the second stream. For example:

.. code-block:: cpp

   a = static_cast<double*>(pool.allocate(r2, NUM_THREADS * sizeof(double)));
   ...
   my_other_kernel<<<NUM_BLOCKS, BLOCK_SIZE, 0, d2.get_stream()>>>(a, NUM_THREADS);

Note the use of ``d2`` in this kernel launch since ``d2`` is the underlying (cuda) resource for the generic resource, ``r2``.

Since we are using the ``ResourceAwarePool``, we will not cause a data race by reusing that memory. If the
memory is still being used by ``r1`` by the time ``r2`` requests it, it will be in a ``pending`` state and thus
not reused by ``r2``. Instead, ``r2`` will be given a different piece of memory.

The ``ResourceAwarePool`` is also useful for avoiding data races when the host and device
share a single memory space. In that case, having two or more camp resources,
whether host or device, creates the potential for data races, since the memory is visible to both host and device.

A full example of using the ``ResourceAwarePool`` can be seen below:

.. literalinclude:: ../../../examples/rap_example.cpp
Binary file added docs/sphinx/cookbook/single_mem.png
Binary file added docs/sphinx/cookbook/states.png
19 changes: 19 additions & 0 deletions examples/CMakeLists.txt
@@ -45,6 +45,7 @@ if (UMPIRE_ENABLE_HIP)
NAME multi_device
SOURCES multi_device.cpp
DEPENDS_ON umpire blt::hip)

list(APPEND umpire_examples multi_device)
endif()

@@ -60,6 +61,24 @@ blt_add_executable(
DEPENDS_ON ${example_depends})
list(APPEND umpire_examples strategy_example)

if (UMPIRE_ENABLE_CUDA)
blt_add_executable(
NAME rap_example
SOURCES rap_example.cpp
DEPENDS_ON ${example_depends} cuda)
elseif (UMPIRE_ENABLE_HIP)
blt_add_executable(
NAME rap_example
SOURCES rap_example.cpp
DEPENDS_ON ${example_depends} blt::hip)
else()
blt_add_executable(
NAME rap_example
SOURCES rap_example.cpp
DEPENDS_ON ${example_depends})
endif()
list(APPEND umpire_examples rap_example)

blt_add_executable(
NAME allocator
SOURCES allocator.cxx
2 changes: 1 addition & 1 deletion examples/cookbook/recipe_named_allocation.cpp
@@ -17,7 +17,7 @@ int main(int, char**)
auto allocator = rm.getAllocator("HOST");
std::vector<void*> allocations;

allocations.push_back(allocator.allocate("My Allocation Name", 100));
allocations.push_back(allocator.allocate(std::string{"My Allocation Name"}, 100));
allocations.push_back(allocator.allocate(1024));

for (auto ptr : allocations) {
2 changes: 1 addition & 1 deletion examples/cookbook/recipe_shared_memory.cpp
@@ -81,7 +81,7 @@ int main(int ac, char** av)
//
// Allocate shared memory
//
void* ptr{node_allocator.allocate("allocation_name_2", sizeof(uint64_t))};
void* ptr{node_allocator.allocate(std::string{"allocation_name_2"}, sizeof(uint64_t))};
uint64_t* data{static_cast<uint64_t*>(ptr)};

if (shared_rank == foreman_rank)