[REVIEW] Add per-thread-default stream support to pool_memory_resource using thread-local CUDA events #425

Merged Jul 10, 2020 (33 commits)
Changes from 28 commits
Commits
e2be0ba
Use per-thread events rather than per-block events
harrism Jun 22, 2020
5514703
Disable event timing; cleanup
harrism Jun 23, 2020
93bebd7
Add thread-local unique_event inner class, streamline PTDS and non-PT…
harrism Jun 24, 2020
2a18521
TODO comment
harrism Jun 24, 2020
78467d5
Simplify event wrapper class and remove ids
harrism Jun 24, 2020
4110fa0
Changelog for #425
harrism Jun 24, 2020
29d65f9
get_event looks for 0 or cudaStreamPerThread
harrism Jun 25, 2020
e53a59e
Synchronize on event destruction. Clean up docs.
harrism Jun 25, 2020
3d178ff
remove_event->destroy_event
harrism Jun 25, 2020
889bf55
Be consistent with the default location for blocks
harrism Jun 25, 2020
0d314b9
Refactor mr_tests.cpp to enable creating multithreaded tests that sha…
harrism Jun 26, 2020
b1620d9
Changes missed in previous commit
harrism Jun 26, 2020
07af3c8
Add multithreaded tests (some still failing)
harrism Jun 26, 2020
714e924
Fix failing multithreaded tests by adding specializations for test co…
harrism Jun 29, 2020
c14d1af
Make pool_memory_resource thread safe and fix PTDS destroy_event bug
harrism Jun 30, 2020
2ffbcaa
Add DEVICE_MR_PTDS_TEST
harrism Jun 30, 2020
ccea946
include mutex always
harrism Jun 30, 2020
af3b9c7
Merge branch 'branch-0.15' into fea-ptds-events
harrism Jun 30, 2020
1d039ff
Fix use-after-free race with cuda_events.
harrism Jul 1, 2020
22c77cb
Fix memory leak in test
harrism Jul 1, 2020
a6390a4
Add tests that alloc / free on different threads.
harrism Jul 1, 2020
6fd4544
better documentation of cuda_event
harrism Jul 1, 2020
1d24547
Document that this is now thread-safe and PTDS-compatible
harrism Jul 1, 2020
0d7cb97
Improve changelog
harrism Jul 1, 2020
fc133ae
Merge branch 'branch-0.15' into fea-ptds-events
harrism Jul 1, 2020
4af5652
Fix gcc7 compilation failure
harrism Jul 2, 2020
6e9bac0
Address review suggestions.
harrism Jul 2, 2020
26c2909
Only add event to ptds_events_ once!
harrism Jul 2, 2020
01385bc
Merge branch 'multi-thread-replay' into fea-ptds-events
harrism Jul 2, 2020
3df09f9
Fix streams passed to mutithreaded test
harrism Jul 3, 2020
13486e7
Update copyright
harrism Jul 10, 2020
0a57993
Combine streams and events in a struct.
harrism Jul 10, 2020
94cae03
Merge branch 'branch-0.15' into fea-ptds-events
harrism Jul 10, 2020
5 changes: 3 additions & 2 deletions CHANGELOG.md
@@ -4,7 +4,8 @@
 
 - PR #375 Support out-of-band buffers in Python pickling
 - PR #391 Add `get_default_resource_type`
-- PR #396 Remove deprecated RMM APIs.
+- PR #396 Remove deprecated RMM APIs
+- PR #425 Add CUDA per-thread default stream support and thread safety to `pool_memory_resource`
 
 ## Improvements
 
@@ -27,7 +28,7 @@
 - PR #383 Explicitly require NumPy
 - PR #398 Fix missing head flag in merge_blocks (pool_memory_resource) and improve block class
 - PR #403 Mark Cython `memory_resource_wrappers` `extern` as `nogil`
-- PR #406 Sets Google Benchmark to a fixed version, v1.5.1.
+- PR #406 Sets Google Benchmark to a fixed version, v1.5.1
 
 
 # RMM 0.14.0 (Date TBD)
15 changes: 6 additions & 9 deletions benchmarks/random_allocations/random_allocations.cpp
@@ -58,7 +58,6 @@ allocation remove_at(allocation_vector& allocs, std::size_t index)
 // nested MR type names can get long...
 using cuda_mr = rmm::mr::cuda_memory_resource;
 using pool_mr = rmm::mr::pool_memory_resource<cuda_mr>;
-using safe_pool_mr = rmm::mr::thread_safe_resource_adaptor<pool_mr>;
 using fixed_multisize_mr = rmm::mr::fixed_multisize_memory_resource<pool_mr>;
 using hybrid_mr = rmm::mr::hybrid_memory_resource<fixed_multisize_mr, pool_mr>;
 using safe_hybrid_mr = rmm::mr::thread_safe_resource_adaptor<hybrid_mr>;
@@ -184,9 +183,9 @@ resource_wrapper<cuda_mr>::resource_wrapper()
 }
 
 template <>
-resource_wrapper<safe_pool_mr>::resource_wrapper()
+resource_wrapper<pool_mr>::resource_wrapper()
 {
-  mr = new rmm::mr::thread_safe_resource_adaptor<pool_mr>(new pool_mr(new cuda_mr()));
+  mr = new pool_mr(new cuda_mr());
 }
 
 template <>
@@ -228,12 +227,10 @@ resource_wrapper<fixed_multisize_mr>::~resource_wrapper()
 }
 
 template <>
-resource_wrapper<safe_pool_mr>::~resource_wrapper()
+resource_wrapper<pool_mr>::~resource_wrapper()
 {
-  auto pool = mr->get_upstream();
-  auto cuda = pool->get_upstream();
+  auto cuda = mr->get_upstream();
   delete mr;
-  delete pool;
   delete cuda;
 }
 
@@ -299,7 +296,7 @@ void declare_benchmark(std::string name)
   if (name == "hybrid")
     BENCHMARK_TEMPLATE(BM_RandomAllocations, safe_hybrid_mr)->Apply(benchmark_range);
   else if (name == "pool")
-    BENCHMARK_TEMPLATE(BM_RandomAllocations, safe_pool_mr)->Apply(benchmark_range);
+    BENCHMARK_TEMPLATE(BM_RandomAllocations, pool_mr)->Apply(benchmark_range);
   else if (name == "fixed_multisize")
     BENCHMARK_TEMPLATE(BM_RandomAllocations, fixed_multisize_mr)->Apply(benchmark_range);
   else if (name == "cnmem")
@@ -318,7 +315,7 @@ int main(int argc, char** argv)
     if (argc > 3) max_size = atoi(argv[3]);
     declare_benchmark(mr_name);
   } else {
-    BENCHMARK_TEMPLATE(BM_RandomAllocations, safe_pool_mr)->Apply(benchmark_range);
+    BENCHMARK_TEMPLATE(BM_RandomAllocations, pool_mr)->Apply(benchmark_range);
     BENCHMARK_TEMPLATE(BM_RandomAllocations, safe_hybrid_mr)->Apply(benchmark_range);
     BENCHMARK_TEMPLATE(BM_RandomAllocations, cnmem_mr)->Apply(benchmark_range);
     BENCHMARK_TEMPLATE(BM_RandomAllocations, cuda_mr)->Apply(benchmark_range);
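
Note on the `resource_wrapper<pool_mr>` destructor in the benchmark diff: it reads the upstream pointer from `mr` before deleting `mr`, then deletes the upstream last. The sketch below illustrates that ordering with stand-in types only. `mock_upstream`, `mock_pool`, and `teardown_order` are hypothetical names and are not part of RMM. It also assumes the pool holds a non-owning pointer to its upstream, which the benchmark's separate `delete cuda;` suggests but this diff does not confirm.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an upstream resource (e.g. a cuda_mr-like type).
struct mock_upstream {
  explicit mock_upstream(std::vector<std::string>* log) : log_{log} {}
  ~mock_upstream() { log_->push_back("upstream destroyed"); }
  std::vector<std::string>* log_;
};

// Hypothetical stand-in for a pool that holds a NON-owning pointer to its
// upstream (an assumption made for this sketch).
struct mock_pool {
  mock_pool(mock_upstream* upstream, std::vector<std::string>* log)
    : upstream_{upstream}, log_{log}
  {
  }
  ~mock_pool() { log_->push_back("pool destroyed"); }
  mock_upstream* get_upstream() const { return upstream_; }
  mock_upstream* upstream_;
  std::vector<std::string>* log_;
};

// Mirrors the teardown order in the diff: fetch the upstream pointer while the
// pool is still alive, delete the pool, then delete the upstream.
std::vector<std::string> teardown_order()
{
  std::vector<std::string> log;
  auto* mr = new mock_pool(new mock_upstream(&log), &log);

  auto* upstream = mr->get_upstream();  // must be read before `delete mr`
  assert(upstream != nullptr);
  delete mr;        // the pool goes first, while its upstream still exists
  delete upstream;  // the upstream goes last
  return log;
}
```

Calling `mr->get_upstream()` after `delete mr` would be a use-after-free, which is why the pointer is captured first. With the `thread_safe_resource_adaptor` layer removed, the chain has two links instead of three, so one `get_upstream()` call and one `delete` drop out of the destructor.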