[REVIEW] New suballocator memory resources #162
Conversation
Regarding multi-device, I think we can do something like what Thrust does, maintaining a different resource per device. See:

template <>
MRTest<hybrid_mr>::~MRTest()
There may be some missing dtor specializations.
Consider instead using a vector<unique_ptr<device_memory_resource>>
in the test fixture and just push upstream resources onto the vector. That way the upstreams will be automatically destroyed when the fixture is destroyed.
This is looking difficult without bumping RMM to C++14...
 * @param threshold_size Allocations > this size (in bytes) use large_mr. Allocations <= this size
 * use small_mr. Must be a power of two.
 */
explicit hybrid_memory_resource(
Suggested change:
- explicit hybrid_memory_resource(
+ hybrid_memory_resource(
Done.
 */
inline block merge_blocks(block const& a, block const& b)
{
  RMM_EXPECTS(a.ptr + a.size == b.ptr, "Merging noncontiguous blocks");
This function probably shouldn't throw. Use asserts instead.
Done.
 */
template <class InputIt>
void insert(InputIt first, InputIt last) {
  for (auto iter = first; iter != last; ++iter) {
Use std::for_each.
Done.
 * @tparam UpstreamResource memory_resource to use for allocating the pool. Implements
 * rmm::mr::device_memory_resource interface.
 */
template <typename UpstreamResource>
Just for consistency.
Suggested change:
- template <typename UpstreamResource>
+ template <typename Upstream>
Done in all 3.
cudaGetDevice(&device);
int memsize{0};
cudaDeviceProp props;
cudaGetDeviceProperties(&props, device);
Suggested change:
- cudaGetDeviceProperties(&props, device);
+ RMM_CUDA_TRY(cudaGetDeviceProperties(&props, device));
std::set<block> allocated_blocks_;

// blocks allocated from upstream heap: so they can be easily freed
std::map<char*, block> upstream_blocks_;
This could just be a vector?
Also, the upstream blocks could just be device_buffers. That would simplify their lifetime management.
Vector: done.
device_buffer: no -- decided against that as discussed, due to all the extra data it stores.
// Try to find a block in the same stream
auto iter = stream_blocks_.find(stream);
if (iter != stream_blocks_.end())
  b = block_from_sync_list(iter->second, size, stream, stream);
Just returning instead of assigning to b would simplify logic quite a bit in this function.
Suggested change:
- b = block_from_sync_list(iter->second, size, stream, stream);
+ return block_from_sync_list(iter->second, size, stream, stream);
Done. Much clearer, and faster!
 * @brief Free all memory allocated from the upstream memory_resource.
 *
 */
void free_all()
It'd be more consistent with the std::pmr resources to call this release; see https://en.cppreference.com/w/cpp/memory/synchronized_pool_resource/release
Done.
 * @return block A pointer to memory of size `get_block_size()`.
 */
void* get_block(cudaStream_t stream) {
  void* p = nullptr;
This is another place where just returning instead of assigning to p would simplify logic a bit.
Done.
This PR introduces a few new suballocator memory_resources.

- pool_memory_resource is a coalescing pool suballocator memory resource. The behavior is effectively the same as cnmem_memory_resource, except that it is not currently thread safe and the code uses STL containers rather than custom linked lists.
- fixed_size_memory_resource allocates fixed size blocks from a pre-allocated list (which can grow as blocks get used). Provides constant time allocation and deallocation for allocations that fit within its fixed block size.
- fixed_multisize_memory_resource allocates blocks from a specified range of fixed power-of-two block sizes (e.g. the default is 256KB to 4MB).
- hybrid_memory_resource, given two suballocators and a byte size threshold to determine which one to use, returns arbitrary size allocations. Typically this means using a fixed_multisize_memory_resource below a threshold (say 4MiB), and a sub_memory_resource above that threshold.

Tests for all of the above are added, and the tests have been consolidated a bit so they run faster.
This PR also makes the RANDOM_ALLOCATIONS google benchmark more flexible so it can be run for a requested allocator and a given number of allocations and maximum allocation size.
TODO:

- supports_get_mem_info, which returns false for all the new resource types.

Future PRs: