[REVIEW] New suballocator memory resources #162

Closed
wants to merge 100 commits

Conversation

@harrism (Member) commented Oct 17, 2019

This PR introduces a few new suballocator memory_resources.

  • pool_memory_resource is a coalescing pool suballocator memory resource. Its behavior is effectively the same as cnmem_memory_resource, except that it is not yet thread safe and it uses STL containers rather than custom linked lists.
  • fixed_size_memory_resource allocates fixed-size blocks from a pre-allocated list (which can grow as blocks are used). It provides constant-time allocation and deallocation for requests that fit within its fixed block size.
  • fixed_multisize_memory_resource allocates blocks from a specified range of fixed power-of-two block sizes (e.g., the default range is 256 KiB to 4 MiB).
  • hybrid_memory_resource serves arbitrary-size allocations given two suballocators and a byte-size threshold that determines which one to use. Typically this means using a fixed_multisize_memory_resource below the threshold (say 4 MiB) and a sub_memory_resource above it. (A minimal sketch follows this list.)
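As a rough illustration of the threshold dispatch described in the last bullet, consider the sketch below. This is not the PR's actual interface; the template parameters and member names (small_mr_, large_mr_, threshold_) are illustrative assumptions.

#include <cstddef>
#include <cuda_runtime_api.h>

// Illustrative sketch only: route each allocation to one of two upstream
// suballocators based on a byte-size threshold.
template <typename SmallMR, typename LargeMR>
class hybrid_sketch {
 public:
  hybrid_sketch(SmallMR* small_mr, LargeMR* large_mr, std::size_t threshold)
    : small_mr_{small_mr}, large_mr_{large_mr}, threshold_{threshold} {}

  void* allocate(std::size_t bytes, cudaStream_t stream) {
    // Small requests hit the fixed-size lists; large ones hit the pool.
    return (bytes <= threshold_) ? small_mr_->allocate(bytes, stream)
                                 : large_mr_->allocate(bytes, stream);
  }

  void deallocate(void* p, std::size_t bytes, cudaStream_t stream) {
    // The request size determines which suballocator owns the pointer.
    if (bytes <= threshold_)
      small_mr_->deallocate(p, bytes, stream);
    else
      large_mr_->deallocate(p, bytes, stream);
  }

 private:
  SmallMR* small_mr_;
  LargeMR* large_mr_;
  std::size_t threshold_;
};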


Tests for all of the above are added, and the tests have been consolidated a bit so they run faster.

This PR also makes the RANDOM_ALLOCATIONS google benchmark more flexible so it can be run for a requested allocator and a given number of allocations and maximum allocation size.

Future PRs:

  • Smarter growth heuristics
  • Use events to support per-thread default stream
  • Thread safety (do it as a wrapper memory resource type).
  • Multi-device? May not be needed since the device can be set before creating a resource. However, we should probably check the current device when allocating from CUDA to ensure it's the same as the device the MR was created on...

@harrism harrism marked this pull request as ready for review February 27, 2020 05:48
@harrism harrism added 3 - Ready for review Ready for review by team and removed 2 - In Progress Currently a work in progress labels Feb 27, 2020
@harrism harrism requested a review from jrhemstad February 27, 2020 05:48
@harrism harrism changed the title [DRAFT] New suballocator memory resource [REVIEW] New suballocator memory resource Feb 27, 2020
@jrhemstad (Contributor)

> Multi-device? May not be needed since the device can be set before creating a resource. However, we should probably check the current device when allocating from CUDA to ensure it's the same as the device the MR was created on...

Regarding multi-device, I think we can do something like what Thrust does with maintaining a different resource per device. See:

https://github.com/thrust/thrust/blob/63d847beab9931978d9c894afd7432a6f94abd46/thrust/system/cuda/detail/per_device_resource.h#L53
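For a rough idea of that pattern, a minimal sketch of keeping one resource per device follows. This is an assumption-laden illustration, not Thrust's or RMM's actual machinery; it assumes Resource is default-constructible.

#include <map>
#include <mutex>
#include <cuda_runtime_api.h>

// Sketch: look up (and lazily create) one resource instance per CUDA device.
template <typename Resource>
Resource* get_per_device_resource() {
  static std::mutex mtx;
  static std::map<int, Resource> resources;  // one instance per device id
  int device{0};
  cudaGetDevice(&device);
  std::lock_guard<std::mutex> lock{mtx};
  return &resources[device];  // default-constructed on first use per device
}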

@harrism harrism changed the title [REVIEW] New suballocator memory resource [REVIEW] New suballocator memory resources Mar 4, 2020
}

template <>
MRTest<hybrid_mr>::~MRTest()
Contributor

There may be some missing dtor specializations.

Consider instead using a vector<unique_ptr<device_memory_resource>> in the test fixture and just push upstream resources onto the vector. That way the upstreams will be automatically destroyed when the fixture is destroyed.
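Something like this sketch of the suggested fixture (the struct name is illustrative):

#include <memory>
#include <vector>

struct MRTest_sketch {
  // Owning the upstream resources here means they are destroyed with the
  // fixture, so no per-specialization destructors are needed.
  // In C++11, push with upstreams.emplace_back(new SomeResource{...})
  // since std::make_unique is C++14-only.
  std::vector<std::unique_ptr<rmm::mr::device_memory_resource>> upstreams;
};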

Member Author

This is looking difficult without bumping RMM to C++14...

* @param threshold_size Allocations > this size (in bytes) use large_mr. Allocations <= this size
* use small_mr. Must be a power of two.
*/
explicit hybrid_memory_resource(
Contributor

Suggested change
explicit hybrid_memory_resource(
hybrid_memory_resource(

Member Author

Done.

*/
inline block merge_blocks(block const& a, block const& b)
{
RMM_EXPECTS(a.ptr + a.size == b.ptr, "Merging noncontiguous blocks");
Contributor

This function probably shouldn't throw. Use asserts instead.
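For example, with the precondition asserted rather than thrown (the block type is the PR's free-list block; aggregate initialization from {ptr, size} is assumed here):

#include <cassert>

inline block merge_blocks(block const& a, block const& b)
{
  // Precondition, checked only in debug builds: b must start where a ends.
  assert(a.ptr + a.size == b.ptr && "Merging noncontiguous blocks");
  return block{a.ptr, a.size + b.size};
}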

Member Author

Done.

*/
template< class InputIt >
void insert( InputIt first, InputIt last ) {
for (auto iter = first; iter != last; ++iter) {
Contributor

std::for_each.
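e.g. (the loop body is elided in the snippet above, so the per-element call to a single-block insert overload below is an assumption):

#include <algorithm>

template <class InputIt>
void insert(InputIt first, InputIt last)
{
  // Insert each block in [first, last) via the single-element overload.
  std::for_each(first, last, [this](block const& blk) { this->insert(blk); });
}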

Member Author

Done.

* @tparam UpstreamResource memory_resource to use for allocating the pool. Implements
* rmm::mr::device_memory_resource interface.
*/
template <typename UpstreamResource>
Contributor

Just for consistency.

Suggested change
template <typename UpstreamResource>
template <typename Upstream>

Member Author

Done in all 3.

cudaGetDevice(&device);
int memsize{0};
cudaDeviceProp props;
cudaGetDeviceProperties(&props, device);
Contributor

Suggested change
cudaGetDeviceProperties(&props, device);
RMM_CUDA_TRY(cudaGetDeviceProperties(&props, device));

std::set<block> allocated_blocks_;

// blocks allocated from upstream heap: so they can be easily freed
std::map<char*, block> upstream_blocks_;
Contributor

This could just be a vector?

Contributor

Also, the upstream blocks could just be device_buffers. That would simplify their lifetime management.

Member Author

Vector: done
device_buffer: no -- decided against that as discussed, due to all the extra data it stores.
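i.e. the tracking member becomes something like:

// Upstream allocations only need to be walked when releasing everything,
// so ordered lookup by pointer (the old std::map key) isn't required.
std::vector<block> upstream_blocks_;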

// Try to find a block in the same stream
auto iter = stream_blocks_.find(stream);
if (iter != stream_blocks_.end())
b = block_from_sync_list(iter->second, size, stream, stream);
Contributor

Just returning instead of assigning to b would simplify logic quite a bit in this function.

Suggested change
b = block_from_sync_list(iter->second, size, stream, stream);
return block_from_sync_list(iter->second, size, stream, stream);
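Sketched against the context above, the early-return shape looks like this (the enclosing function name and the fall-through paths are assumed/elided):

block available_block(std::size_t size, cudaStream_t stream)
{
  // Try to find a block in the same stream first.
  auto iter = stream_blocks_.find(stream);
  if (iter != stream_blocks_.end())
    return block_from_sync_list(iter->second, size, stream, stream);

  // ... otherwise fall through to other streams / upstream allocation ...
  return block{};
}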

Member Author

Done. Much clearer, and faster!

* @brief Free all memory allocated from the upstream memory_resource.
*
*/
void free_all()
Contributor

It'd be more consistent with the std::pmr resources to call this release, see https://en.cppreference.com/w/cpp/memory/synchronized_pool_resource/release

Member Author

Done.

* @return block A pointer to memory of size `get_block_size()`.
*/
void* get_block(cudaStream_t stream) {
void* p = nullptr;
Contributor

This is another place where just returning instead of assigning to p would simplify logic a bit.

Member Author

Done.

@harrism (Member Author)

harrism commented Mar 6, 2020

I pushed changes addressing the code review in this PR to PR #314, which includes all code from this PR plus some new changes, so I'm closing this one. Please review #314 instead.

@harrism harrism closed this Mar 6, 2020