
[REVIEW] New suballocator memory resource classes #314

Merged: 133 commits into rapidsai:branch-0.13 on Mar 13, 2020

Conversation

@harrism (Member) commented on Feb 28, 2020

This PR introduces a few new suballocator memory_resources. Depends on #319 for thread_safe_resource_adaptor.

  • pool_memory_resource is a coalescing pool suballocator memory resource. Its behavior is effectively the same as cnmem_memory_resource, except that it is not currently thread safe and the code uses STL containers rather than custom linked lists.
  • fixed_size_memory_resource allocates fixed-size blocks from a pre-allocated list (which can grow as blocks get used). It provides constant-time allocation and deallocation for allocations that fit within its fixed block size.
  • fixed_multisize_memory_resource allocates blocks from a specified range of fixed power-of-two block sizes (e.g. the default is 256KB to 4MB).
  • hybrid_memory_resource returns arbitrary-size allocations, given two suballocators and a byte-size threshold that determines which one to use. Typically this means using a fixed_multisize_memory_resource below a threshold (say 4MiB) and a sub_memory_resource above that threshold (see the sketch after this list).
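As a rough illustration of the threshold dispatch idea, here is a minimal standalone sketch (not the actual RMM API; the memory_resource interface and the class/parameter names below are assumptions for illustration only):

#include <cstddef>

// Minimal stand-in for a device_memory_resource-like interface (an assumption for
// this sketch; the real RMM base class also takes streams and more).
struct memory_resource {
  virtual ~memory_resource() = default;
  virtual void* allocate(std::size_t bytes) = 0;
  virtual void deallocate(void* ptr, std::size_t bytes) = 0;
};

// Threshold-based hybrid dispatch: small requests go to the "small" suballocator
// (e.g. a fixed-multisize resource), large requests to the "large" one (e.g. a pool).
class hybrid_resource_sketch : public memory_resource {
 public:
  hybrid_resource_sketch(memory_resource* small_mr, memory_resource* large_mr,
                         std::size_t threshold_bytes)
    : small_mr_{small_mr}, large_mr_{large_mr}, threshold_{threshold_bytes} {}

  void* allocate(std::size_t bytes) override {
    return (bytes <= threshold_) ? small_mr_->allocate(bytes) : large_mr_->allocate(bytes);
  }

  void deallocate(void* ptr, std::size_t bytes) override {
    // The request size determines which upstream owns the allocation.
    if (bytes <= threshold_) { small_mr_->deallocate(ptr, bytes); }
    else                     { large_mr_->deallocate(ptr, bytes); }
  }

 private:
  memory_resource* small_mr_;
  memory_resource* large_mr_;
  std::size_t threshold_;  // e.g. 4 MiB
};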


Tests for all of the above are added, and the tests have been consolidated a bit so they run faster.

This PR also makes the RANDOM_ALLOCATIONS Google benchmark more flexible, so it can be run for a requested allocator with a given number of allocations and a maximum allocation size.

TODO:

  • More documentation
  • meminfo (or remove it: [FEA] Deprecate and remove rmmGetInfo #305). Update: worked around this by adding a new base class function supports_get_mem_info, which returns false for all the new resource types.
  • Improve the pool growth heuristic: a smarter pool growth strategy reduces fragmentation and improves performance in conditions that previously suffered from fragmentation.


This uses a geometric growth strategy: when the pool is exhausted, it grows the pool by half the difference between the maximum pool size and the current pool size. If half the remainder is not sufficient for the requested size, it allocates all of the remainder. If all of the remainder is still not enough for the requested size, it does not grow the pool and the requested allocation fails.

This can reduce fragmentation compared to the original strategy (which grew by just the requested allocation size), since growth chunks cannot be merged. For example, if the pool starts at half of available GPU memory and many small allocations then cause the pool to grow in small steps, the pool becomes very fragmented. Growing geometrically reduces fragmentation by growing by much more than is requested.
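A minimal sketch of this growth calculation (the function name and the use of std::optional are illustrative assumptions, not the actual pool_memory_resource code):

#include <cstddef>
#include <optional>

// Returns the number of bytes to grow the pool by, or std::nullopt if the pool
// cannot grow enough to satisfy the request (in which case the allocation fails).
std::optional<std::size_t> growth_size(std::size_t requested_bytes,
                                       std::size_t current_pool_size,
                                       std::size_t maximum_pool_size)
{
  std::size_t const remaining      = maximum_pool_size - current_pool_size;
  std::size_t const half_remaining = remaining / 2;

  if (requested_bytes <= half_remaining) return half_remaining;  // geometric step
  if (requested_bytes <= remaining)      return remaining;       // take all of the remainder
  return std::nullopt;                                           // cannot grow enough
}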

Future PRs:

  • Use events to support per-thread default stream
  • Multi-device? May not be needed since the device can be set before creating a resource. However, we should probably check the current device when allocating from CUDA to ensure it is the same as the device the MR was created on (sketched below).
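A hedged sketch of that device check using the CUDA runtime call cudaGetDevice (the function and parameter names here are illustrative assumptions; this is not RMM code):

#include <cuda_runtime_api.h>
#include <stdexcept>

// Verify at allocation time that the caller's current CUDA device matches the
// device the memory resource was created on (creation_device would be captured
// in the resource's constructor).
inline void verify_device(int creation_device)
{
  int current_device{};
  if (cudaGetDevice(&current_device) != cudaSuccess) {
    throw std::runtime_error("cudaGetDevice failed");
  }
  if (current_device != creation_device) {
    throw std::runtime_error("memory resource used on a different device than it was created on");
  }
}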

@harrism changed the title from "[REVIEW] Improve pool_memory_resource growth strategy" to "[REVIEW] New suballocator memory resource classes." on Mar 9, 2020
@harrism changed the title from "[REVIEW] New suballocator memory resource classes." to "[REVIEW] New suballocator memory resource classes" on Mar 9, 2020
@jrhemstad (Contributor) left a comment:

Non-blocking comment about const-ness, otherwise good to go.

@@ -201,8 +209,9 @@ class fixed_multisize_memory_resource : public device_memory_resource {

Upstream* upstream_mr_; // The upstream memory_resource from which to allocate blocks.

std::size_t min_size_; // size of smallest blocks allocated
std::size_t max_size_; // size of largest blocks allocated
std::size_t size_base_; // base of the allocation block sizes (power of 2)
Suggested change
std::size_t size_base_; // base of the allocation block sizes (power of 2)
std::size_t const size_base_; // base of the allocation block sizes (power of 2)

Since these are being used for on-the-fly exponentiation to compute sizes, there's a better chance the compiler will optimize that out if these are const (see the sketch after these suggestions).

std::size_t min_size_; // size of smallest blocks allocated
std::size_t max_size_; // size of largest blocks allocated
std::size_t size_base_; // base of the allocation block sizes (power of 2)
std::size_t min_size_exponent_; // exponent of the size of smallest blocks allocated
Suggested change
std::size_t min_size_exponent_; // exponent of the size of smallest blocks allocated
std::size_t const min_size_exponent_; // exponent of the size of smallest blocks allocated

std::size_t max_size_; // size of largest blocks allocated
std::size_t size_base_; // base of the allocation block sizes (power of 2)
std::size_t min_size_exponent_; // exponent of the size of smallest blocks allocated
std::size_t max_size_exponent_; // exponent of the size of largest blocks allocated
Suggested change
std::size_t max_size_exponent_; // exponent of the size of largest blocks allocated
std::size_t const max_size_exponent_; // exponent of the size of largest blocks allocated
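A small standalone sketch of the reasoning behind these suggestions (illustrative only; the class and member names below are not the actual fixed_multisize_memory_resource code): when the base and exponents are const, the block-size computation depends only on values fixed at construction, which gives the compiler more freedom to fold or hoist the exponentiation.

#include <cstddef>

class block_sizes_sketch {
 public:
  block_sizes_sketch(std::size_t base, std::size_t min_exp, std::size_t max_exp)
    : size_base_{base}, min_size_exponent_{min_exp}, max_size_exponent_{max_exp} {}

  // Smallest supported power-of-size_base_ block size that holds `bytes`,
  // clamped to the largest supported block size.
  std::size_t block_size_for(std::size_t bytes) const {
    std::size_t exponent = min_size_exponent_;
    std::size_t size     = pow_base(exponent);
    while (size < bytes && exponent < max_size_exponent_) {
      ++exponent;
      size *= size_base_;
    }
    return size;
  }

 private:
  // base^exp by repeated multiplication; with const members the optimizer can
  // treat size_base_ as a constant for the lifetime of the object.
  std::size_t pow_base(std::size_t exp) const {
    std::size_t result = 1;
    for (std::size_t i = 0; i < exp; ++i) { result *= size_base_; }
    return result;
  }

  std::size_t const size_base_;          // base of the allocation block sizes (power of 2)
  std::size_t const min_size_exponent_;  // exponent of the smallest block size
  std::size_t const max_size_exponent_;  // exponent of the largest block size
};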

@harrism requested a review from a team as a code owner on March 11, 2020 04:20
@harrism merged commit 0eb6b10 into rapidsai:branch-0.13 on Mar 13, 2020
Labels: 3 - Ready for review (Ready for review by team), 5 - Merge After Dependencies (Depends on another PR: do not merge out of order)