[FEA] Deprecate and remove rmmGetInfo #305

Closed · jrhemstad opened this issue Feb 24, 2020 · 10 comments
Labels: feature request, proposal

Comments

@jrhemstad (Contributor)

Is your feature request related to a problem? Please describe.

RMM currently provides the function rmmGetInfo to return the amounts of free and total device memory.

Since the refactor to use device_memory_resources (see #301), rmmGetInfo simply calls get_mem_info on the default resource:

rmm::mr::get_default_resource()->get_mem_info(stream);

The interface defined by std::pmr::memory_resource does not provide a way to query this information, so we were required to add a non-standard get_mem_info pure virtual function to device_memory_resource. This forces every resource implementation to provide an implementation of get_mem_info.

This is problematic because for some resource implementations it is difficult or impossible to provide this information, leaving their get_mem_info implementation no choice but to throw a "not supported" exception. For example, the Thrust pool is based on the std::pmr::memory_resource interface and provides no mechanism for implementing get_mem_info, so it can only throw.
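
A rough sketch of the shape of the problem (class and member names here are illustrative, not RMM's exact declarations): the pure virtual in the base class forces every derived resource to answer the free/total question, even when it has nothing meaningful to report.

```cpp
#include <cstddef>
#include <stdexcept>
#include <utility>

// Illustrative base class: the non-standard pure virtual must be overridden
// by every derived resource.
class device_memory_resource {
 public:
  virtual ~device_memory_resource() = default;
  // Returns {free, total} bytes; not part of std::pmr::memory_resource.
  virtual std::pair<std::size_t, std::size_t> get_mem_info() const = 0;
};

// A resource built on an interface with no notion of free/total memory
// (e.g. one wrapping std::pmr::memory_resource) can only throw.
class opaque_upstream_resource final : public device_memory_resource {
 public:
  std::pair<std::size_t, std::size_t> get_mem_info() const override {
    throw std::runtime_error{"get_mem_info is not supported by this resource"};
  }
};
```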

Describe the solution you'd like

Eliminate the rmmGetInfo API along with the get_mem_info pure virtual function in device_memory_resource.

Note that this does not mean get_mem_info needs to be eliminated altogether. Some device_memory_resources can implement get_mem_info, e.g., cuda_memory_resource or cnmem_memory_resource. We can easily keep get_mem_info as a member of those derived classes.

What I am suggesting is removing it from the device_memory_resource base class and thus eliminating the need for all derived types to implement it. There is no reason we need to remove it from the derived device_memory_resource types that can support get_mem_info.

However, this does mean you can't call get_mem_info through the device_memory_resource base class pointer. You will first need to cast to the appropriate derived type (which requires knowing what kind of resource is pointed to through the device_memory_resource*) to access its specific member functions.
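
A hypothetical caller-side sketch of what that looks like; the header paths, the choice of cuda_memory_resource as the concrete type, and the exact get_mem_info signature (it took a stream argument) are all assumptions:

```cpp
#include <cstddef>
#include <utility>
// Header paths assumed; adjust to the RMM version in use.
#include <rmm/mr/device/cuda_memory_resource.hpp>
#include <rmm/mr/device/device_memory_resource.hpp>

// With get_mem_info gone from the base class, a caller that needs the info
// must downcast to a concrete resource type it expects.
std::pair<std::size_t, std::size_t> query_mem_info(rmm::mr::device_memory_resource* mr)
{
  if (auto* cuda_mr = dynamic_cast<rmm::mr::cuda_memory_resource*>(mr)) {
    return cuda_mr->get_mem_info(nullptr);  // signature assumed; nullptr = default stream
  }
  return {0, 0};  // this resource cannot (or does not) report free/total memory
}
```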

Describe alternatives you've considered

Alternatively, we can keep get_mem_info, and as we continue to add resources, most of them will just throw an exception. This doesn't seem very helpful.

@jrhemstad added the feature request and proposal labels on Feb 24, 2020
@jlowe (Contributor) commented Feb 27, 2020

From the Spark perspective the Java bindings do not expose this, so nothing will immediately break there. We were thinking of using this during asynchronous memory spill to get an idea of how much needs to spill and when to stop spilling. However, we can approximate that by wrapping the memory resource and tracking the allocation total ourselves. This approach won't account for any padding added by the wrapped allocator, but it should be close enough for a ballpark figure to use for async spill thresholds.

@jrhemstad (Contributor, Author) commented Feb 27, 2020

> However, we can approximate that by wrapping the memory resource and tracking the allocation total ourselves.

We can probably provide an opt-in resource adaptor that simply keeps a running sum of all allocate calls and subtracts for any deallocate calls.

It would be documented that this doesn't guarantee accurate information about the actual availability of memory in the upstream resource due to fragmentation, padding, etc.

This could also be useful for memory leak detection.
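
A freestanding sketch of such an adaptor (names are illustrative; a real RMM adaptor would derive from device_memory_resource and override its do_allocate/do_deallocate hooks instead of exposing allocate/deallocate directly):

```cpp
#include <atomic>
#include <cstddef>

// Wraps an upstream resource and keeps a running sum of outstanding bytes.
template <typename Upstream>
class tracking_adaptor {
 public:
  explicit tracking_adaptor(Upstream* upstream) : upstream_{upstream} {}

  void* allocate(std::size_t bytes) {
    void* p = upstream_->allocate(bytes);
    outstanding_ += bytes;
    return p;
  }

  void deallocate(void* p, std::size_t bytes) {
    upstream_->deallocate(p, bytes);
    outstanding_ -= bytes;
  }

  // Allocated-but-not-freed bytes. This is NOT the upstream's free memory:
  // padding, fragmentation, and other consumers of device memory are
  // invisible here. A non-zero value at shutdown also hints at a leak.
  std::size_t outstanding_bytes() const { return outstanding_.load(); }

 private:
  Upstream* upstream_;
  std::atomic<std::size_t> outstanding_{0};
};
```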

@gmarkall (Contributor)

This is OK from the perspective of adding support for external memory management plugins to Numba - it is not essential for Numba to know how much total and free memory is available, and the spec of NBEP 7 can be modified to make the get_memory_info method implementation optional.

@jakirkham (Member)

cc @pentschev (as RMM memory info is used in Dask-CUDA tests here and here)

@teju85 (Member) commented Feb 28, 2020

I totally get your reasoning behind removing rmmGetInfo().

However, I want to subjectively point out a downside: removing it makes it impossible for implementations that want to rely on such info to dynamically configure their behavior (more workspace = fewer iterations, and vice versa). Those will have to fall back to sub-optimal static configurations, which can certainly cost performance. That said, it is currently hard to estimate the performance hit, as it is very use-case dependent.

In such cases, we could even get away with an isAllocationPossible()-like API and then do some sort of binary search to converge on the right size.

@jrhemstad (Contributor, Author) commented Feb 28, 2020

> I totally get your reasoning behind removing rmmGetInfo().
>
> However, I want to subjectively point out a downside: removing it makes it impossible for implementations that want to rely on such info to dynamically configure their behavior (more workspace = fewer iterations, and vice versa). Those will have to fall back to sub-optimal static configurations, which can certainly cost performance. That said, it is currently hard to estimate the performance hit, as it is very use-case dependent.

To be clear, certain device_memory_resources can still easily provide the same functionality as rmmGetInfo. However, it is not reasonable to expect all possible implementations to support it. So as a user, you are free to choose a device_memory_resource that supports get_mem_info and use it accordingly.

What's being discussed here is dropping the requirement/expectation that all resources support this functionality.

Furthermore, to achieve what you described, you can easily use a rudimentary allocation tracker adaptor like I described here: #305 (comment)

> In such cases, we could even get away with an isAllocationPossible()-like API and then do some sort of binary search to converge on the right size.

If an allocation size isn't feasible, then an rmm::bad_alloc exception will be thrown. You could still easily do the binary search by catching the exception and trying a smaller allocation size until you find one that succeeds.
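
A minimal sketch of that search; the resource type is templated so nothing RMM-specific is pinned down, and the include path for rmm::bad_alloc is an assumption:

```cpp
#include <cstddef>
#include <rmm/detail/error.hpp>  // path assumed; provides rmm::bad_alloc

// Finds the largest size in [0, upper_bound] that the resource can allocate,
// by probing with real allocations and catching rmm::bad_alloc on failure.
template <typename Resource>
std::size_t largest_allocatable(Resource& mr, std::size_t upper_bound)
{
  std::size_t lo = 0;            // largest size known to succeed so far
  std::size_t hi = upper_bound;  // optimistic upper limit
  while (lo < hi) {
    std::size_t mid = lo + (hi - lo + 1) / 2;  // round up so the loop terminates
    try {
      void* p = mr.allocate(mid);  // throws rmm::bad_alloc if mid bytes are infeasible
      mr.deallocate(p, mid);
      lo = mid;                    // mid worked; search larger sizes
    } catch (rmm::bad_alloc const&) {
      hi = mid - 1;                // mid failed; search smaller sizes
    }
  }
  return lo;
}
```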

@pentschev (Member)

We only use it in dask-cuda for a test, confirming that we set up an RMM pool and checking that its size matches. Would that still be possible once this is removed? It doesn't matter if the code to do that becomes more complex in dask-cuda, as it's only a test anyway.

@jrhemstad (Contributor, Author)

> We only use it in dask-cuda for a test, confirming that we set up an RMM pool and checking that its size matches. Would that still be possible once this is removed?

For the CNMEM pool, yes. But there are other resource implementations where it wouldn't be possible.

@pentschev (Member)

> For the CNMEM pool, yes. But there are other resource implementations where it wouldn't be possible.

Alright, that seems good enough, as it's only a check anyway. Thanks for clarifying; from the dask-cuda side we are good with the deprecation/removal.

@jrhemstad (Contributor, Author)

Closed by #396
