[FEA] Deprecate and remove rmmGetInfo #305
Comments
From the Spark perspective the Java bindings do not expose this, so nothing will immediately break there. We were thinking of using this during asynchronous memory spill to get an idea of how much needs to spill and when to stop spilling. However, we can approximate that by wrapping the memory resource and tracking the allocation total ourselves. This approach won't account for any padding added by the wrapped allocator, but it should be close enough for a ballpark figure to use for async spill thresholds.
We can probably provide an opt-in resource adaptor that simply keeps a running sum of all allocations. It would be documented that this doesn't guarantee accurate information about the actual availability of memory in the upstream resource due to fragmentation, padding, etc. This could also be useful for memory leak detection.
This is OK from the perspective of adding support for external memory management plugins to Numba. It is not essential for Numba to know how much total and free memory is available, and the spec of NBEP 7 can be modified accordingly.
cc @pentschev (as RMM memory info is used in Dask-CUDA tests here and here)
I totally get your reasoning behind removing it. However, I want to subjectively point out the downside: removing it can make some implementations impossible that want to rely on such info to dynamically configure their behavior (more workspace = fewer iterations, and vice versa). So, those will have to go with a sub-optimal static configuration, which can certainly have less than optimal performance. That said, it is currently hard to give an estimate of the perf hit due to this, as it is very use-case dependent. In such cases, we can even get away with an approximation.
To be clear, certain resource implementations can still provide this functionality. What's being discussed here is dropping the requirement/expectation that all resources support it. Furthermore, to achieve what you described, you can easily use a rudimentary allocation tracker adaptor like I described here: #305 (comment)
If an allocation size isn't feasible, then an out-of-memory exception will be thrown.
We only use it in dask-cuda for a test, confirming that we set up an RMM pool and checking that its size matches. Would that still be possible once this is removed? It doesn't matter if the code to do that becomes more complex in dask-cuda, as it's only a test anyway.
For the CNMEM pool, yes. But there are other resource implementations where it wouldn't be possible. |
Alright, that seems good enough, as it's only a check anyway. Thanks for clarifying; from the dask-cuda side we are good with the deprecation/removal.
Closed by #396 |
Is your feature request related to a problem? Please describe.
RMM currently provides the function `rmmGetInfo` to return the free and total amounts of memory. Since the refactor to use `device_memory_resource`s (see #301), `rmmGetInfo` simply calls `get_mem_info` on the default resource: `rmm::mr::get_default_resource()->get_mem_info(stream);`

The interface defined by `std::pmr::memory_resource` does not provide a way to return this information, so we were required to add a non-standard `get_mem_info` pure virtual function to `device_memory_resource`. This forces all resource implementations to provide an implementation of `get_mem_info`.

This is problematic because for some resource implementations it is difficult or impossible to provide this information, requiring the implementation of `get_mem_info` to simply throw a "not supported" exception. For example, the Thrust pool is based on the `std::pmr::memory_resource` interface, does not provide a mechanism to implement `get_mem_info`, and thus can only throw.

Describe the solution you'd like
Eliminate the `rmmGetInfo` API along with the `get_mem_info` pure virtual function in `device_memory_resource`.

Note that this does not mean `get_mem_info` needs to be eliminated altogether. Some `device_memory_resource`s can implement `get_mem_info`, e.g., `cuda_memory_resource` or `cnmem_memory_resource`. We can easily keep `get_mem_info` as a member of those derived classes. What I am suggesting is removing it from the `device_memory_resource` base class and thus eliminating the need for all derived types to implement it. There is no reason we need to remove it from the derived `device_memory_resource` types that can support `get_mem_info`.

However, this does mean you can't call `get_mem_info` through a `device_memory_resource` base class pointer. You will first need to cast to the appropriate derived type (which requires knowing what kind of resource is pointed to by the `device_memory_resource*`) to access its specific member functions.

Describe alternatives you've considered
Alternatively, we can keep `get_mem_info` and, as we continue to add resources, most of them will just throw an exception. This doesn't seem very helpful.