[Q & A] Intercepting cudaMallocAsync API may also be suitable to this approach? #4
Comments
@wangao1236 Thanks for the feedback and also for the question!

Regarding `cudaMallocAsync`: I read the CUDA docs on the "Stream Ordered Memory Allocator", which comprises a family of functions. My understanding is that yes, we can interpose `cuMemAllocAsync` (I'll use the Driver API function name here). This is because the core return entity is a `CUdeviceptr`.

A drawback is that by doing this we nullify the "benefits" of asynchronicity. My best guess is that interposing and converting […]

I don't have enough free time to implement and test this right now. Do you want to give it a try? We can then discuss your findings, and if all goes well you can open a PR with your contribution!
Thank you very much for your response! Our team has also looked into this. As you mentioned, the only solution in this situation is to forcibly convert the asynchronous operation into a synchronous one so that a valid `CUdeviceptr` is written back through the `CUdeviceptr *` output parameter. The asynchronous behavior is sacrificed, but from the perspective of memory oversubscription the approach is still valuable!
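To make the async-to-sync conversion discussed above concrete, here is a minimal sketch in C. It is not this project's code and has not been tested against a real GPU. The names `shim_cuMemAllocAsync`, `shim_cuMemFreeAsync`, `USE_REAL_CUDA` and `stub_sync_calls` are hypothetical; a real interposer would export the shims under the real Driver API symbol names, and how those symbols get resolved at load time is outside this sketch. Without `-DUSE_REAL_CUDA`, the Driver API types and calls are replaced by host-side stand-ins so the logic can be compiled and exercised without the CUDA toolkit.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#ifdef USE_REAL_CUDA
#include <cuda.h>
#else
/* ASSUMPTION: stand-ins that mimic the Driver API shapes on the host,
 * only so this sketch compiles without the CUDA toolkit. */
typedef int CUresult;
typedef unsigned long long CUdeviceptr;
typedef struct CUstream_st *CUstream;
#define CUDA_SUCCESS 0
#define CUDA_ERROR_INVALID_VALUE 1
#define CUDA_ERROR_OUT_OF_MEMORY 2
#define CU_MEM_ATTACH_GLOBAL 0x1

static int stub_sync_calls = 0; /* counts stream synchronizations */

static CUresult cuMemAllocManaged(CUdeviceptr *dptr, size_t bytesize,
                                  unsigned int flags) {
    (void)flags;
    void *p = malloc(bytesize); /* host memory stands in for managed memory */
    if (p == NULL) return CUDA_ERROR_OUT_OF_MEMORY;
    *dptr = (CUdeviceptr)(uintptr_t)p;
    return CUDA_SUCCESS;
}

static CUresult cuStreamSynchronize(CUstream hStream) {
    (void)hStream;
    stub_sync_calls++;
    return CUDA_SUCCESS;
}

static CUresult cuMemFree(CUdeviceptr dptr) {
    free((void *)(uintptr_t)dptr);
    return CUDA_SUCCESS;
}
#endif

/* Replacement for cuMemAllocAsync: the stream argument is ignored and the
 * request is served synchronously from managed memory, so the pointer is
 * valid as soon as the call returns. Stream ordering is lost. */
CUresult shim_cuMemAllocAsync(CUdeviceptr *dptr, size_t bytesize,
                              CUstream hStream) {
    (void)hStream;
    if (dptr == NULL) return CUDA_ERROR_INVALID_VALUE;
    return cuMemAllocManaged(dptr, bytesize, CU_MEM_ATTACH_GLOBAL);
}

/* Replacement for cuMemFreeAsync: wait for work already queued on the
 * stream, which may still use the buffer, then free synchronously. */
CUresult shim_cuMemFreeAsync(CUdeviceptr dptr, CUstream hStream) {
    CUresult rc = cuStreamSynchronize(hStream);
    if (rc != CUDA_SUCCESS) return rc;
    return cuMemFree(dptr);
}
```

The allocation path does not need to wait on the stream, because a pointer that is valid immediately is also valid at any later point in stream order. The free path does wait, because the original call only promised to release the memory after earlier work on the stream had finished. The cost, as noted in the thread, is that every stream-ordered free becomes a synchronization point.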
Thanks for the invitation! We are also willing to contribute to this project to make it compatible with more GPU virtualization scenarios.
Hello, I have read your thesis and code and I think your idea is great! However, I have a question. The Stream-Ordered Memory Allocator, introduced in CUDA 11.2, provides the `cudaMallocAsync` and `cudaFreeAsync` APIs. If an application calls `cudaMallocAsync` and that call is also intercepted and replaced with `cudaMallocManaged`, what impact does this have on the calculation results?