
Run all stdgpu operations on a specified cuda stream #423

Closed
tanzby opened this issue Jun 21, 2024 · 4 comments
Comments

@tanzby
Contributor

tanzby commented Jun 21, 2024

I noticed that some functions like stdgpu::detail::memcpy are non-async and run on the DEFAULT CUDA stream. More details: stdgpu::detail::memcpy depends on dispatch_memcpy, which looks like this:

void dispatch_memcpy(void* destination,
                     const void* source,
                     index64_t bytes,
                     dynamic_memory_type destination_type,
                     dynamic_memory_type source_type)
{
    ...

    // uses the default stream here
    STDGPU_CUDA_SAFE_CALL(cudaMemcpy(destination, source, static_cast<std::size_t>(bytes), kind));
}

For example, if we use a CUDA graph and try to capture all operations on a stream, an error is raised because different streams (the default one and the user's) get mixed:

stdgpu : CUDA ERROR :
  Error     : operation would make the legacy stream depend on a capturing blocking stream
  File      : external/stdgpu/src/stdgpu/cuda/impl/memory.cpp:123
  Function  : void stdgpu::cuda::dispatch_memcpy(void *, const void *, stdgpu::index64_t, stdgpu::dynamic_memory_type, stdgpu::dynamic_memory_type)

So my request: run all stdgpu operations on a specified CUDA stream.

@stotko
Owner

stotko commented Jun 21, 2024

Most of the functionality should support custom CUDA streams by taking a respective execution_policy which wraps the stream, see #351. Part of the memory API is one notable exception, but the memcpy-like functions are not actually used in the containers. Could you provide some pointers to a particular function in stdgpu that triggers this error when called? Does it already happen when you only create a new container, e.g. auto c = stdgpu::vector<int>::createDeviceObject(1000);?
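
For illustration, roughly like this (a sketch only; it assumes the execution_policy overloads forward a Thrust CUDA policy such as thrust::cuda::par.on(stream), and the exact overload set may differ):

#include <stdgpu/unordered_set.cuh>
#include <thrust/system/cuda/execution_policy.h>

void build_on_stream(cudaStream_t stream)
{
    // Wrap the user stream in an execution policy and pass it to the
    // overloads that accept one, so the work is submitted to `stream`.
    auto policy = thrust::cuda::par.on(stream);

    auto set = stdgpu::unordered_set<int>::createDeviceObject(policy, 1000);
    auto range = set.device_range(policy);
    (void)range;

    stdgpu::unordered_set<int>::destroyDeviceObject(policy, set);
}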

@tanzby
Contributor Author

tanzby commented Jun 23, 2024

@stotko Such as stdgpu::unordered_map<>::device_range. It is non-async:

Function  : void stdgpu::cuda::dispatch_memcpy(void *, const void *, stdgpu::index64_t, stdgpu::dynamic_memory_type, stdgpu::dynamic_memory_type)
    @          0x52ff5ac stdgpu::cuda::safe_call()
    @          0x52ff476 stdgpu::cuda::dispatch_memcpy()
    @          0x52fd629 stdgpu::detail::dispatch_memcpy()
    @          0x52fd7c2 stdgpu::detail::memcpy()
    @          0x52e7ebf copyHost2DeviceArray<>()
    @          0x52e7e83 stdgpu::atomic_ref<>::store()
    @          0x52e56d9 stdgpu::atomic<>::store()
    @          0x52edf64 stdgpu::detail::unordered_base<>::device_range<>()
    @          0x52edf25 stdgpu::detail::unordered_base<>::device_range()
    @          0x52ed9be stdgpu::unordered_set<>::device_range()

The whole pipeline looks like:

insert_kernel<<<>>>(xxx);                               // on stream
auto block_range = block_indices().device_range();      // a synchronous, blocking operation
update_block_meta_kernel<<<>>>(xxx);                    // on stream

But I have to admit that this is difficult to express purely as stream-ordered operations, and I do not know whether it can be achieved.

@stotko
Owner

stotko commented Jun 25, 2024

Thanks. Even though the device_range() method comes with an overload that accepts an execution_policy, it internally needs an atomic whose load() and store() functions use non-async memcpy. We probably need respective overloads for these as well to get full stream support here.
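
Conceptually, the missing piece would be something along these lines (hypothetical names, not the actual stdgpu implementation): the copies behind load()/store() get issued on the caller's stream instead of the legacy default stream.

#include <cuda_runtime.h>

// Hypothetical helpers only, for illustration of the idea.
template <typename T>
T load_on_stream(const T* device_value, cudaStream_t stream)
{
    T host_value{};
    cudaMemcpyAsync(&host_value, device_value, sizeof(T), cudaMemcpyDeviceToHost, stream);
    cudaStreamSynchronize(stream); // the host needs the value, so wait on this stream only
    return host_value;
}

template <typename T>
void store_on_stream(T* device_value, const T& host_value, cudaStream_t stream)
{
    cudaMemcpyAsync(device_value, &host_value, sizeof(T), cudaMemcpyHostToDevice, stream);
    cudaStreamSynchronize(stream); // host_value may leave scope, so wait before returning
}

This avoids the implicit legacy-stream dependency, although a host-side readback like load() still cannot be captured into a graph.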

@stotko
Owner

stotko commented Nov 20, 2024

Sorry for the long delay. It took a larger refactoring to fill the gaps in the stream support, but with #450 this issue should be resolved.

@stotko stotko closed this as completed Nov 20, 2024