Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] External libcudf APIs should expose CUDA streams #925

Closed
jrhemstad opened this issue Feb 12, 2019 · 7 comments
Closed

[FEA] External libcudf APIs should expose CUDA streams #925

jrhemstad opened this issue Feb 12, 2019 · 7 comments
Labels
feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. proposal Change current process or code Spark Functionality that helps Spark RAPIDS

Comments

@jrhemstad
Copy link
Contributor

jrhemstad commented Feb 12, 2019

Is your feature request related to a problem? Please describe.
In order to maximize efficiency and utilization of the GPU, you need to be able to execute kernels and memory copies on independent streams. However, no libcudf API currently exposes streams to the end user.

Describe the solution you'd like
Any API that executes a GPU kernel or allocates/copies memory should provide the end user with the option to specify a stream.

Open questions:

  • How should the stream be exposed? E.g., accept a cudaStream_t*? Or bundled inside of an options struct?
  • Who is responsible for creating/destroying the stream?
  • In a C++ API its easy to provide a default argument for the stream, but in the C API the user will always be required to specify a stream. Is this what we want?
  • Should streams be exposed in all of the Python APIs?

I suspect we'll all agree this does need to be done, the question is rather one of how and when.

@jrhemstad jrhemstad added feature request New feature or request proposal Change current process or code Needs Triage Need team to review and classify libcudf Affects libcudf (C++/CUDA) code. labels Feb 12, 2019
@felipeblazing
Copy link
Contributor

I woudl argue that we should take it one step further and say that every exposed api should provide the capacity at some point to allow stream to be defined with the most priority ones you mention which do allocation which can block execution , super gross. What other options would you want to pass it? An allocator? (besides stream)

Creating / Destroying
A pattern libraries like thrust follow is that the user creates the stream and if the user doesn't supply one it uses thee default stream. I like this pattern in general.

C vs C++ api
Who wants a c only api? The user can always send stream = 0 which is the default stream.

Exposing in python
I woudl imagine so right? Why wouldn't we?

If we are compiling with --default-stream per-thread then this is slightly less of a concern in the more immediate future.

@jrhemstad
Copy link
Contributor Author

jrhemstad commented Feb 13, 2019

I woudl argue that we should take it one step further and say that every exposed api should provide the capacity at some point to allow stream to be defined with the most priority ones you mention which do allocation which can block execution , super gross.

Sure, what I really meant is that every API that it makes sense to have execute on a stream should accept a stream argument. Some functions obviously never need a stream, e.g., https://github.com/rapidsai/cudf/blob/branch-0.6/cpp/include/cudf/functions.h#L72

What other options would you want to pass it? An allocator? (besides stream)

Maybe a device ID? Device properties? I was thinking of the ModernGPU context_t when I suggested the options struct. https://github.com/moderngpu/moderngpu/blob/master/src/moderngpu/context.hxx#L47

@kkraus14 kkraus14 removed the Needs Triage Need team to review and classify label Feb 13, 2019
@jrhemstad
Copy link
Contributor Author

I like the approach that was taken in cuML here using a cuml::handle object that is passed into every API.

rapidsai/cuml#247

You can bundle lots of library specific resources in there such as:

  • stream
  • device ID
  • memory allocator (this would allow us to better abstract the memory allocator instead of using RMM_ALLOC directly).

@mrocklin
Copy link
Collaborator

mrocklin commented Jul 24, 2019

Drawing inspiration from various other Python libraries it seems that some include a stream= keyword in most operations, such as in these numba examples

While othres use context managers, which are nice at managing global state sensibly. Here is an example from CuPy

n = 10
zs = []
map_streams = []
stop_events = []
reduce_stream = cupy.cuda.stream.Stream()
for i in range(n):
    map_streams.append(cupy.cuda.stream.Stream())

start_time = time.time()

# Map
for stream in map_streams:
    with stream:
        x = rand.normal(size=(1, 1024 * 256))
        y = rand.normal(size=(1024 * 256, 1))
        z = cupy.matmul(x, y)
        zs.append(z)
    stop_event = stream.record()
    stop_events.append(stop_event)

Personally I like the cupy solution, just because it doesn't require touching all of the API

(I acknowledge that this response may be outside of the original scope of libcudf. My apologies if so).

@EvenOldridge
Copy link

Hey @jrhemstad checking in about the status of this. This is an issue for the PyTorch and TF dataloaders that we're building. I know there are potentially other workarounds being explored.

@jrhemstad
Copy link
Contributor Author

jrhemstad commented Sep 16, 2020

Hey @jrhemstad checking in about the status of this. This is an issue for the PyTorch and TF dataloaders that we're building. I know there are potentially other workarounds being explored.

It's still an active point of conversation, but we don't have any current plans to add streams to public APIs in the near future.

@vyasr vyasr added this to the Enable streams milestone Oct 17, 2022
rapids-bot bot pushed a commit that referenced this issue Jul 13, 2023
Contributes to #925.

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - MithunR (https://github.com/mythrocks)
  - David Wendt (https://github.com/davidwendt)

URL: #13629
rapids-bot bot pushed a commit that referenced this issue Aug 29, 2023
Contributes to #925

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Mark Harris (https://github.com/harrism)
  - Divye Gala (https://github.com/divyegala)

URL: #13987
rapids-bot bot pushed a commit that referenced this issue Aug 29, 2023
Contributes to #925

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Nghia Truong (https://github.com/ttnghia)
  - Bradley Dice (https://github.com/bdice)

URL: #13990
rapids-bot bot pushed a commit that referenced this issue Sep 1, 2023
Contributes to #925

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - MithunR (https://github.com/mythrocks)
  - David Wendt (https://github.com/davidwendt)

URL: #14010
rapids-bot bot pushed a commit that referenced this issue Sep 6, 2023
Contributes to #925

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - David Wendt (https://github.com/davidwendt)
  - Nghia Truong (https://github.com/ttnghia)

URL: #14034
rapids-bot bot pushed a commit that referenced this issue Sep 22, 2023
Contributes to #925

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - David Wendt (https://github.com/davidwendt)

URL: #14146
rapids-bot bot pushed a commit that referenced this issue Sep 23, 2023
This PR adds overloads of `from_arrow` and `to_arrow` for scalars to enable interoperability on par with Arrow Arrays. The new public APIs accept streams, and for consistency streams have also been added to the corresponding column APIs, so this PR contributes to #925.

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - David Wendt (https://github.com/davidwendt)
  - Bradley Dice (https://github.com/bdice)

URL: #14121
rapids-bot bot pushed a commit that referenced this issue Oct 4, 2023
Contributes to #925

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Nghia Truong (https://github.com/ttnghia)
  - Karthikeyan (https://github.com/karthikeyann)

URL: #14187
rapids-bot bot pushed a commit that referenced this issue Oct 20, 2023
Contributes to #925

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Nghia Truong (https://github.com/ttnghia)
  - Bradley Dice (https://github.com/bdice)
  - David Wendt (https://github.com/davidwendt)

URL: #14263
rapids-bot bot pushed a commit that referenced this issue Nov 6, 2023
Contributes to #925

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Mark Harris (https://github.com/harrism)
  - Nghia Truong (https://github.com/ttnghia)

URL: #14342
@GregoryKimball
Copy link
Contributor

Closing in favor of #13744

@GregoryKimball GregoryKimball closed this as not planned Won't fix, can't repro, duplicate, stale Dec 13, 2023
rapids-bot bot pushed a commit that referenced this issue Jan 19, 2024
Contributes to #925. Introduces cuda_stream parameter for downstream users to provide for `labeling_bins`

Authors:
  - Danial Javady (https://github.com/ZelboK)
  - Bradley Dice (https://github.com/bdice)
  - Nghia Truong (https://github.com/ttnghia)
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Nghia Truong (https://github.com/ttnghia)
  - Bradley Dice (https://github.com/bdice)

URL: #14401
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. proposal Change current process or code Spark Functionality that helps Spark RAPIDS
Projects
None yet
Development

No branches or pull requests

7 participants