[FEA] External libcudf APIs should expose CUDA streams #925

jrhemstad · 2019-02-12T19:23:11Z

Is your feature request related to a problem? Please describe.
In order to maximize efficiency and utilization of the GPU, you need to be able to execute kernels and memory copies on independent streams. However, no libcudf API currently exposes streams to the end user.

Describe the solution you'd like
Any API that executes a GPU kernel or allocates/copies memory should provide the end user with the option to specify a stream.

Open questions:

How should the stream be exposed? E.g., accept a cudaStream_t*? Or bundled inside of an options struct?
Who is responsible for creating/destroying the stream?
In a C++ API it's easy to provide a default argument for the stream, but in the C API the user will always be required to specify a stream. Is this what we want?
Should streams be exposed in all of the Python APIs?

I suspect we'll all agree this does need to be done, the question is rather one of how and when.

felipeblazing · 2019-02-13T00:59:04Z

I would argue that we should take it one step further and say that every exposed API should eventually allow the stream to be specified, starting with the highest-priority ones you mention: those that allocate memory, since allocation can block execution (super gross). What other options would you want to pass in besides the stream? An allocator?

Creating / Destroying
A pattern that libraries like Thrust follow is that the user creates the stream, and if the user doesn't supply one, the default stream is used. I like this pattern in general.
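A minimal Python sketch of that ownership pattern, with a toy `Stream` class standing in for a real CUDA stream and a hypothetical `copy_column` function (neither is a libcudf or Thrust API): the caller creates and owns the stream, and the library only falls back to a default stream when none is supplied.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Stream:
    """Toy stand-in for a CUDA stream: it only records the work enqueued on it."""
    handle: int
    ops: List[str] = field(default_factory=list)

    def enqueue(self, op: str) -> None:
        self.ops.append(op)


# Hypothetical library-owned default stream (handle 0, like CUDA's legacy default stream).
DEFAULT_STREAM = Stream(0)


def copy_column(name: str, stream: Optional[Stream] = None) -> None:
    """Hypothetical API: run on the caller's stream, or on the default stream if none is given."""
    if stream is None:
        stream = DEFAULT_STREAM
    stream.enqueue(f"copy {name}")


# The caller creates (and is responsible for destroying) its own stream.
user_stream = Stream(1)
copy_column("a", stream=user_stream)
copy_column("b")  # no stream supplied: falls back to DEFAULT_STREAM
```

The library never creates or destroys streams here, which answers the ownership question without forcing every caller to manage one.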

C vs C++ API
Who wants a C-only API? The user can always pass stream = 0, which is the default stream.

Exposing in Python
I would imagine so, right? Why wouldn't we?

If we compile with --default-stream per-thread, this is slightly less of a concern in the immediate future.
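A toy Python model of why a per-thread default stream eases the pressure (no CUDA involved; `get_default_stream` and the build flag are hypothetical names). In a per-thread build, a call that passes no stream lands on a stream belonging to the calling host thread, so work issued from different threads no longer funnels through one shared legacy stream 0.

```python
import itertools
import threading

# Hypothetical build flag, standing in for compiling with --default-stream per-thread.
PER_THREAD_DEFAULT_STREAM = True
LEGACY_DEFAULT_STREAM = "legacy stream 0"

_handles = itertools.count(1)
_handles_lock = threading.Lock()
_thread_state = threading.local()


def get_default_stream() -> str:
    """Return the stream used when the caller does not pass one (toy model)."""
    if not PER_THREAD_DEFAULT_STREAM:
        # Legacy behavior: every host thread shares a single default stream.
        return LEGACY_DEFAULT_STREAM
    if not hasattr(_thread_state, "stream"):
        # First call on this thread: give the thread its own stream.
        with _handles_lock:
            _thread_state.stream = f"per-thread stream {next(_handles)}"
    return _thread_state.stream


results = []


def worker() -> None:
    # Two calls on the same thread resolve to the same stream.
    results.append((get_default_stream(), get_default_stream()))


threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

This only models how "the default stream" is resolved; it says nothing about synchronization, which real CUDA streams still require.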

jrhemstad · 2019-02-13T01:30:44Z

> I would argue that we should take it one step further and say that every exposed API should eventually allow the stream to be specified, starting with the highest-priority ones you mention: those that allocate memory, since allocation can block execution (super gross).

Sure, what I really meant is that every API that makes sense to execute on a stream should accept a stream argument. Some functions obviously never need a stream, e.g., https://github.com/rapidsai/cudf/blob/branch-0.6/cpp/include/cudf/functions.h#L72

> What other options would you want to pass in besides the stream? An allocator?

Maybe a device ID? Device properties? I was thinking of the ModernGPU context_t when I suggested the options struct. https://github.com/moderngpu/moderngpu/blob/master/src/moderngpu/context.hxx#L47

jrhemstad · 2019-03-07T16:29:04Z

I like the approach that was taken in cuML here using a cuml::handle object that is passed into every API.

rapidsai/cuml#247

You can bundle lots of library-specific resources in there, such as:

stream
device ID
memory allocator (this would allow us to better abstract the memory allocator instead of using RMM_ALLOC directly).
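A rough Python sketch of the handle idea, assuming a frozen dataclass is an acceptable stand-in for a C++ handle object. `Handle`, `make_column`, and both allocator callables are hypothetical and do not mirror cuML's actual `cuml::handle` interface; the point is that stream, device ID, and allocator travel together in one argument, so the allocator can be swapped without the API calling `RMM_ALLOC` directly.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


def default_allocator(nbytes: int, stream: int) -> bytearray:
    """Hypothetical allocator; a real one would allocate device memory ordered on `stream`."""
    return bytearray(nbytes)


@dataclass(frozen=True)
class Handle:
    """Hypothetical bundle of per-call resources, in the spirit of cuML's handle."""
    stream: int = 0
    device_id: int = 0
    allocate: Callable[[int, int], bytearray] = default_allocator


def make_column(size: int, handle: Handle = Handle()) -> bytearray:
    """Hypothetical API: every resource it needs comes from `handle`, not from globals."""
    # Assumes 4-byte elements (e.g. int32).
    return handle.allocate(size * 4, handle.stream)


# Usage: swap in a tracking allocator and a non-default stream without touching make_column.
calls: List[Tuple[int, int]] = []


def tracking_allocator(nbytes: int, stream: int) -> bytearray:
    calls.append((nbytes, stream))
    return bytearray(nbytes)


col = make_column(8, Handle(stream=7, device_id=1, allocate=tracking_allocator))
```

One handle keeps signatures stable as new resources are added, at the cost of making it less obvious which resources a given function actually uses.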

mrocklin · 2019-07-24T18:06:06Z

Drawing inspiration from other Python libraries, it seems that some include a `stream=` keyword in most operations, such as in these numba examples.

Others use context managers, which are good at managing global state sensibly. Here is an example from CuPy:

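```python
import time
import cupy

rand = cupy.random.RandomState(seed=1)

n = 10
zs = []
map_streams = []
stop_events = []
reduce_stream = cupy.cuda.stream.Stream()
for i in range(n):
    map_streams.append(cupy.cuda.stream.Stream())

start_time = time.time()

# Map: each matmul is issued on its own stream
for stream in map_streams:
    with stream:
        x = rand.normal(size=(1, 1024 * 256))
        y = rand.normal(size=(1024 * 256, 1))
        z = cupy.matmul(x, y)
        zs.append(z)
    # Record an event marking the end of this stream's work
    stop_event = stream.record()
    stop_events.append(stop_event)

# Reduce: reduce_stream waits (without blocking the host) for every map stream
for stop_event in stop_events:
    reduce_stream.wait_event(stop_event)
with reduce_stream:
    z = sum(zs)

cupy.cuda.Device().synchronize()
print('elapsed time', time.time() - start_time)
```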

Personally, I like the CuPy solution, simply because it doesn't require touching the entire API.

(I acknowledge that this response may be outside of the original scope of libcudf. My apologies if so).

EvenOldridge · 2020-09-16T17:26:01Z

Hey @jrhemstad checking in about the status of this. This is an issue for the PyTorch and TF dataloaders that we're building. I know there are potentially other workarounds being explored.

jrhemstad · 2020-09-16T18:18:13Z

> Hey @jrhemstad checking in about the status of this. This is an issue for the PyTorch and TF dataloaders that we're building. I know there are potentially other workarounds being explored.

It's still an active point of conversation, but we don't have any current plans to add streams to public APIs in the near future.

Contributes to #925.
Authors:
- Vyas Ramasubramani (https://github.com/vyasr)
Approvers:
- Bradley Dice (https://github.com/bdice)
- MithunR (https://github.com/mythrocks)
- David Wendt (https://github.com/davidwendt)
URL: #13629

Contributes to #925.
Authors:
- Vyas Ramasubramani (https://github.com/vyasr)
Approvers:
- Mark Harris (https://github.com/harrism)
- Divye Gala (https://github.com/divyegala)
URL: #13987

Contributes to #925.
Authors:
- Vyas Ramasubramani (https://github.com/vyasr)
Approvers:
- Nghia Truong (https://github.com/ttnghia)
- Bradley Dice (https://github.com/bdice)
URL: #13990

Contributes to #925.
Authors:
- Vyas Ramasubramani (https://github.com/vyasr)
Approvers:
- Bradley Dice (https://github.com/bdice)
- MithunR (https://github.com/mythrocks)
- David Wendt (https://github.com/davidwendt)
URL: #14010

Contributes to #925.
Authors:
- Vyas Ramasubramani (https://github.com/vyasr)
Approvers:
- David Wendt (https://github.com/davidwendt)
- Nghia Truong (https://github.com/ttnghia)
URL: #14034

Contributes to #925.
Authors:
- Vyas Ramasubramani (https://github.com/vyasr)
Approvers:
- Bradley Dice (https://github.com/bdice)
- David Wendt (https://github.com/davidwendt)
URL: #14146

This PR adds overloads of `from_arrow` and `to_arrow` for scalars to enable interoperability on par with Arrow Arrays. The new public APIs accept streams, and for consistency streams have also been added to the corresponding column APIs, so this PR contributes to #925.
Authors:
- Vyas Ramasubramani (https://github.com/vyasr)
Approvers:
- David Wendt (https://github.com/davidwendt)
- Bradley Dice (https://github.com/bdice)
URL: #14121

Contributes to #925.
Authors:
- Vyas Ramasubramani (https://github.com/vyasr)
Approvers:
- Nghia Truong (https://github.com/ttnghia)
- Karthikeyan (https://github.com/karthikeyann)
URL: #14187

Contributes to #925.
Authors:
- Vyas Ramasubramani (https://github.com/vyasr)
Approvers:
- Nghia Truong (https://github.com/ttnghia)
- Bradley Dice (https://github.com/bdice)
- David Wendt (https://github.com/davidwendt)
URL: #14263

Contributes to #925.
Authors:
- Vyas Ramasubramani (https://github.com/vyasr)
Approvers:
- Mark Harris (https://github.com/harrism)
- Nghia Truong (https://github.com/ttnghia)
URL: #14342

GregoryKimball · 2023-12-13T17:16:35Z

Closing in favor of #13744

Contributes to #925. Introduces a CUDA stream parameter that downstream users can provide to `label_bins`.
Authors:
- Danial Javady (https://github.com/ZelboK)
- Bradley Dice (https://github.com/bdice)
- Nghia Truong (https://github.com/ttnghia)
- Vyas Ramasubramani (https://github.com/vyasr)
Approvers:
- Nghia Truong (https://github.com/ttnghia)
- Bradley Dice (https://github.com/bdice)
URL: #14401

jrhemstad added the labels feature request (New feature or request), proposal (Change current process or code), Needs Triage (Need team to review and classify), and libcudf (Affects libcudf (C++/CUDA) code.) Feb 12, 2019

kkraus14 removed the Needs Triage label (Need team to review and classify) Feb 13, 2019

jrhemstad mentioned this issue Apr 25, 2019

[FEA] Apply functions to columns independently in parallel. #1501

Closed

jrhemstad added the Spark label (Functionality that helps Spark RAPIDS) Oct 4, 2019

jrhemstad mentioned this issue Mar 14, 2020

[DOC] Discuss and document expected stream synchronization behavior of libcudf functions #4511

Closed

magnatelee mentioned this issue Jun 9, 2020

[REVIEW] Exposing stream arguments for joins #5428

Closed

jrhemstad mentioned this issue Nov 4, 2021

[FEA] libcudf stream parameters should default to cudaStreamPerThread when compiled for PTDS #9614

Closed

jrhemstad mentioned this issue May 16, 2022

[FEA] Replace defaulted stream value in libcudf APIs with cudf::cuda_default_stream #10864

Closed

bdice mentioned this issue Aug 22, 2022

[Experimental] Use nosync policy for Thrust calls. #11577

Closed

davidwendt mentioned this issue Oct 6, 2022

[FEA] Expose allocate_like and other copying utilities that take a rmm::stream_view #11653

Closed

vyasr added this to the Enable streams milestone Oct 17, 2022

vyasr mentioned this issue Jun 28, 2023

Expose streams in all public copying APIs #13629

Merged

vyasr mentioned this issue Jul 24, 2023

[FEA] Expose public stream-ordered C++ APIs #13744

Open

vyasr mentioned this issue Aug 28, 2023

Expose streams in public concatenate APIs #13987

Merged

vyasr mentioned this issue Aug 29, 2023

Expose streams in public filling APIs #13990

Merged

vyasr mentioned this issue Aug 30, 2023

Expose streams in public replace APIs #14010

Merged

vyasr mentioned this issue Sep 5, 2023

Expose streams in public search APIs #14034

Merged

This was referenced Sep 18, 2023

Enable direct ingestion and production of Arrow scalars #14121

Merged

Expose streams in all public sorting APIs #14146

Merged

vyasr mentioned this issue Sep 25, 2023

Expose streams in binaryop APIs #14187

Merged

vyasr mentioned this issue Oct 9, 2023

Expose streams in public null mask APIs #14263

Merged

vyasr mentioned this issue Oct 28, 2023

Expose streams in public unary APIs #14342

Merged

ZelboK mentioned this issue Nov 14, 2023

Expose streams in public filling APIs for label_bins #14401

Merged

GregoryKimball closed this as not planned (won't fix, can't repro, duplicate, stale) Dec 13, 2023
