
Reusable Numba extension for CUDA target? #359

Closed
shwina opened this issue Jul 27, 2020 · 3 comments
Labels
feature New feature or request

Comments

@shwina

shwina commented Jul 27, 2020

Greetings, awkward devs!

Over at cudf, we are introducing a ListDtype and an associated ListColumn that is similar to awkward's jagged array - just for use with DataFrames and related operations. We're also looking to introduce other "awkward" column types in the future, such as a StructColumn analogous to Arrow's StructArray.

Something we'd like to be able to do is leverage Numba/CUDA to run user-defined functions (UDFs) on ListColumns -- pretty much exactly what is discussed here.
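For concreteness, here is a pure-Python sketch of the kind of per-row UDF we have in mind. The `(offsets, values)` layout mirrors how a list column is typically stored, but `list_column_udf` and these names are illustrative only, not cuDF or Numba API:

```python
# Hedged sketch: applying a user-defined function row-by-row to a list
# column stored as a flat values buffer plus an offsets buffer.

def list_column_udf(offsets, values, row_fn):
    """Apply row_fn to each variable-length sublist of the column."""
    out = []
    for i in range(len(offsets) - 1):
        start, stop = offsets[i], offsets[i + 1]
        out.append(row_fn(values[start:stop]))
    return out

# The column [[1, 2, 3], [], [4, 5]] flattened into offsets + values:
offsets = [0, 3, 3, 5]
values = [1, 2, 3, 4, 5]
print(list_column_udf(offsets, values, sum))  # -> [6, 0, 9]
```

In the Numba/CUDA setting, `row_fn` would be the user's compiled kernel and the loop over rows would be the parallel launch; the point here is only the data layout the UDF machinery has to walk.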

It seems like some redundancy between cuDF and awkward could be avoided here, by building out the required Numba extensions in a way that's easily reusable by both libraries. More importantly, it would lead to a better experience for users, as the same UDFs would run identically on both awkward arrays and cuDF.

Opening this issue to hear your thoughts about this, and as a place to collect ideas on how this might be achieved. Thanks!

cc: @kkraus14 @gmarkall

@shwina added the feature (New feature or request) label on Jul 27, 2020
@jpivarski (Member)

That's awesome! Composable list and struct types within a DataFrame-like context would really help out physicists (among other data analysts, I'm sure), especially if cuDF can also run without GPUs. (The ability to access data types in this way is useful in itself, even if users don't have access to compatible GPUs on their Macs or in CERN's computing farm.)

This could also help the deprecation I'm considering in #350. Just getting nested data types into Pandas hasn't been useful in itself, since the Pandas API doesn't have operations that know how to make use of them. Presumably, if you're adding these types to the DataFrame in a non-opaque way, then you'll also be adding operations that use them—for example, performing Cartesian products of nested lists or turning struct fields into columns and vice-versa.

Will the buffer backing these new data types simply be an Arrow view? If so, then we can share more than Numba code; it would be possible to apply Awkward's array-at-a-time functions to columns of a cuDF in a similar way that NumPy's can be applied to a Pandas DataFrame. Awkward's own internal representation is more general than Arrow (e.g. ListArray vs ListOffsetArray, a useful distinction when manipulating list structures), and it is zero-copy convertible when equivalent constructs are available (ak.to_arrow and ak.from_arrow).
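To illustrate the ListArray vs ListOffsetArray distinction in simplified pure Python (these are not Awkward's actual classes): Arrow's list layout is the offsets form, while a starts/stops form can drop or reorder lists without touching the values buffer.

```python
values = [1, 2, 3, 4, 5]

# ListOffsetArray-style: one offsets buffer; lists are contiguous slices.
offsets = [0, 3, 3, 5]
offset_rows = [values[offsets[i]:offsets[i + 1]]
               for i in range(len(offsets) - 1)]

# ListArray-style: independent starts/stops buffers. Dropping the middle
# (empty) list is just a selection on starts/stops; the values buffer is
# untouched, which is what makes list manipulations cheap.
starts, stops = [0, 3], [3, 5]
list_rows = [values[a:b] for a, b in zip(starts, stops)]

print(offset_rows)  # -> [[1, 2, 3], [], [4, 5]]
print(list_rows)    # -> [[1, 2, 3], [4, 5]]
```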

As for the Numba code, here is where it is located: src/awkward1/_connect/_numba. The data model is based on Awkward's node types, not Arrow, and its primary focus is on lightweight iteration. For that reason, we don't have Numba models for each node type (e.g. ListArray, ListOffsetArray) because these things may be created and destroyed frequently during an iterative loop and Numba's models are by default pass-by-value (and it's not easy to make something pass-by-reference). Copying deep tree structures in every step of iteration would scale poorly.

Instead, our Numba model is an ArrayView that walks over a Lookup data structure. The Lookup is a set of pointers to all the buffers in the original Awkward Array and the ArrayView represents a slice at some level of depth. The way to properly walk over the Lookup is enforced at compile-time, since each node type generates the appropriate code for __getitem__, but that type gets erased at runtime: there's only one type of ArrayView model.
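A toy pure-Python analogue of that design may help: one flat Lookup of buffer references, and a lightweight view (a few integers) that walks it. The names mirror the description above, but none of this is the actual Awkward implementation; in the real thing, which accessor to call is generated per node type at compile time.

```python
class Lookup:
    """Flat table of references to all buffers in the original array."""
    def __init__(self, buffers):
        self.buffers = buffers  # e.g. [offsets, inner_values]

class ArrayView:
    """Cheap, pass-by-value view: just indices into the Lookup."""
    def __init__(self, lookup, buffer_index, start, stop):
        self.lookup = lookup
        self.buffer_index = buffer_index
        self.start, self.stop = start, stop

    def __len__(self):
        return self.stop - self.start

    def getitem_list(self, i):
        # For a list-type node: the offsets buffer selects a deeper slice.
        offsets = self.lookup.buffers[self.buffer_index]
        return ArrayView(self.lookup, self.buffer_index + 1,
                         offsets[self.start + i], offsets[self.start + i + 1])

    def getitem_number(self, i):
        # For a numeric leaf node: read straight from the values buffer.
        return self.lookup.buffers[self.buffer_index][self.start + i]

# [[1, 2, 3], [], [4, 5]] as a lookup of two buffers:
lookup = Lookup([[0, 3, 3, 5], [1, 2, 3, 4, 5]])
view = ArrayView(lookup, 0, 0, 3)           # view over the three lists
first = view.getitem_list(0)                # view of [1, 2, 3]; no copy
print(len(first), first.getitem_number(2))  # -> 3 3
```

Creating `first` copies only four small integers, which is why this scales where copying deep tree structures per iteration step would not.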

My goal for Awkward-Numba-CUDA would be to reuse most of the infrastructure for Awkward-Numba (because it works) and replace the first level of iteration over a large array with the ability of users to write kernels on a single element of that large array. Walking over lists and structs deeper than the first level would be the same, even though it encourages users to write imperative code that might not be optimal on GPUs (e.g. users might write code with a lot of if-branches, but that would be their mistake to make).

Separating the Awkward-Numba part into a library of its own (whether CUDA-enabled or not) would be a little tricky, given how the ArrayView model was custom-written for Awkward Array types and not Arrow Arrays.

  • One possibility would be to take the Awkward-Numba implementation as "inspiration" for cuDF.
  • Another would be to view cuDF's data as Awkward Arrays so that it can use the same Numba models (I think Arrow-to-Awkward is always zero-copy, but I'd have to check).
  • Yet another would be to refactor the ArrayViews out into a library that only walks over Arrow types, though I wouldn't prefer this option because some of the Awkward types that are not Arrow types, such as our ListArray, would have to be non-zero-copy converted before being passed as an argument to a Numba-compiled function, adding a performance penalty where there currently isn't one.

@kkraus14

> That's awesome! Composable list and struct types within a DataFrame-like context would really help out physicists (among other data analysts, I'm sure), especially if cuDF can also run without GPUs. (The ability to access data types in this way is useful in itself, even if users don't have access to compatible GPUs on their Macs or in CERN's computing farm.)

cuDF only runs on GPUs as of now, and there's no plan or roadmap for running on CPUs at this time, but what @shwina proposed here is to make the Numba pieces GPU/CPU-agnostic so everyone benefits. Ideally we could live in a world where Arrow, Awkward, cuDF, etc. can all reuse the same Numba extensions.

> Will the buffer backing these new data types simply be an Arrow view? If so, then we can share more than Numba code; it would be possible to apply Awkward's array-at-a-time functions to columns of a cuDF in a similar way that NumPy's can be applied to a Pandas DataFrame. Awkward's own internal representation is more general than Arrow (e.g. ListArray vs ListOffsetArray, a useful distinction when manipulating list structures), and it is zero-copy convertible when equivalent constructs are available (ak.to_arrow and ak.from_arrow).

The buffers backing cuDF columns are not Arrow views; they are our own rmm::device_buffer objects (http://github.com/rapidsai/rmm). From my perspective, though, we'd need to build some kind of abstract base class and/or data/API protocol that a library must implement or conform to in order to use the Numba extension, i.e. something similar to NumPy's __array_interface__ (https://numpy.org/doc/stable/reference/arrays.interface.html) or __array_function__ (https://numpy.org/neps/nep-0018-array-function-protocol.html). Then anyone who exposes the necessary interface/layout can take advantage of the extension.
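As a sketch of that protocol idea: the `__list_column_interface__` name below is hypothetical (modeled loosely on NumPy's `__array_interface__`), and the consumer stands in for a shared Numba extension that knows the protocol but not any particular library's container class.

```python
class MyListColumn:
    """A library-specific container that opts into the shared protocol."""
    def __init__(self, offsets, values):
        self._offsets, self._values = offsets, values

    @property
    def __list_column_interface__(self):
        # Hypothetical protocol: expose layout, not implementation.
        return {"version": 1,
                "offsets": self._offsets,
                "values": self._values}

def consume(column):
    """Stand-in for an extension that only sees the protocol dict."""
    iface = column.__list_column_interface__
    offsets, values = iface["offsets"], iface["values"]
    return [values[offsets[i]:offsets[i + 1]]
            for i in range(len(offsets) - 1)]

col = MyListColumn([0, 2, 5], [10, 11, 12, 13, 14])
print(consume(col))  # -> [[10, 11], [12, 13, 14]]
```

Any container exposing the same dict (a cuDF column over an rmm buffer, an Awkward layout, an Arrow array) could be consumed without the extension importing any of those libraries.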

> As for the Numba code, here is where it is located: src/awkward1/_connect/_numba. The data model is based on Awkward's node types, not Arrow, and its primary focus is on lightweight iteration. For that reason, we don't have Numba models for each node type (e.g. ListArray, ListOffsetArray) because these things may be created and destroyed frequently during an iterative loop and Numba's models are by default pass-by-value (and it's not easy to make something pass-by-reference). Copying deep tree structures in every step of iteration would scale poorly.
>
> Instead, our Numba model is an ArrayView that walks over a Lookup data structure. The Lookup is a set of pointers to all the buffers in the original Awkward Array and the ArrayView represents a slice at some level of depth. The way to properly walk over the Lookup is enforced at compile-time, since each node type generates the appropriate code for __getitem__, but that type gets erased at runtime: there's only one type of ArrayView model.
>
> My goal for Awkward-Numba-CUDA would be to reuse most of the infrastructure for Awkward-Numba (because it works) and replace the first level of iteration over a large array with the ability of users to write kernels on a single element of that large array. Walking over lists and structs deeper than the first level would be the same, even though it encourages users to write imperative code that might not be optimal on GPUs (e.g. users might write code with a lot of if-branches, but that would be their mistake to make).

The goal of reusing the Numba extension for GPU/CPU is shared among us. I think the new goal we're proposing here is figuring out how to reuse the Numba extension across different projects so we can all contribute to a single place and benefit from each other's work.

> Separating the Awkward-Numba part into a library of its own (whether CUDA-enabled or not) would be a little tricky, given how the ArrayView model was custom-written for Awkward Array types and not Arrow Arrays.

>   • One possibility would be to take the Awkward-Numba implementation as "inspiration" for cuDF.
>   • Another would be to view cuDF's data as Awkward Arrays so that it can use the same Numba models (I think Arrow-to-Awkward is always zero-copy, but I'd have to check).

Rolling our own implementation for cuDF is the fallback plan, but we have a vested interest in improving the GPU ecosystem 😄. Unfortunately, UDFs are pretty important for us to support in cuDF, and Awkward is a bit too heavy of a dependency for us to depend on for UDFs.

>   • Yet another would be to refactor the ArrayViews out into a library that only walks over Arrow types, though I wouldn't prefer this option because some of the Awkward types that are not Arrow types, such as our ListArray, would have to be non-zero-copy converted before being passed as an argument to a Numba-compiled function, adding a performance penalty where there currently isn't one.

I think this goes back to not locking ourselves into specific containers, for ease of adoption, and instead using protocols and/or abstract classes to handle the Numba extensions.

@jpivarski (Member)

As I understand it, cuDF is getting a Numba extension now, but I won't be ready to do this for Awkward Array for months. I'll use a lot of the extensions @gmarkall is adding to Numba-CUDA to implement the Awkward one. The overlap is smaller than we had thought because cuDF's internal data model is strictly Arrow and Awkward Array's is not; since Awkward's is a generalization, there are additional features I have to implement for the Awkward one.

On the other hand, I'm still very interested in interoperability projects in the future!
