[FEA] Faster dataframe to cupy conversion when dataframe is a single allocation #12928
Labels
0 - Backlog: In queue waiting for assignment
feature request: New feature or request
Python: Affects Python cuDF API.
When we convert a dataframe to a cupy array, we iterate over each column (as they're independent allocations) and assign each one to a column in an empty matrix. This can be slow when there are thousands or millions of small columns.
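A minimal sketch of the current per-column pattern, using NumPy on the host as a stand-in for cupy (the column names and sizes here are illustrative, not from cuDF internals):

```python
import numpy as np

# Hypothetical stand-in: each "column" is an independent 1-D allocation,
# mirroring how cuDF columns are typically separate device buffers.
nrows, ncols = 3, 4
columns = [np.full(nrows, i, dtype="float64") for i in range(ncols)]

# Conversion as described above: allocate an empty matrix, then copy
# column by column -- one assignment (and one kernel launch, on GPU)
# per column.
out = np.empty((nrows, ncols), dtype="float64")
for i, col in enumerate(columns):
    out[:, i] = col
```

With millions of small columns, the Python-level loop and the per-column copies dominate the total conversion time.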
In a select set of circumstances, all of the columns in a DataFrame may be part of a single, contiguous allocation of memory. One scenario in which this can occur is after a call to transpose. It would be nice if, in this scenario, we didn't need to iterate over every column when converting to a cupy array.
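When the columns do share one contiguous buffer, the conversion could in principle be a single zero-copy reinterpretation rather than a loop. A hedged NumPy sketch of that fast path (buffer layout assumed column-major, i.e. columns stored back to back, which is what a row-to-column transpose would produce; this is an illustration, not the actual cuDF code):

```python
import numpy as np

# Hypothetical single allocation holding all columns back to back.
nrows, ncols = 3, 4
buf = np.arange(nrows * ncols, dtype="float64")

# Fast path: reinterpret the one buffer as a 2-D matrix with a single
# reshape plus transpose -- no per-column loop, no data copies.
mat = buf.reshape(ncols, nrows).T

# The result is a view over the original allocation, not a copy.
assert mat.base is buf
```

The same idea applies on device: cupy's `reshape` on a contiguous array also returns a view, so the whole conversion would cost O(1) instead of O(ncols) kernel launches.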
A real-world example of when this matters is running a dot product after calling transpose. Because of this bottleneck, we're considerably slower than pandas.
If we were to add any special casing here, we'd want to carefully evaluate the performance impact on the general case, since the dataframe-to-cupy codepath is used across the board.