
[BUG] Slowdown in constructing a cudf dataframe from a numba device array #16434

Closed
isVoid opened this issue Jul 30, 2024 · 1 comment · Fixed by #16436
Labels
bug Something isn't working

Comments

isVoid (Contributor) commented Jul 30, 2024

Describe the bug
Constructing a cuDF DataFrame from a large Numba device array can currently be slow.

Steps/Code to reproduce bug

import cudf
import cupy
import numba.cuda
cupy_array = cupy.ones((10_000, 100))
cudf.DataFrame(cupy_array)  # fast
cudf.DataFrame(numba.cuda.to_device(cupy_array))  # slow

Expected behavior
Constructing from a Numba device array used to be fast. It should be almost as fast as constructing from a CuPy array, since both support the CUDA Array Interface (CAI).
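To illustrate why the producer should not matter, here is a minimal sketch, assuming a consumer that only reads CAI metadata (`shape`, `typestr`, `data`, `strides`, `version`) to describe a column. `make_cai` and `column_view` are hypothetical helpers, not cuDF or Numba APIs, and the pointer is a made-up placeholder, so no GPU is needed.

```python
# Hypothetical sketch of CUDA Array Interface (CAI) metadata handling.
# make_cai and column_view are illustrative helpers, not cuDF/Numba APIs.
# The pointer value is a placeholder: no GPU memory is allocated or read.


def make_cai(rows, cols, ptr=0x7F0000000000):
    """CAI-style dict for a C-contiguous (rows, cols) float64 array."""
    return {
        "shape": (rows, cols),
        "typestr": "<f8",      # little-endian 8-byte float
        "data": (ptr, False),  # (device pointer, read-only flag)
        "strides": None,       # None means C-contiguous in the CAI spec
        "version": 3,
    }


def column_view(cai, j):
    """Describe column j as a strided 1-D view using only metadata."""
    rows, cols = cai["shape"]
    itemsize = int(cai["typestr"][2:])  # "<f8" -> 8 bytes
    # Derive C-order strides when the producer reports strides=None.
    row_stride, col_stride = cai["strides"] or (cols * itemsize, itemsize)
    ptr, readonly = cai["data"]
    return {
        "shape": (rows,),
        "typestr": cai["typestr"],
        "data": (ptr + j * col_stride, readonly),
        "strides": (row_stride,),
        "version": cai["version"],
    }


col = column_view(make_cai(10_000, 100), 3)
print(col["data"][0] - 0x7F0000000000, col["strides"])  # prints 24 (800,)
```

Under this model the column view is pure pointer and stride arithmetic, so its cost does not depend on the number of rows, and the same dict could come from either a CuPy or a Numba array.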

Environment overview

  • Environment location: Bare-metal
  • Method of cuDF install: conda

isVoid added the bug label Jul 30, 2024

wence- (Contributor) commented Jul 30, 2024

Concretely:

import cudf
import cupy
import numba.cuda

N = 10_000

ones = cupy.ones((N, 100))
n_ones = numba.cuda.to_device(ones)

%time cudf.DataFrame(ones);

%time cudf.DataFrame(n_ones);

CPU times: user 10.4 ms, sys: 0 ns, total: 10.4 ms
Wall time: 10.4 ms
CPU times: user 837 ms, sys: 0 ns, total: 837 ms
Wall time: 837 ms

If we increase N to 100_000:

import cudf
import cupy
import numba.cuda

N = 100_000

ones = cupy.ones((N, 100))
n_ones = numba.cuda.to_device(ones)

%time cudf.DataFrame(ones);

%time cudf.DataFrame(n_ones);

CPU times: user 15.7 ms, sys: 0 ns, total: 15.7 ms
Wall time: 15.7 ms
CPU times: user 7.2 s, sys: 240 ms, total: 7.44 s
Wall time: 7.44 s
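The two values of N give a rough scaling check using only the wall times reported above. This sketch just computes ratios; two data points per path are an indication, not a complexity measurement.

```python
# Wall times (seconds) copied from the two %time runs above.
cupy_wall = {10_000: 10.4e-3, 100_000: 15.7e-3}
numba_wall = {10_000: 0.837, 100_000: 7.44}


def growth(times):
    """Ratio of wall time at the largest N to wall time at the smallest N."""
    small, large = min(times), max(times)
    return times[large] / times[small]


print(f"N grew by {100_000 / 10_000:.0f}x")            # prints N grew by 10x
print(f"cupy wall time grew {growth(cupy_wall):.1f}x")   # prints ... 1.5x
print(f"numba wall time grew {growth(numba_wall):.1f}x") # prints ... 8.9x
```

A 10x increase in N makes the Numba path about 8.9x slower, close to linear, while the CuPy path grows by only about 1.5x.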

It looks like slicing a Numba device array, when the result is neither C- nor F-contiguous, goes through a code path whose cost is linear in the length of the non-sliced axis.
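Which column slices are non-contiguous can be checked on the host with NumPy, which uses the same stride model. This is a stand-in that mirrors the layouts in the reproductions; it does not exercise Numba's slicing path or measure its cost.

```python
import numpy as np

N = 10_000
c_order = np.ones((N, 100))    # C-contiguous, same layout as cupy.ones((N, 100))
f_order = np.ones((100, N)).T  # F-contiguous, same layout as cupy.ones((100, N)).T

c_col = c_order[:, 0]  # column of a C-contiguous array
f_col = f_order[:, 0]  # column of an F-contiguous array

# A C-order column steps over a whole 100-element row (800 bytes) per element,
# so it is neither C- nor F-contiguous.
print(c_col.strides, c_col.flags["C_CONTIGUOUS"], c_col.flags["F_CONTIGUOUS"])
# prints (800,) False False

# An F-order column has unit stride (8 bytes) and is contiguous.
print(f_col.strides, f_col.flags["C_CONTIGUOUS"], f_col.flags["F_CONTIGUOUS"])
# prints (8,) True True
```

Every column taken from the C-ordered (N, 100) array is a strided, non-contiguous view, which matches the slow case; every column of the F-ordered array is contiguous, which matches the fast case below.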

If the array is F-contiguous, performance is fine:

import cudf
import cupy
import numba.cuda

N = 10_000

ones = cupy.ones((100, N)).T
n_ones = numba.cuda.to_device(ones)

%time cudf.DataFrame(ones);

%time cudf.DataFrame(n_ones);

CPU times: user 3.27 ms, sys: 0 ns, total: 3.27 ms
Wall time: 3.28 ms
CPU times: user 11.6 ms, sys: 0 ns, total: 11.6 ms
Wall time: 11.6 ms
