[BUG]: possible performance regression in points_in_polygon() #1413
Comments
@trxcllnt recently modified `point_in_polygon`. Could those changes have caused this?
Are you referring to #1381? It could be related, but I don't think it'd be the root cause by itself. Those changes were made 2+ months ago, and as recently as #1404 (2 weeks ago), the …
Also, that PR modified the quadtree PiP algorithm, but the algorithm in question here is the non-quadtree version.
I did some profiling using py-spy. This is not a complete profile; I have just been running for about 4.5 minutes using …

Nearly all the time is spent in Numba. I used …
@mroeschke since you have touched a lot of places in cuSpatial and cuDF recently, can you tell us if this code is perhaps now running in Numba but didn't use to? That could explain the huge performance regression we are seeing.
…16436) #16277 removed a universal cast to a `cupy.array` in `_from_array`. Although the typing suggested this method should only accept `np.ndarray` or `cupy.ndarray`, this method is called on any object implementing the `__cuda_array_interface__` or `__array_interface__` (e.g. `numba.DeviceArray`), which caused a performance regression in cuspatial (rapidsai/cuspatial#1413).

closes #16434

```python
In [1]: import cupy, numba.cuda

In [2]: import cudf

In [3]: cupy_array = cupy.ones((10_000, 100))

In [4]: %timeit cudf.DataFrame(cupy_array)
3.88 ms ± 52 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [5]: %timeit cudf.DataFrame(numba.cuda.to_device(cupy_array))
3.99 ms ± 35.4 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

Co-authored-by: Bradley Dice <bdice@bradleydice.com>
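For context, here is a minimal sketch of the kind of normalization that fix restores. The helper name `_as_cupy` and its standalone form are assumptions for illustration only, not cudf's actual internals:

```python
import cupy
import numba.cuda


def _as_cupy(obj):
    # Hypothetical helper: objects exposing __cuda_array_interface__
    # (e.g. a numba device array) can be viewed as a cupy.ndarray without
    # copying device memory, so downstream code takes the fast cupy path
    # instead of a slower per-object fallback.
    return cupy.asarray(obj)


device_arr = numba.cuda.to_device(cupy.ones((10_000, 100)))
print(type(_as_cupy(device_arr)))  # <class 'cupy.ndarray'>
```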
Version
24.08
On which installation method(s) does this occur?
Conda
Describe the issue
See the write-up at #1407 (comment).
Since around July 12, 2024, the `nyc_taxi_years_correlation.ipynb` notebook started taking several hours to complete (on v24.08, using 24.08 `cudf` and other RAPIDS nightlies). Prior to that, on the exact same hardware, it completed in under 8 minutes.

I was able to reproduce this interactively, on a machine with 8 V100s and CUDA 12.2.
I strongly suspect that this indicates a performance regression, maybe of the form "some change(s) in `cudf` cause a `cuspatial` codepath that could previously execute on the GPU to fall back to the CPU", although I don't have profiling output to provide as evidence.

Minimum reproducible example
From #1407 (comment).
Download the input data.
Then, in a Python 3.11 session (with v24.08 of cuspatial and all its RAPIDS dependencies), run the point-in-polygon workload (sketched below).
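The original code block from #1407 did not make it into this copy; what follows is only a hedged sketch of the kind of `point_in_polygon()` call involved, using the 24.08 GeoSeries-based API with tiny, hypothetical coordinates in place of the downloaded taxi-zone polygons and pickup points:

```python
# Hedged sketch only: hypothetical coordinates stand in for the downloaded
# taxi-zone polygons and trip pickup points used in the actual repro.
import cudf
import cuspatial

# A few hypothetical pickup points, as interleaved x/y coordinates.
points = cuspatial.GeoSeries.from_points_xy(
    cudf.Series([-73.98, 40.75, -73.95, 40.78, -74.20, 40.60], dtype="float64")
)

# One hypothetical polygon (a rough box around Manhattan), as a closed ring.
polygons = cuspatial.GeoSeries.from_polygons_xy(
    cudf.Series(
        [-74.05, 40.70, -73.90, 40.70, -73.90, 40.88, -74.05, 40.88, -74.05, 40.70],
        dtype="float64",
    ),
    cudf.Series([0, 5]),  # ring offsets (5 points per ring)
    cudf.Series([0, 1]),  # part offsets (1 ring per polygon)
    cudf.Series([0, 1]),  # geometry offsets (1 part per geometry)
)

# Returns a boolean DataFrame with one column per polygon.
result = cuspatial.point_in_polygon(points, polygons)
print(result)
```

In the actual repro the same call runs over the full taxi pickup/dropoff datasets, which is where the roughly 20 minutes per polygon combination are spent.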
That one combination of polygons completed successfully, but took 21 minutes to complete. It's the 2 `points_in_polygon()` calls that took around 20 of those 21 minutes.

And in the notebook, 10 such combinations are processed:
- cuspatial/notebooks/nyc_taxi_years_correlation.ipynb, lines 168 to 169 (at c8616c1)
- cuspatial/notebooks/nyc_taxi_years_correlation.ipynb, lines 207 to 209 (at c8616c1)
So conservatively, it might take 3.5 hours for the notebook to finish in my setup. And that's making a LOT of assumptions.
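That figure is just the one measured combination scaled linearly, as a back-of-the-envelope check:

```python
# Rough extrapolation from the single measured polygon combination.
minutes_per_combination = 21    # observed above
combinations_in_notebook = 10   # combinations processed by the notebook
print(minutes_per_combination * combinations_in_notebook / 60)  # 3.5 hours
```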
Relevant log output
N/A
Environment details
Both these environments:

- the `conda-notebooks-tests` environment with V100s (CUDA 12.2)
- using `cudf` (and other RAPIDS dependencies) nightly conda packages as of July 12, 2024

output of `conda info`, `conda env export`, and `nvidia-smi`
(example build link)
Other/Misc.
Other symptoms that led to this were documented in #1406.
That was closed by just skipping the most expensive notebooks, in #1407.