-
Notifications
You must be signed in to change notification settings - Fork 311
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable expression-based Dask Dataframe support #4325
Changes from all commits
b3e0fba
39f5101
4ac1828
41d42f6
9074cab
0f8e221
436a080
da82bcd
a9ae6f2
f2d6a25
43963ad
f9bcbe6
bf9dd10
c104292
03bbee5
952b224
9496171
d807de0
94c8d79
942a396
36df87d
27ecbaf
68b798b
ec0fcd4
acbc219
541c8f2
8dc5804
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -3,10 +3,6 @@ | |
|
||
set -euo pipefail | ||
|
||
# TODO: Enable dask query planning (by default) once some bugs are fixed. | ||
# xref: https://github.com/rapidsai/cudf/issues/15027 | ||
export DASK_DATAFRAME__QUERY_PLANNING=False | ||
|
||
Comment on lines
-6
to
-9
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. AIUI this is one of the OPS relevant changes. Basically removing a workaround that is no longer needed |
||
# Support invoking test_python.sh outside the script directory | ||
cd "$(dirname "$(realpath "${BASH_SOURCE[0]}")")"/../ | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -3,10 +3,6 @@ | |
|
||
set -eoxu pipefail | ||
|
||
# TODO: Enable dask query planning (by default) once some bugs are fixed. | ||
# xref: https://github.com/rapidsai/cudf/issues/15027 | ||
export DASK_DATAFRAME__QUERY_PLANNING=False | ||
|
||
Comment on lines
-6
to
-9
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is the other one. So same change as before just in another place |
||
package_name=$1 | ||
package_dir=$2 | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,4 @@ | ||
# Copyright (c) 2020-2023, NVIDIA CORPORATION. | ||
# Copyright (c) 2020-2024, NVIDIA CORPORATION. | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
|
@@ -15,8 +15,8 @@ | |
|
||
from collections.abc import Sequence | ||
from collections import OrderedDict | ||
from dask_cudf.core import DataFrame as dcDataFrame | ||
from dask_cudf.core import Series as daskSeries | ||
from dask_cudf import DataFrame as dcDataFrame | ||
from dask_cudf import Series as daskSeries | ||
|
||
Comment on lines
-18
to
+19
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. NOTE: All imports from |
||
import cugraph.dask.comms.comms as Comms | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -99,13 +99,13 @@ def test_nodes_functionality(dask_client, input_combo): | |
expected_nodes = ( | ||
dask_cudf.concat([ddf["src"], ddf["dst"]]) | ||
.drop_duplicates() | ||
.to_frame() | ||
.sort_values(0) | ||
.to_frame(name="0") | ||
.sort_values("0") | ||
Comment on lines
-102
to
+103
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Another "precaution" (using numerical column names still seems "fragile" in dask) |
||
) | ||
|
||
expected_nodes = expected_nodes.compute().reset_index(drop=True) | ||
|
||
result_nodes["expected_nodes"] = expected_nodes[0] | ||
result_nodes["expected_nodes"] = expected_nodes["0"] | ||
|
||
compare = result_nodes.query("result_nodes != expected_nodes") | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from_dask_dataframe
is now deprecated.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean that dask-expr has some dispatching/plugin support for different DataFrame implementations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct - Dask documentation is here: https://docs.dask.org/en/latest/how-to/selecting-the-collection-backend.html, and Dask-cudf: https://docs.rapids.ai/api/dask-cudf/stable/#dataframe-creation-from-in-memory-formats
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NOTE: Given the complexity of dask's various dispatching mechanisms, I'm not expecting anything other than "pandas" and "cudf" the ever be implemented - Though it's technically possible.