[FEA] Apply functions to columns independently in parallel. #1501

Closed
daxiongshu opened this issue Apr 25, 2019 · 2 comments
Labels
feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. Python Affects Python cuDF API.

Comments

@daxiongshu

Is your feature request related to a problem? Please describe.
For a dataset with many uncorrelated columns, we want to transform each column independently and in parallel. For example:

# needs better way to replace the for loop.
# number of columns could be large.
for col in df.columns:
    df[col] = some_cudf_transform_function(df[col])

Describe the solution you'd like
Apply the same transform function to a group of columns in parallel, ideally through an API call directly supported by cudf.
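
To make the request concrete, here is a minimal sketch of what such a call could look like. `apply_columnwise` is a hypothetical helper, not an existing cudf API, and it uses a Python thread pool on plain lists as a stand-in for GPU columns. Whether GPU work submitted this way would actually overlap depends on how the underlying library uses CUDA streams, which this sketch does not address.

```python
from concurrent.futures import ThreadPoolExecutor


def apply_columnwise(columns, func, max_workers=4):
    """Apply `func` to every column independently and return a new dict.

    `columns` is any mapping of column name -> column object. This is a
    hypothetical helper for illustration, not part of cudf. Each call to
    `func` is submitted to a thread pool, so calls overlap only to the
    extent that `func` releases the GIL.
    """
    names = list(columns)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with `names`.
        results = list(pool.map(func, (columns[name] for name in names)))
    return dict(zip(names, results))


# Stand-in data: plain lists instead of GPU columns.
data = {"a": [1, 2, 3], "b": [10, 20, 30]}
doubled = apply_columnwise(data, lambda col: [2 * x for x in col])
```

With a DataFrame, the mapping could be built as `{col: df[col] for col in df.columns}` and the results assigned back column by column, which replaces the explicit loop without changing its semantics.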

Additional context
An example of a dataset with this property is the Santander dataset; the corresponding ETL can be found in cell 5 of the notebook. Independent columns may be a result of anonymization.

@daxiongshu daxiongshu added Needs Triage Need team to review and classify feature request New feature or request labels Apr 25, 2019
@jrhemstad
Contributor

Once libcudf's external APIs are updated to expose streams, I'd hope it will be straightforward for the Python side to identify that these column-wise operations are independent and execute them on separate streams.

See #925

@vyasr
Contributor

vyasr commented May 10, 2024

More recent efforts have focused on aligning cudf more closely with pandas, not less. This functionality is therefore unlikely to be directly supported in cudf. However, it should be quite easy to enable something like this in pylibcudf (especially using multiple streams), which is probably the way that we will want to go.

@vyasr vyasr closed this as completed May 10, 2024