Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow fillna to validate for CategoricalColumn.fillna #15683

Merged
merged 3 commits into from
May 7, 2024

Conversation

galipremsagar
Copy link
Contributor

@galipremsagar galipremsagar commented May 7, 2024

Description

Fixes: #15666

This PR validates values passed to fillna even if there are no null values in a categorical column.

Forks from #14534

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@galipremsagar galipremsagar self-assigned this May 7, 2024
@galipremsagar galipremsagar added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels May 7, 2024
@github-actions github-actions bot added the Python Affects Python cuDF API. label May 7, 2024
@galipremsagar galipremsagar marked this pull request as ready for review May 7, 2024 11:37
@galipremsagar galipremsagar requested a review from a team as a code owner May 7, 2024 11:37
@galipremsagar galipremsagar added the 5 - Ready to Merge Testing and reviews complete, ready to merge label May 7, 2024
@galipremsagar
Copy link
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit e87a78d into rapidsai:branch-24.06 May 7, 2024
70 checks passed
Copy link
Contributor

@wence- wence- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I forgot to hit go on this one. There are no really substantive comments here though.

Comment on lines +1079 to +1080
# Validation of `fill_value` will have to be performed
# before returning self.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this comment suggests that validation of the value must still be performed. But I think it has been in the code above.

So perhaps

Even if there are no nulls, we must validate that the provided value is valid (done above) to match pandas

WDYT?

)
or method is not None
or (
isinstance(col, cudf.core.column.CategoricalColumn)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This special-cases categorical columns for fillna both to not care about whether the column has nulls and whether the column name is in the provided dict-like (?) of values.

It so happens that the _is_null_host_scalar check will be true in the latter case, but it's somewhat confusing.

Do you want to just provide a carve-out for categorical columns in the first condition as:

col_name in value
and (col.has_nulls(include_nan=True) or isinstance(col, CategoricalColumn))
and not libcudf.scalar._is_null_host_scalar(replace_val)

?

rapids-bot bot pushed a commit to rapidsai/cugraph that referenced this pull request May 10, 2024
…fillna()` based on latest cuDF changes (#4408)

This handles a [recent cuDF change](rapidsai/cudf#15683) by applying non-dict and non-Series values for a `fillna()` call on `PropertyGraph` instances only to the user-defined columns, with the assumption that savvy users that intend to update the "internal" columns, or users that are aware of their own categorical dtype columns, will use a dict or Series value to properly apply dtypes as needed.

This also updates code in `cugraph-service` that serializes dataframes to numpy bytes to properly convert NA values when categoricals are present.

Notes:
* This is only applied to the SG `PropertyGraph` class.  The MG class needs further review as to how to best apply the same policy (and because there are other MG failing tests that need addressed).  Since this is blocking CI for the SG case only, this PR is being submitted now and MG will be addressed later, which should be okay since `PropertyGraph` is experimental.
* This could be considered a breaking change if `PropertyGraph` was not experimental.

Authors:
  - Rick Ratzel (https://github.com/rlratzel)

Approvers:
  - Alex Barghi (https://github.com/alexbarghi-nv)

URL: #4408
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge improvement Improvement / enhancement to an existing function non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

[BUG] Groupby.fillna is not raising when a non-categorical value is passed
3 participants