Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure dtype objects are passed within Column.astype #17978

Merged
merged 43 commits into from
Mar 7, 2025

Conversation

mroeschke
Copy link
Contributor

Description

Continuation of #17918

  • Some modified astype calls might be from Series/Index/etc., but IMO it's OK if we are a bit stricter and pass dtype objects when calling those methods too
  • Added some stricter typings to Column.as_*_column since they are called by Column.astype

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@mroeschke mroeschke added Python Affects Python cuDF API. improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Feb 11, 2025
@mroeschke mroeschke self-assigned this Feb 11, 2025
@mroeschke mroeschke requested a review from a team as a code owner February 11, 2025 02:06
@mroeschke
Copy link
Contributor Author

I broke off some changes into #18008 as I think there might need to be some bug/feature implementations needed to make dtype objects required here

rapids-bot bot pushed a commit that referenced this pull request Feb 18, 2025
Broken off from #17978

This passes more dtype objects to `Column.astype` but does not _ensure_ that a dtype object is passes unlike in #17978

Also it appears we avoid surfacing from warnings from pandas within our tests.

Authors:
  - Matthew Roeschke (https://github.com/mroeschke)

Approvers:
  - Lawrence Mitchell (https://github.com/wence-)

URL: #18008
Copy link
Contributor

@vyasr vyasr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like this PR has undergone a large number of changes since I approved, so I'm just going to block for the moment. Please let me know when you want another review.

rapids-bot bot pushed a commit that referenced this pull request Feb 25, 2025
closes #17997

Will help unblock #17978 where we will need to interpret `dtype="interval"` as "interval without a subtype" instead of "interval with float64 subtype"

Authors:
  - Matthew Roeschke (https://github.com/mroeschke)

Approvers:
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: #18017
rapids-bot bot pushed a commit that referenced this pull request Feb 27, 2025
Partially broken off from #17978

Since Column objects are technically private, not marking this as breaking.

Authors:
  - Matthew Roeschke (https://github.com/mroeschke)

Approvers:
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: #18066
@mroeschke
Copy link
Contributor Author

@vyasr this PR is a ready for another round of review when you have the time

Copy link
Contributor

@vyasr vyasr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was very confused about when ordered could be None for categoricals when reviewing, but I guess that's what you do cudf.dtype("category") because that's what pandas gives you in that case. Does this really make much sense? I guess any other choice would be equally arbitrary, but it seems worse to me for pandas to allow ordered to be None than to make an arbitrary choice of ordered=True or False when you construct an empty one.

Nevertheless, this PR is fine. It needs conflict resolution though.

@mroeschke
Copy link
Contributor Author

Does this really make much sense?

Yeah I don't like that None is distinct from True/False either. I did find this comment pandas that describe the purpose of None which is only necessitated really by the fact that one can specify "category" as a dtype.

        # need ordered=None to ensure that operations specifying dtype="category" don't
        # override the ordered value for existing categoricals

@mroeschke
Copy link
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit a3b1f25 into rapidsai:branch-25.04 Mar 7, 2025
108 checks passed
@mroeschke mroeschke deleted the cln/dtype/astype branch March 7, 2025 17:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improvement / enhancement to an existing function non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

2 participants