Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix str dtype -> IntegerDtype conversions #43949

Merged
merged 4 commits into from
Oct 19, 2021
Merged

Conversation

alexreg
Copy link
Contributor

@alexreg alexreg commented Oct 9, 2021

closes #25472

  • closes #xxxx
  • tests added / passed
  • Ensure all linting tests pass, see here for how to run them
  • whatsnew entry

@pep8speaks
Copy link

pep8speaks commented Oct 9, 2021

Hello @alexreg! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-10-19 12:23:17 UTC

@alexreg alexreg force-pushed the fix-25472 branch 2 times, most recently from dd0d90f to cc12100 Compare October 10, 2021 01:11
@alexreg
Copy link
Contributor Author

alexreg commented Oct 10, 2021

Looks like all 8 failures are the same issue in essence. I'll wait for advice on how to fix that.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to remove the changes from is_string itself. you can infer inside the integer constructor though.

pandas/core/dtypes/common.py Outdated Show resolved Hide resolved
@jreback jreback added Dtype Conversions Unexpected or buggy dtype conversions ExtensionArray Extending pandas with custom dtypes or arrays. Strings String extension data type and string data labels Oct 10, 2021
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls add tests from each of the issues

@alexreg
Copy link
Contributor Author

alexreg commented Oct 10, 2021

@jreback Thanks for reviewing. I'll have a look at your comments and see what I can do. Where do tests specific to issues go?

@alexreg
Copy link
Contributor Author

alexreg commented Oct 11, 2021

Okay, I think I've addressed the above issues and this solution is better. All tests should be passing now too. I just need to write tests specifically for the two issues this is closing (please point me to where they should go). Note, I'm no longer dealing with #15585, as your comment hinted I shouldn't.

Edit: Looks like all tests passed but the macOS one spuriously gave Error: The operation was canceled..

@alexreg alexreg force-pushed the fix-25472 branch 2 times, most recently from 046efad to db03121 Compare October 11, 2021 00:18
@alexreg
Copy link
Contributor Author

alexreg commented Oct 11, 2021

i think I was trying to target too much in this PR. I'm only going to target #25472 now, which the PR in its current state definitely solves. Just not sue of the best way to write a unit test for that issue. Advice appreciated (or someone can feel free just to do it and push).

if inferred_type in ("string", "unicode"):
# casts from str are always safe since they raise
# a ValueError if the str cannot be parsed into an int
return values.astype(dtype, copy=copy)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happens if you dont put this special case here and do the astype on L135? itd be nice to not add the inferred_type arg here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbrockmendel I agree it would be nice not to have that arg there, but nor do I think it's too much of a problem, since this function is only used in one other place (as far as I can tell). If I remove this special case and let L135 always do the cast, it fails because a str -> nullable int cast is technically unsafe (though not really, I think, as per my comment). However, removing casting="safe" from L135 would create all sorts of other problems.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so the issue i have with this is actually passing the inferred keyword, simply use this instead before L240 and would be better.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or i think check values.dtype instead of inferred_type?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback Done.
@jbrockmendel I suppose we can change to that if need be, but I'd already tried @jreback's change, so if that works...

pandas/tests/io/parser/dtypes/test_nullable.py Outdated Show resolved Hide resolved
pandas/tests/io/parser/dtypes/test_nullable.py Outdated Show resolved Hide resolved
@alexreg
Copy link
Contributor Author

alexreg commented Oct 17, 2021

@jreback @jbrockmendel Incorporated / replied to your feedback. Hopefully everything is good to go now.

@jreback jreback added this to the 1.4 milestone Oct 18, 2021
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @jbrockmendel if comments

Copy link
Member

@jbrockmendel jbrockmendel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jreback jreback merged commit c556062 into pandas-dev:master Oct 19, 2021
@jreback
Copy link
Contributor

jreback commented Oct 19, 2021

thanks @alexreg very nice!

@alexreg
Copy link
Contributor Author

alexreg commented Oct 19, 2021

@jreback No problem. Thanks both of you for the reviews and getting this merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Dtype Conversions Unexpected or buggy dtype conversions ExtensionArray Extending pandas with custom dtypes or arrays. Strings String extension data type and string data
Projects
None yet
Development

Successfully merging this pull request may close these issues.

read_csv fails with TypeError: object cannot be converted to an IntegerDtype yet succeeds when reading chunks
4 participants