-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: fix isin with nans and large arrays #36266
BUG: fix isin with nans and large arrays #36266
Conversation
Seems tests are failing, but it seems unrelated to this? |
Yes, those are unrelated |
Thanks @Hanspagh, can you add a release note for 1.1.3? |
Changes as requested, let mere know if there is anything else needed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls also merge master, ping when green.
0ac189c
to
b31f4e2
Compare
Updated as requested |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor comments, ping on green.
49aeb2d
to
7f3d217
Compare
Fixed |
7f3d217
to
3679c14
Compare
Done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Hanspagh minor nit re consistency of issue number comments
Co-authored-by: Simon Hawkins <simonjayhawkins@gmail.com>
Co-authored-by: Simon Hawkins <simonjayhawkins@gmail.com>
I can do a rebase if needed? |
git pull will normally get the changes from the commit suggestion locally. to update the PR with the latest changes on master see https://pandas.pydata.org/pandas-docs/dev/development/contributing.html#updating-your-pull-request |
thanks @Hanspagh |
Owee, I'm MrMeeseeks, Look at me. There seem to be a conflict, please backport manually. Here are approximate instructions:
And apply the correct labels and milestones. Congratulation you did some good work ! Hopefully your backport PR will be tested by the continuous integration and merged soon! If these instruction are inaccurate, feel free to suggest an improvement. |
#36385 (comment) could have been responsible for not being able to auto backport |
Co-authored-by: Hans <hanspagh@gmail.com>
@@ -440,7 +440,12 @@ def isin(comps: AnyArrayLike, values: AnyArrayLike) -> np.ndarray: | |||
# GH16012 | |||
# Ensure np.in1d doesn't get object types or it *may* throw an exception | |||
if len(comps) > 1_000_000 and not is_object_dtype(comps): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry a little bit too late. The whole point of doing this for len(comps) > 1_000_000, was that numpy was deemed to be faster (which is probably no loner the case btw, see #22205 (comment)), adding any
, isnan
, logical_or
on top (with all the cache misses and temporary objects) will make this branch much slower. So probably it is best just to drop the whole branch and always keep f = htable.ismember_object
(unless it is is_integer_dtype
of cause).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can u run the asvs and check here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does a np.isnan if nan is given to isin and we have a large enough array to trigger the
np.in1d
pathblack pandas
git diff upstream/master -u -- "*.py" | flake8 --diff