Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MAINT: Adjust the codebase to the new np.array's copy keyword meaning #57172

Merged
merged 8 commits into from
Mar 5, 2024

Conversation

mtsokol
Copy link
Contributor

@mtsokol mtsokol commented Jan 31, 2024

Hi!

This PR addresses changes planned for NumPy in numpy/numpy#25168 (new copy keyword for np.asarray and np.array).

np.array(..., copy=False) will now throw and exception if a copy is needed. To retain the same behavior np.asarray(...) can be used, so a copy is made only when needed.

I applied changes in a backward-compatible way, so that no NumPy version check & branching is needed.

@mtsokol mtsokol force-pushed the np-array-copy-keyword branch 2 times, most recently from 8df14c7 to 467f107 Compare January 31, 2024 14:24
@mroeschke mroeschke added the Compat pandas objects compatability with Numpy or Python functions label Jan 31, 2024
@mtsokol mtsokol force-pushed the np-array-copy-keyword branch 2 times, most recently from a80d828 to f2e8246 Compare February 1, 2024 11:02
@mroeschke mroeschke added this to the 2.2.1 milestone Feb 1, 2024
@mroeschke
Copy link
Member

If you have time could you investigate making test_array_interface compatible with older versions of numpy?

@lithomas1 lithomas1 modified the milestones: 2.2.1, 2.2.2 Feb 23, 2024
@mtsokol mtsokol force-pushed the np-array-copy-keyword branch from f2e8246 to 3ae782d Compare February 29, 2024 16:51
@mtsokol
Copy link
Contributor Author

mtsokol commented Feb 29, 2024

numpy/numpy#25168 got merged today - once a new nightly build is published, all tests should pass now.

@mtsokol mtsokol force-pushed the np-array-copy-keyword branch 2 times, most recently from 3903437 to 6b28ead Compare March 1, 2024 11:49
@mtsokol
Copy link
Contributor Author

mtsokol commented Mar 1, 2024

NumPy nightly build disappeared (numpy/numpy#25907) but once it's back this PR should be ready.

@matthewfeickert
Copy link

@mtsokol Please retrigger the CI given numpy/numpy#25907 (comment).

@mtsokol mtsokol force-pushed the np-array-copy-keyword branch 4 times, most recently from 32dcf9c to 1a5ec4e Compare March 4, 2024 15:37
@mtsokol mtsokol force-pushed the np-array-copy-keyword branch from 1a5ec4e to 9f93fa4 Compare March 4, 2024 16:19
@mtsokol
Copy link
Contributor Author

mtsokol commented Mar 4, 2024

Hi! I updated the PR with all required changes to be compatible with NumPy dev - now Numpy Dev CI job passes.

My question: There's a new copy keyword required in the __array__ protocol implementations. Would you like to keep this keyword unused (as dtype is sometimes ignored in the codebase) or do you want to actually make use of it internally where it makes sense?

@mtsokol mtsokol requested a review from mroeschke March 4, 2024 16:49
@mroeschke
Copy link
Member

Would you like to keep this keyword unused (as dtype is sometimes ignored in the codebase) or do you want to actually make use of it internally where it makes sense?

Yes we would probably want to make use of it, but that can be done in a follow up

@mroeschke
Copy link
Member

Looks like there's some typing validations with the new addition of the keyword. Let me know if you'd like help with them: https://github.com/pandas-dev/pandas/actions/runs/8143484703/job/22255440584

@mtsokol
Copy link
Contributor Author

mtsokol commented Mar 5, 2024

Looks like there's some typing validations with the new addition of the keyword...

These errors look pretty tricky. I guess it's due to the fact that this job installed NumPy 1.x where copy signature is bool | _CopyMode. Locally I see that installing Numpy 2.0dev yields even more typing errors in other places (like error: Name "np.bool_" is not defined as bool_ typing stub got renamed to bool).

So for this PR, maybe # type: ignore for these calls would work?

error: Argument "copy" to "array" has incompatible type "bool | None"; expected "bool | _CopyMode"  [arg-type]

Comment on lines 107 to 108
copy_false = None if np_version_gt2 else False
result = np.array(result, copy=copy_false)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
copy_false = None if np_version_gt2 else False
result = np.array(result, copy=copy_false)
result = np.asarray(result)

Would this be a simpler but equivalent alternative?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like it (from a quick glance at the diff).

I just added this to the 2.0 migration guide:

  1. Code using np.array(..., copy=False) can in most cases be changed to
    np.asarray(...). Older code tended to use np.array like this because
    it had less overhead than the default np.asarray copy-if-needed
    behavior. This is no longer true, and np.asarray is the preferred function.

@mroeschke mroeschke merged commit b89f1d0 into pandas-dev:main Mar 5, 2024
47 checks passed
Copy link

lumberbot-app bot commented Mar 5, 2024

Owee, I'm MrMeeseeks, Look at me.

There seem to be a conflict, please backport manually. Here are approximate instructions:

  1. Checkout backport branch and update it.
git checkout 2.2.x
git pull
  1. Cherry pick the first parent branch of the this PR on top of the older branch:
git cherry-pick -x -m1 b89f1d0d05f4c9f360985abc6bda421d73bae85f
  1. You will likely have some merge/cherry-pick conflict here, fix them and commit:
git commit -am "Backport PR #57172: MAINT: Adjust the codebase to the new `np.array`'s `copy` keyword meaning"
  1. Push to a named branch:
git push YOURFORK 2.2.x:auto-backport-of-pr-57172-on-2.2.x
  1. Create a PR against branch 2.2.x, I would have named this PR:

"Backport PR #57172 on branch 2.2.x (MAINT: Adjust the codebase to the new np.array's copy keyword meaning)"

And apply the correct labels and milestones.

Congratulations — you did some good work! Hopefully your backport PR will be tested by the continuous integration and merged soon!

Remember to remove the Still Needs Manual Backport label once the PR gets merged.

If these instructions are inaccurate, feel free to suggest an improvement.

@mroeschke
Copy link
Member

Thanks @mtsokol. Will create a follow up issue about utilizing the new copy keyword in __array__

mroeschke pushed a commit to mroeschke/pandas that referenced this pull request Mar 5, 2024
mroeschke added a commit that referenced this pull request Mar 6, 2024
…meaning (#57740)

Co-authored-by: Mateusz Sokół <8431159+mtsokol@users.noreply.github.com>
@mtsokol mtsokol deleted the np-array-copy-keyword branch March 6, 2024 08:14
pmhatre1 pushed a commit to pmhatre1/pandas-pmhatre1 that referenced this pull request May 7, 2024
…ning (pandas-dev#57172)

* MAINT: Adjust the codebase to the new np.array copy keyword meaning

* Add copy is docstring

* Use asarray where possible

---------

Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Compat pandas objects compatability with Numpy or Python functions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants