Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API / CoW: Copy arrays by default in Series constructor #52022

Merged
merged 12 commits into from
Mar 29, 2023

Conversation

phofl
Copy link
Member

@phofl phofl commented Mar 16, 2023

will update the whatsnew from the other copy=True pr

@phofl phofl added Series Series data structure Copy / view semantics labels Mar 16, 2023
@phofl phofl added this to the 2.0 milestone Mar 16, 2023
@@ -394,6 +401,11 @@ def __init__(
self.name = name
return

if isinstance(data, (ExtensionArray, np.ndarray)):
if default_cow_copy and not copy and using_copy_on_write():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i find this difficult to reason about (... in part bc im distracted by being on a call). if a user explicitly passes copy=False, we should never ignore that. Does this (or any of the other CoW-constructor PRs that im having trouble keeping track of) respect that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you pass an array then we respect copy=False with the drawback that your series gets modified when you modify the array, if you pass a Series/DataFrame we make a lazy copy to set up references

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually the reason why I changed the default to None, so that we can respect a user actually passing False

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also don't find it easy to read, but I think it can be simplified?
Wouldn't it be enough to check for if copy was originally None? Because if copy=True, I would expect the copy to happen anyway later on (assuming it currently honored that keyword)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to avoid using_copy_on_write before hitting the fastpath, but only using copy is certainly more readable, changed

@jorisvandenbossche jorisvandenbossche changed the title API-CoW: Copy arrays in Series constructor API / CoW: Copy arrays by default in Series constructor Mar 17, 2023
"arr", [np.array([1, 2, 3], dtype="int64"), pd.array([1, 2, 3], dtype="Int64")]
)
def test_series_from_array(using_copy_on_write, idx, dtype, fastpath, arr):
ser = Series(arr, dtype=dtype)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should idx and fastpath be passed here as well??

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah thx, yes

# Conflicts:
#	pandas/tests/copy_view/test_astype.py
#	pandas/tests/copy_view/test_constructors.py
# Conflicts:
#	pandas/tests/copy_view/test_astype.py
#	pandas/tests/copy_view/test_constructors.py
Comment on lines 383 to 387
if copy is None:
if using_copy_on_write():
copy = True
else:
copy = False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could easily move this below the first fastpath if block, when updating that to check for copy is None instead of copy is False

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough

@phofl
Copy link
Member Author

phofl commented Mar 29, 2023

merging this to be consistent with the DataFrame constructor

@phofl phofl merged commit d95fc0b into pandas-dev:main Mar 29, 2023
@phofl phofl deleted the cow_series_from_array branch March 29, 2023 16:40
@lumberbot-app
Copy link

lumberbot-app bot commented Mar 29, 2023

Owee, I'm MrMeeseeks, Look at me.

There seem to be a conflict, please backport manually. Here are approximate instructions:

  1. Checkout backport branch and update it.
git checkout 2.0.x
git pull
  1. Cherry pick the first parent branch of the this PR on top of the older branch:
git cherry-pick -x -m1 d95fc0b17d32b490000277e0f9a10e2de2c63a8d
  1. You will likely have some merge/cherry-pick conflict here, fix them and commit:
git commit -am 'Backport PR #52022: API / CoW: Copy arrays by default in Series constructor'
  1. Push to a named branch:
git push YOURFORK 2.0.x:auto-backport-of-pr-52022-on-2.0.x
  1. Create a PR against branch 2.0.x, I would have named this PR:

"Backport PR #52022 on branch 2.0.x (API / CoW: Copy arrays by default in Series constructor)"

And apply the correct labels and milestones.

Congratulations — you did some good work! Hopefully your backport PR will be tested by the continuous integration and merged soon!

Remember to remove the Still Needs Manual Backport label once the PR gets merged.

If these instructions are inaccurate, feel free to suggest an improvement.

mroeschke pushed a commit that referenced this pull request Mar 29, 2023
… in Series constructor) (#52282)

API / CoW: Copy arrays by default in Series constructor (#52022)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Copy / view semantics Series Series data structure
Projects
None yet
Development

Successfully merging this pull request may close these issues.

API / CoW: copy/view behaviour when constructing DataFrame/Series from a numpy array
3 participants