API/DEPR: Deprecate inplace parameter #16529

gfyoung · 2017-05-29T01:05:28Z

The parameter inplace=False should be deprecated across the board in preparation for pandas 2, which will not support that input (we will always return a copy). That would give people time to stop using it.

Thoughts?

Methods using inplace:

Deprecation non controvertial (a copy will be made anyway, and inplace=True does not add value):

(Series/DataFrame).drop
(Series/DataFrame).drop_duplicates
(Series/DataFrame).dropna
DataFrame.set_index (with drop=False wouldn't change the data, but that doesn't seem the main use case)
DataFrame.query
DataFrame.eval

Not sure:

(Series/DataFrame).sort_values
(Series/DataFrame).sort_index

Should be able to not copy memory (under discussion on what to do):

(Series/DataFrame).clip
(Series/DataFrame).where
(Series/DataFrame).fillna
(Series/DataFrame).rename_axis
(Series/DataFrame).reset_index
(Series/DataFrame).replace
(Series/DataFrame).set_axis
(Series/DataFrame).mask
(Series/DataFrame).interpolate
DataFrame.rename
Index.rename
Index.set_names
MultiIndex.set_levels
MultiIndex.set_labels
pandas.core.resample.Resampler.interpolate

Special cases:

pandas.eval (with inplace=False the value is not returned but set to an argument target)

The text was updated successfully, but these errors were encountered:

drafter250 · 2017-07-26T16:27:28Z

I thought The main usage of this feature was to mitigate memory usage in case of large DataFrames?

gfyoung · 2017-07-26T16:50:42Z

I thought The main usage of this feature was to mitigate memory usage in case of large DataFrames?

I think @jreback might know more about the initial usage, but generally from previous discussion, this parameter is misused and is more prone to introducing bugs.

jreback · 2017-08-22T01:02:49Z

inplace does not generally do anything inplace
but makes a copy and reassigns the pointer

mutating code is much harder to debug (not to mentiin more complicated to support actual inplace ops)

so except for inplace indexing generally these operations are simply more verbose and serve just to provide corner cases for testing

datapythonista · 2018-12-03T00:16:18Z

@pandas-dev/pandas-core I think this was agreed during the sprint at Scipy, but I'm not sure if it was discussed when to deprecate the inplace parameters. Is it something we want to do before 1.0?

Personally I think the sooner, the better, since the decision is made.

jreback · 2018-12-03T00:35:28Z

this is going to cause major downstream complaints, but I agree this should be sooner rather than later.

Would accept a FutureWarning for this for 0.24.0

TomAugspurger · 2018-12-03T12:53:41Z

A few questions:

There are some places (.where comes to mind, fillna? Maybe others?) where the operation can truly be done inplace. I don't think those should be deprecated.
Following 1. is it possible to detect when we're doing an "inplace" operation on a copy? Could we only warn in those cases?

jreback · 2018-12-03T13:02:01Z

There are some places (.where comes to mind, fillna? Maybe others?) where the operation can truly be done inplace. I don't think those should be deprecated.

this is a very limited number, and though possible to detect (e.g. a Series of a single dtype, or a DataFrame of a single dtype), IMHO is not worth leaving this option. just adding code complexity w/o any real perf benefit. So -1 on doing this.

datapythonista · 2018-12-03T13:06:29Z

In my opinion, ideally we should always do the operation in place if possible, but still return a new object.

So, df = df.fillna(0) wouldn't copy memory (assuming no type change), and if the user does want to modify a copy, they should do df2 = df.copy().fillna(0).

Not sure if in practice can be complex to implement in some cases. But if that makes sense, we should deprecate all inplace arguments.

jreback · 2018-12-03T13:13:36Z

So, df = df.fillna(0) wouldn't copy memory (assuming no type change), and if the user does want to modify a copy, they should do df2 = df.copy().fillna(0).

this is too complex. You either do it inplace or you don't. The keyword controls it. If we remove the keyword then it should never modify the original.

TomAugspurger · 2018-12-03T16:02:08Z

just adding code complexity w/o any real perf benefit.

It depends on the application, but not having to copy can be a pretty big win, right? :)

Still, I agree that this is tough (impossible) to detect ahead of time. Is it feasible to detect it after the fact, and raise a PerformanceWarning in those cases (possibly elevating the an error later, if that makes sense)? In a simplified BlockManager world (one block per column) we could maybe keep a weakref from the column to the actual data (ndarray or the EA's data).

def fillna(self, **kwargs):
    # in BlockManager.fillna
    ref_to_data = self._get_ref_to_data()
    result = self.apply('fillna', **kwargs)
    new_data = self._get_ref_to_data()
    if ref_to_data != new_data and inplace:
         warnings.warn(PerformanceWarning)

If so, I would prefer to go that route, rather than having users change code.

jreback · 2018-12-03T16:04:10Z

I am really -1 on this in any users code. So while this may have to be an extended deprecation cycle I think its worth it.

datapythonista · 2018-12-03T16:55:42Z

This is the list of methods functions in the public API with a parameter named inplace.

EDITED: Moved the lists to the description

Those are not really in place, and should be deprecated:

[EDIT: moved to the issue description]

If that sounds good, I'll create a PR to raise FutureWarning for the ones that we all agree should be deprecated (the first list). Then we can follow up with the rest.

Let me know if there are more from the first list that you want to postpone to the second phase.

@jorisvandenbossche may be you also want to take a look here.

h-vetinari · 2018-12-03T17:29:24Z

@datapythonista @jreback
Another method that should be part of this discussion is .update, which is AFAICT the only method that's inplace by default, and does not even have an inplace-kwarg. Adding that has met resistance in #22286, but one could deprecate the whole method and replace it with df.coalesce: #22812

TomAugspurger · 2018-12-03T17:37:57Z

@datapythonista how'd you define that list of methods that are not really inplace? I haven't looked closely, but things like Series.rename_axis should be doable without copying data. I believe clip / clip_lower as well.

I am really -1 on this in any users code.

So that argument is independent of whether or not any operation can be inplace, and we should discuss that. i.e. that it's the "opinion of pandas" that inplace is an anti-pattern to be avoided at all time.

Personally, I'm not sure about that.

jreback · 2018-12-03T17:42:09Z

my point is this adds a lot of complexity
sure it’s opionated but so what

complexity is killing the ability of most folks to make modifications

this simplifies the model a great deal

datapythonista · 2018-12-03T17:42:25Z

I just started the list of the actual inplace methods with the ones you said. There are some I can guess they should be able to be in place, like replace, but thought you or Jeff would know from the top of your head, so didn't want to guess.

Will move the lists to the description, and try to get it closer to reality, feel free to edit afterwards.

MarcoGorelli · 2022-12-02T19:18:32Z

Hi @jorisvandenbossche

It's been 1.5 years since you wrote

but based on my initial experiments with Copy-on-Write (#36195, no PR yet), I am no longer fully convinced that completely deprecating the inplace keyword is the way to go.

Just wondering if your stance has changed at all

jorisvandenbossche · 2023-02-17T22:42:02Z

A formal proposal to deprecate most occurrences of the inplace keyword (and the copy keyword as well) has been opened at #51466

jondo · 2023-02-18T10:45:55Z

Here's the direct link to the proposal.

jorisvandenbossche · 2023-12-05T10:44:07Z

FWIW, the proposal to remove the inplace keyword in certain methods is being finalized (#51466)

Rinfore · 2023-12-06T08:59:18Z

Is there any way to keep the illusion of a mutating object with a pandas extension? https://pandas.pydata.org/docs/dev/development/extending.html

jorisvandenbossche · 2023-12-06T09:15:40Z

@Rinfore can you clarify your question a bit? (pandas objects are still mutable, removing the inplace keyword in methods that didn't actually work inplace won't change that)

If you want some syntactic sugar to avoid having to re-assign to the same variable (like df = df.reset_index() instead of df.reset_index(inplace=True)), you can't do achieve that with an extension type, but I think you could in theory do that with an accessor (it would look something like df.inplace.reset_index(), if the accessor API allows to modify the calling object).
I am not sure if I would recommend doing that, though ;)

Rinfore · 2023-12-06T11:28:15Z

Thanks so much for the insight @jorisvandenbossche!

I've been working on creating some custom accessors in a domain-specific library that allow to add abstractions with embedded business logic to data frames (e.g. fictitious example: based on presence of columns: [EmployeeId, Time, Event], I can classify the data-frame as a EmployeeRecords data frame) and use extension types on it (df.employeerecords.<method>). This allows users to treat them almost like instances of EmployeeRecords classes, but retain the flexibility to dive into the Pandas API.

In my custom accessors, I may do things such as filter out problematic rows via higher-level functions (e.g. df.employeerecords.drop_inconsistent_records()). Previously, by dropping rows in-place, I could avoid having my users do something like df = df.employeerecords.drop_inconsistent_records() so I was wondering if it would be possible to continue avoiding this (exactly as you mention, syntactic sugar).

I am also wondering if mutating a data frame to add columns (e.g. df['inconsistent'] = ...), or other similar operations, would become impossible in the future, which would be of significant concern to me, as my custom accessors rely heavily on things such as mutating the data frame to add columns or mutate other state.

In summary:

I was wondering whether I could use custom accessors to achieve the syntactic sugar you mentioned, which seemed to be 'maybe' based on your reply above? I can't see how this is possible though, as assigning a new data frame to self._obj in the accessor does not replace the reference to the original object.

e.g.

@pd.api.extensions.register_dataframe_accessor('test')
class TestAccessor:

    def __init__(self, df: pd.DataFrame) -> None:
        self._df = df

    def test_assign(self):
        copied_df = self._df.copy()
        copied_df.columns = ['new_df']
        self._df = copied_df
        
>>> test_df = pd.DataFrame({'old_df': [0]})
>>> test_df.test.test_assign()
>>> test_df.columns
Index(['old_df'], dtype='object') # not Index(['new_df'], dtype='object')

It appears as if the internal assignment in the custom accessor does nothing to modify the reference in the enclosing environment. Thus, I would likely need to return the new reference and have users update their variable references.

Is it likely that mutating pandas objects via operations [], .index = , etc. would be disallowed in the future? (I'm hoping not...)

I apologise if this is not the correct forum to pose these queries, or if I have made some other error or slight in my post.

pandas-dev/pandas#51466 pandas-dev/pandas#16529

kapoor1992 · 2024-03-18T16:16:26Z

7 years since this was opened, is this still something we want to do?

phofl · 2024-03-19T20:28:32Z

This is actually covered by PDEP8, I guess we can close here

This resolves a few warnings related to accessing things. Removing the `inplace=True` qualifier seems undesirable despite the warning being emitted for it, but it turns out that generally when this is given pandas is creating a copy internally anyway. pandas seem to have a general desire to remove the `inplace=` options everywhere so that code is not mutating, but remains more immutable. See the following comment from years ago. pandas-dev/pandas#16529 (comment)

the inplace version of pandas.fillna apparently shouldn't be used see pandas-dev/pandas#16529

gfyoung changed the title ~~API/DEPR: Deprecate inplace operations~~ API/DEPR: Deprecate inplace parameter May 29, 2017

gfyoung mentioned this issue May 29, 2017

rename_axis inplace=True #15704

Closed

jreback added API Design Deprecate Functionality to remove in pandas labels May 30, 2017

jreback modified the milestones: 0.21.0, 1.0 May 30, 2017

gfyoung mentioned this issue Jun 20, 2017

BUG/MAINT: Change default of inplace to False in pd.eval #16732

Merged

This was referenced Jul 27, 2017

Variable deletion consumes a lot of memory #17092

Closed

Bug: groupby multiindex levels equals rows #16859

Merged

gfyoung mentioned this issue Aug 23, 2017

inplace argument has no effect for rename_axis() #17317

Closed

jreback mentioned this issue Oct 28, 2017

API: make inplace= arguments take their default value from a global option #17968

Closed

nmusolino mentioned this issue Dec 2, 2018

DOC: Improve "inplace" param in Dataframe.rename() docs #24037

Closed

4 tasks

datapythonista mentioned this issue Dec 3, 2018

WIP/DEPR: Deprecate inplace=True #24063

Closed

4 tasks

Batalex mentioned this issue Jan 15, 2023

Pynteny: a Python package to perform synteny-aware, profile HMM-based searches in sequence databases pyOpenSci/software-submission#67

Closed

24 tasks

Robaina mentioned this issue Jan 25, 2023

PyOpenSci REVIEW - minor updates Robaina/Pynteny#43

Merged

mcrumiller mentioned this issue Feb 13, 2023

feat(python): add include_index option on init from pandas frames pola-rs/polars#6847

Merged

sowdm mentioned this issue Feb 21, 2023

Remove usage of inplace and copy in usage of pandas openpolicedata/openpolicedata#163

Open

tkoyama010 mentioned this issue Mar 19, 2023

Deprecate inplace parameter pyvista/pyvista#4158

Closed

mroeschke mentioned this issue May 11, 2023

Add inplace option to DataFrame.update() #10730

Closed

jorisvandenbossche linked a pull request Nov 13, 2023 that will close this issue

PDEP-8: Inplace methods in pandas #51466

Open

slopez-b mentioned this issue Dec 21, 2023

Chemometic Nucleoview: support for non-numeric values Benchling-Open-Source/allotropy#213

Merged

bwolfe-benchling mentioned this issue Jan 11, 2024

Nanodrop eight support Benchling-Open-Source/allotropy#230

Merged

ZelenyMartin added a commit to Czechitas-Python/python-data-1 that referenced this issue Jan 23, 2024

Deprecate inplace parameter

72d1c72

pandas-dev/pandas#51466 pandas-dev/pandas#16529

ZelenyMartin mentioned this issue Jan 23, 2024

Deprecate inplace parameter Czechitas-Python/python-data-1#88

Merged

ZelenyMartin added a commit to Czechitas-Python/python-data-1 that referenced this issue Jan 23, 2024

Deprecate inplace parameter

a077da7

pandas-dev/pandas#51466 pandas-dev/pandas#16529

alifbe mentioned this issue Jan 30, 2024

Switch to ruff equinor/pyscal#430

Merged

phofl added the Closing Candidate May be closeable, needs more eyeballs label Mar 19, 2024

ahuang11 mentioned this issue Jul 8, 2024

Reset index holoviz/holoviz#416

Merged

zargot added a commit to Big-Life-Lab/PHES-ODM-sharing that referenced this issue Jul 18, 2024

fix deprecation warning from pandas

e1e9ca2

the inplace version of pandas.fillna apparently shouldn't be used see pandas-dev/pandas#16529

PeterHamfelt mentioned this issue Aug 20, 2024

add-pandas-inplace-parameter-set-checker pylint-dev/pylint-ml#26

Closed

nilchia mentioned this issue Aug 30, 2024

Update Scanpy to 1.10, preparing for GTA galaxyproject/tools-iuc#6162

Merged

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

API/DEPR: Deprecate inplace parameter #16529

API/DEPR: Deprecate inplace parameter #16529

gfyoung commented May 29, 2017 •

edited by jorisvandenbossche

Loading

drafter250 commented Jul 26, 2017

gfyoung commented Jul 26, 2017

jreback commented Aug 22, 2017

datapythonista commented Dec 3, 2018

jreback commented Dec 3, 2018

TomAugspurger commented Dec 3, 2018

jreback commented Dec 3, 2018

datapythonista commented Dec 3, 2018

jreback commented Dec 3, 2018

TomAugspurger commented Dec 3, 2018

jreback commented Dec 3, 2018

datapythonista commented Dec 3, 2018 •

edited by MarcoGorelli

Loading

h-vetinari commented Dec 3, 2018 •

edited

Loading

TomAugspurger commented Dec 3, 2018

jreback commented Dec 3, 2018

datapythonista commented Dec 3, 2018

MarcoGorelli commented Dec 2, 2022

jorisvandenbossche commented Feb 17, 2023

jondo commented Feb 18, 2023

jorisvandenbossche commented Dec 5, 2023

Rinfore commented Dec 6, 2023

jorisvandenbossche commented Dec 6, 2023

Rinfore commented Dec 6, 2023

kapoor1992 commented Mar 18, 2024

phofl commented Mar 19, 2024

API/DEPR: Deprecate inplace parameter #16529

API/DEPR: Deprecate inplace parameter #16529

Comments

gfyoung commented May 29, 2017 • edited by jorisvandenbossche Loading

drafter250 commented Jul 26, 2017

gfyoung commented Jul 26, 2017

jreback commented Aug 22, 2017

datapythonista commented Dec 3, 2018

jreback commented Dec 3, 2018

TomAugspurger commented Dec 3, 2018

jreback commented Dec 3, 2018

datapythonista commented Dec 3, 2018

jreback commented Dec 3, 2018

TomAugspurger commented Dec 3, 2018

jreback commented Dec 3, 2018

datapythonista commented Dec 3, 2018 • edited by MarcoGorelli Loading

h-vetinari commented Dec 3, 2018 • edited Loading

TomAugspurger commented Dec 3, 2018

jreback commented Dec 3, 2018

datapythonista commented Dec 3, 2018

MarcoGorelli commented Dec 2, 2022

jorisvandenbossche commented Feb 17, 2023

jondo commented Feb 18, 2023

jorisvandenbossche commented Dec 5, 2023

Rinfore commented Dec 6, 2023

jorisvandenbossche commented Dec 6, 2023

Rinfore commented Dec 6, 2023

kapoor1992 commented Mar 18, 2024

phofl commented Mar 19, 2024

gfyoung commented May 29, 2017 •

edited by jorisvandenbossche

Loading

datapythonista commented Dec 3, 2018 •

edited by MarcoGorelli

Loading

h-vetinari commented Dec 3, 2018 •

edited

Loading