DEPR: mad (mean absolute difference) functions #11787

Closed
aechase opened this issue Dec 7, 2015 · 12 comments · Fixed by #46707
Labels
Deprecate Functionality to remove in pandas Numeric Operations Arithmetic, Comparison, and Logical operations
Milestone

Comments

@aechase
Contributor

aechase commented Dec 7, 2015

The generic function .mad() calculates the mean absolute difference of a set of data, but in some cases the median absolute difference is more appropriate. In R, the mad() function accepts a center argument to specify how the average absolute difference should be calculated. I propose to add the same to the pandas function.

@jreback
Contributor

jreback commented Dec 7, 2015

we would do this with how='mean' as the default (and accept, say, 'median').

@jreback jreback added Difficulty Novice API Design Numeric Operations Arithmetic, Comparison, and Logical operations labels Dec 7, 2015
@jreback jreback added this to the Next Major Release milestone Dec 7, 2015
@shoyer
Member

shoyer commented Dec 7, 2015

df.mad() is equivalent to abs(df - df.mean()).mean(), which is much more explicit and obvious. Likewise, abs(df - df.median()).mean() would probably be better than df.mad(how='median') (note that from a performance perspective these are equivalent). I'm not entirely sure we really even need a separate .mad() method, though at this point I suppose there's little to gain from deprecating it.
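The two explicit expressions mentioned above can be sketched as follows. This is only an illustration: the column names and values are made up, and the method form `.abs()` is used in place of the built-in `abs()`, which should give the same result. The example avoids calling `.mad()` itself, since this issue ended with its deprecation.

```python
import pandas as pd

# Illustrative data only; the column names and values are assumptions.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 10.0], "b": [0.0, 0.0, 0.0, 4.0]})

# Mean absolute deviation around each column's mean
# (the computation described above as equivalent to df.mad()).
mad_mean = (df - df.mean()).abs().mean()

# Same outer mean, but centered on each column's median instead.
mad_median_center = (df - df.median()).abs().mean()

print(mad_mean)
print(mad_median_center)
```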

@StephenKappel
Contributor

In R, the mad() function can never be used to calculate the mad() as currently defined in Pandas. The outer aggregator is always the median. The center argument is a numeric constant and is used like this: abs(df - center).median().

Having an option to use the median in Pandas seems like a good idea; robustness is a common reason to use the MAD rather than standard deviation, but the current implementation is certainly not robust.

I think how should control the outer aggregator and adding a separate center argument similar to R's could be convenient and add clarity.

df.mad(how='median', center='median') would evaluate as abs(df - df.median()).median()

df.mad(how='median', center='mean') would evaluate as abs(df - df.mean()).median()

df.mad(how='median', center=15.4) would evaluate as abs(df - 15.4).median()

df.mad(how='mean', center='median') would evaluate as abs(df - df.median()).mean()

etc....

If this seems like a reasonable end-state, I can work on this issue.
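A rough sketch of how the proposed `how` and `center` keywords might behave, written as a standalone function rather than a pandas method. The function name `mad` and its signature are hypothetical and only mirror the proposal in this comment; nothing like this was added to pandas.

```python
import pandas as pd

_AGGREGATORS = ("mean", "median")


def mad(df, how="mean", center="mean"):
    """Hypothetical helper mirroring the proposed how/center keywords."""
    if how not in _AGGREGATORS:
        raise ValueError(f"how must be one of {_AGGREGATORS}")
    if isinstance(center, str):
        if center not in _AGGREGATORS:
            raise ValueError(f"center must be a number or one of {_AGGREGATORS}")
        # Per-column center, e.g. df.median()
        center_value = getattr(df, center)()
    else:
        # Numeric constant, as in the center=15.4 example
        center_value = center
    deviations = (df - center_value).abs()
    # Outer aggregator, e.g. deviations.median()
    return getattr(deviations, how)()


# Illustrative data with one outlier.
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 100.0]})
print(mad(df, how="median", center="median"))
print(mad(df, how="median", center=15.4))
```

With the defaults `how='mean'` and `center='mean'`, the sketch reduces to the existing behaviour, `abs(df - df.mean()).mean()`.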

@shoyer
Member

shoyer commented Jan 22, 2016

We could add a center keyword argument, but I would probably add documentation examples advising abs(x - x.median()).median() before adding both new keywords.
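A documentation example along those lines might look like the sketch below. The data is made up, with a single large outlier, to show why the median-based form is considered the robust one.

```python
import pandas as pd

# Illustrative data: four small values and one large outlier.
x = pd.Series([2.0, 4.0, 6.0, 8.0, 1000.0])

# Median absolute deviation from the median: barely affected by the outlier.
median_abs_dev = (x - x.median()).abs().median()

# Mean absolute deviation from the mean: dominated by the outlier.
mean_abs_dev = (x - x.mean()).abs().mean()

print(median_abs_dev, mean_abs_dev)
```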

@BastiaanBergman

  • R implements MAD as:

constant * Median(abs(x - center))

where the constant defaults to ~1.48, which makes the result comparable to the standard deviation, and the center is provided by the user (it could be the median or anything else). R manual

  • SAS implements MAD as:

median absolute deviation from the median

SAS manual

  • JMP has a median absolute deviation, but doesn't abbreviate it. It has no mean absolute deviation.

Although such a thing exists, there is not much use for a mean absolute deviation. Either your data is normally distributed, and you use the standard deviation and mean, or it is not, in which case you cannot use those but might use the median absolute deviation and median.
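The R-style formula quoted above, `constant * Median(abs(x - center))`, could be sketched in pandas as follows. The function name `scaled_mad` is hypothetical. The constant is computed here as the reciprocal of the 0.75 quantile of the standard normal distribution, which is about 1.4826, and the center falls back to the median when none is given; treat both choices as assumptions of this sketch rather than a faithful port of R.

```python
from statistics import NormalDist

import pandas as pd

# Consistency constant, roughly 1.4826: 1 / (0.75 quantile of the standard normal).
CONSTANT = 1 / NormalDist().inv_cdf(0.75)


def scaled_mad(x, center=None, constant=CONSTANT):
    """Hypothetical helper: constant * median(|x - center|)."""
    if center is None:
        center = x.median()
    return constant * (x - center).abs().median()


# Illustrative data with one outlier.
x = pd.Series([1.0, 2.0, 3.0, 4.0, 100.0])
print(scaled_mad(x))              # centered on the median
print(scaled_mad(x, center=0.0))  # centered on a user-supplied constant
```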

@shoyer
Member

shoyer commented Mar 30, 2016

Given all the potential confusion, I am almost inclined to deprecate this method instead. There are lots of ways to implement MAD, and it's almost better to force the user to implement their own rather than use the built-in one with mistaken assumptions.

Changing the implementation of methods like this is generally a non-starter because of backwards compatibility concerns.

@andrewsanchez

@shoyer @TomAugspurger I'm interested in trying to implement this but wanted to check in first before getting to work, as the discussion was inconclusive. Does the recently added good first issue label mean this feature would be welcome? I think the improvement in code readability would be awesome.

@shoyer
Member

shoyer commented Nov 8, 2017

I'm not convinced that it makes sense to iterate on this method. Given that we provide the primitives to implement this as a one-liner, I would recommend users write their own helper function for mad() and apply it with .pipe() if necessary.
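A minimal sketch of that suggestion, assuming the mean-based definition is the one wanted. The helper name `mean_abs_dev` and the sample data are made up for illustration.

```python
import pandas as pd


def mean_abs_dev(obj):
    """User-defined one-liner: mean absolute deviation around the mean."""
    return (obj - obj.mean()).abs().mean()


# Illustrative data only.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 10.0], "b": [0.0, 0.0, 0.0, 4.0]})

# .pipe() lets the helper sit at the end of a method chain.
per_column = df.dropna().pipe(mean_abs_dev)

# The same helper also works on a single column (a Series).
single = df["a"].pipe(mean_abs_dev)

print(per_column)
print(single)
```

Writing the helper yourself makes the choice of center and aggregator explicit at the call site, which is the point made in this comment.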

@jreback
Contributor

jreback commented Nov 8, 2017

i would be ok with deprecating .mad() entirely

@GGordonGordon
Contributor

GGordonGordon commented Mar 5, 2018

@jreback - I was thinking of taking this ticket up assuming that the plan is to deprecate .mad()? If that's the case then should this ticket title be changed from ENH to CLN (or DEPR - is this still being used?)

Secondly - where should the deprecation test reside? I was thinking of putting it into the Generic class under pandas/tests/generic/test_generic.py?

@WillAyd WillAyd mentioned this issue Mar 7, 2018
4 tasks
@jreback
Contributor

jreback commented Mar 7, 2018

you don’t have to change the title

the test should go near where the mad tests are now

@tinksume

I am working on this as part of Walmart PandasHack

@mroeschke mroeschke changed the title from "ENH: calculate average absolute difference by mean, median or mode" to "DEPR: mad (mean absolute difference) functions" Apr 21, 2021
@mroeschke mroeschke added Deprecate Functionality to remove in pandas and removed API Design good first issue labels Apr 21, 2021
@rhshadrach rhshadrach mentioned this issue Apr 8, 2022
4 tasks
@jreback jreback removed this from the Contributions Welcome milestone Apr 8, 2022