mean over axis=1 in not-completely-numeric frame returns all nans #3689

Closed
cpcloud opened this issue May 23, 2013 · 15 comments
Labels
Bug Nuisance Columns Identifying/Dropping nuisance columns in reductions, groupby.add, DataFrame.apply Reduction Operations sum, mean, min, max, etc.

Comments

@cpcloud
Member

cpcloud commented May 23, 2013

DataFrame from #3688. This might be related to that:

import pandas as pd

df = pd.DataFrame({'bar': {0: 1, 1: 1, 2: 1}, 'foo': {0: 0, 1: 1, 2: 2}, 'foo1': {0: 1, 1: 2, 2: 3}, 'hello': {0: 'a', 1: 'a', 2: 'a'}}, columns=['bar', 'foo', 'foo', 'hello'])
print(df.T.sum(1) == df.sum())  # fine. + is str cat
print(df.T.mean(1).isnull().all())  # prints True

I think non numeric should be dropped...

@jreback
Contributor

jreback commented May 23, 2013

linking to #3679

@jreback
Contributor

jreback commented May 23, 2013

I think this is actually a confusing API, mean should work (on the non-numeric), though they are object dtype, so not entirely sure

sum is defined for numeric and non-numeric (strings);
however, mean is not

In [27]: df.T.sum(1)
Out[27]: 
bar        3
foo        3
foo        3
hello    aaa
dtype: object
In [28]: df.T.mean(1)
Out[28]: 
bar     NaN
foo     NaN
foo     NaN
hello   NaN
dtype: float64

maybe could see it raising....?

@cpcloud
Member Author

cpcloud commented May 23, 2013

i kind of don't like the summing strings behavior, but that's just me. here's why: just because the binary operator associated with the reduction (or fold if you are into fp) is defined doesn't mean that the reduction itself makes any sense (e.g., lambda x, y: 2 * x + 3 * y, with string input), and in fact the built-in sum function is not defined for strings. it's probably too late to change that. i think not raising here is a bit magical, but i think that (df.T.mean(1) == df.mean()).all() should be True in as many cases as possible. I can't think of any right now where you wouldn't want that to be true.
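The point about folds versus the built-in sum can be checked in plain Python. The following is a minimal sketch, not pandas code; the variable names are made up for illustration.

```python
from functools import reduce
import operator

words = ["a", "a", "a"]

# Folding with the binary + operator is well defined for strings.
folded = reduce(operator.add, words)

# The built-in sum refuses strings, even with a string start value.
try:
    sum(words, "")
    builtin_sum_error = None
except TypeError as exc:
    builtin_sum_error = exc

# An arbitrary binary operator also "works" on strings,
# but the result of the reduction carries no obvious meaning.
odd = reduce(lambda x, y: 2 * x + 3 * y, words)

print(folded)
print(type(builtin_sum_error).__name__)
print(len(odd))
```

So the operator being defined is enough for a fold to run, which is a separate question from whether the reduction is meaningful.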

@jreback
Contributor

jreback commented Jun 4, 2013

moving to 0.12; I tried to fix this, but it's a bit non-trivial

@cpcloud
Member Author

cpcloud commented Jun 4, 2013

Yeah I checked out ur dupe fix branch and tried to fix it. Didn't say anything cuz you weren't finished when I tried that, sorry. Next time I will speak up so you don't waste ur time!

@jreback
Contributor

jreback commented Jun 4, 2013

no...nothing really to do with the dups.....this is embedded in groupby....(dupe fixes are in master btw)

@jreback jreback modified the milestones: 0.15.0, 0.14.0 Mar 9, 2014
@cpcloud cpcloud removed the Groupby label Jun 3, 2014
@cpcloud
Member Author

cpcloud commented Jun 3, 2014

@jreback is this worth fixing? this doesn't work fundamentally because of the way pandas objects are represented (as Blocks). i just stepped thru the code for this and my intuition tells me that all sorts of transposing acrobatics would be necessary to make this work, for an extreme edge case that can be very easily worked around by simply transpose -> convert_objects -> mean. i vote close.

@cpcloud cpcloud removed the Groupby label Jun 3, 2014
@hayd
Contributor

hayd commented Jun 3, 2014

The annoying/inconsistent part here is that you can take the mean of an object dtype Series...

@cpcloud
Member Author

cpcloud commented Jun 3, 2014

It's because the objects here can't really be converted. Because the blocks are column-wise, getting the numeric data doesn't return anything.
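A rough illustration of the asymmetry discussed here, assuming a recent pandas and a simplified frame without the duplicate column from the original report (the frame and variable names are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({"bar": [1, 1, 1], "foo": [0, 1, 2], "hello": ["a", "a", "a"]})

# An object-dtype Series that holds only numbers can still be averaged.
obj_series = pd.Series([0, 1, 2], dtype=object)
series_mean = obj_series.mean()

# After transposing, every column mixes ints and strings,
# so every column ends up with object dtype.
transposed = df.T
all_object = bool((transposed.dtypes == object).all())

# Selecting numeric columns from the transposed frame leaves no columns.
numeric_part = transposed.select_dtypes(include="number")

print(series_mean)
print(all_object)
print(numeric_part.shape)
```

The Series case works because its values are all numbers despite the dtype, whereas each column of the transposed frame genuinely contains a string.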

@jreback
Contributor

jreback commented Jun 3, 2014

actually i think this is easy.
in _reduce (can do for series / frame)

  • just check for object dtypes on reductions (need to check because we don't want to call convert_objects every time, which recreates everything (even with no copy))
  • if have object dtypes, try to convert_objects()
  • _get_numeric_data() (if numeric_only=True).

by definition it will raise if it's still mixed type at this point, otherwise perform the op (you might need to do this slightly differently for sum, as that can handle object)

would need a perf check too
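The three steps above might look roughly like the following sketch. It is written against the public API of a recent pandas rather than the internal `_reduce`; `reduce_mean` is a hypothetical helper, `infer_objects()` stands in for the since-removed `convert_objects()`, and `select_dtypes` stands in for `_get_numeric_data()`. Unlike the proposal, this version drops non-numeric columns instead of raising.

```python
import pandas as pd


def reduce_mean(frame: pd.DataFrame, axis: int = 0) -> pd.Series:
    """Hypothetical helper for illustration; not pandas' internal _reduce."""
    # Step 1: only pay for a conversion when object columns are present.
    if (frame.dtypes == object).any():
        # Step 2: soft-convert object columns that actually hold numbers.
        # Assumption: infer_objects() plays the role of convert_objects().
        frame = frame.infer_objects()
    # Step 3: keep only the numeric columns, then reduce.
    numeric = frame.select_dtypes(include="number")
    return numeric.mean(axis=axis)


# Numbers stored as object dtype are recovered before the reduction.
stored_as_object = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, dtype=object)
row_means = reduce_mean(stored_as_object, axis=1)
print(row_means)
```

The dtype check in step 1 is what keeps the common all-numeric case from paying for a conversion, which is the reason a perf check would still be needed.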

@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015
@WillAyd
Member

WillAyd commented Jul 6, 2018

Problem still exists on master

@jbrockmendel jbrockmendel added the Numeric Operations Arithmetic, Comparison, and Logical operations label Oct 16, 2019
@jbrockmendel jbrockmendel added Nuisance Columns Identifying/Dropping nuisance columns in reductions, groupby.add, DataFrame.apply Reduction Operations sum, mean, min, max, etc. labels Sep 22, 2020
@jbrockmendel
Member

> I think non numeric should be dropped...

df.T is all object dtype, so you would be dropping everything. Is that what you have in mind?

On the other hand, df.mean(axis=0) works as we'd expect.

@jbrockmendel jbrockmendel added Closing Candidate May be closeable, needs more eyeballs and removed Closing Candidate May be closeable, needs more eyeballs labels Sep 26, 2020
@mroeschke mroeschke added Bug and removed API Design labels Apr 11, 2021
@jbrockmendel jbrockmendel removed the Numeric Operations Arithmetic, Comparison, and Logical operations label Dec 28, 2021
@DriesSchaumont
Member

This now provides a warning, will raise in the future: FutureWarning: Dropping of nuisance columns in DataFrame reductions (with 'numeric_only=None') is deprecated; in a future version this will raise TypeError. Select only valid columns before calling the reduction. xref #28900
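As a sketch of what "select only valid columns before calling the reduction" can look like, assuming a recent pandas and a simplified frame without the duplicate column from the original report:

```python
import pandas as pd

df = pd.DataFrame({"bar": [1, 1, 1], "foo": [0, 1, 2], "hello": ["a", "a", "a"]})

# Option 1: ask the reduction itself to ignore non-numeric columns.
col_means = df.mean(numeric_only=True)

# Option 2: select the numeric columns first, then reduce along either axis.
numeric = df.select_dtypes(include="number")
row_means = numeric.mean(axis=1)

print(col_means)
print(row_means)
```

Either way the choice of columns is explicit, so the result does not depend on nuisance columns being dropped silently.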

@jreback
Contributor

jreback commented Jan 31, 2022

do we have sufficient tests on the warning?

@DriesSchaumont
Member

> do we have sufficient tests on the warning?

Tests for the warnings in dataframe reductions were added in #41480
