Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Redefine IndexOpsMixin.size, fix #25580. #25584

Merged
merged 7 commits into from
Mar 11, 2019

Conversation

sighingnow
Copy link
Contributor

@sighingnow sighingnow commented Mar 7, 2019

Signed-off-by: HE, Tao <sighingnow@gmail.com>
Signed-off-by: HE, Tao <sighingnow@gmail.com>
@sighingnow sighingnow changed the title Add IntegerArray.size, fix #25580. BUG: Add IntegerArray.size, fix #25580. Mar 7, 2019
@codecov
Copy link

codecov bot commented Mar 7, 2019

Codecov Report

Merging #25584 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #25584      +/-   ##
==========================================
+ Coverage   91.26%   91.26%   +<.01%     
==========================================
  Files         173      173              
  Lines       52966    52968       +2     
==========================================
+ Hits        48337    48340       +3     
+ Misses       4629     4628       -1
Flag Coverage Δ
#multiple 89.83% <100%> (ø) ⬆️
#single 41.7% <100%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/base.py 97.76% <100%> (ø) ⬆️
pandas/plotting/_core.py 83.53% <0%> (-0.05%) ⬇️
pandas/core/config.py 87% <0%> (-0.05%) ⬇️
pandas/core/arrays/sparse.py 92.17% <0%> (ø) ⬆️
pandas/core/groupby/base.py 91.83% <0%> (ø) ⬆️
pandas/core/frame.py 96.79% <0%> (ø) ⬆️
pandas/core/generic.py 93.65% <0%> (ø) ⬆️
pandas/io/feather_format.py 89.74% <0%> (ø) ⬆️
pandas/core/groupby/groupby.py 97.2% <0%> (ø) ⬆️
... and 5 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 74a9ae3...8b6a6f4. Read the comment docs.

@codecov
Copy link

codecov bot commented Mar 7, 2019

Codecov Report

Merging #25584 into master will decrease coverage by 49.54%.
The diff coverage is 50%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master   #25584       +/-   ##
===========================================
- Coverage   91.26%   41.71%   -49.55%     
===========================================
  Files         173      173               
  Lines       52966    52968        +2     
===========================================
- Hits        48337    22094    -26243     
- Misses       4629    30874    +26245
Flag Coverage Δ
#multiple ?
#single 41.71% <50%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/arrays/integer.py 43.85% <50%> (-52.47%) ⬇️
pandas/io/formats/latex.py 0% <0%> (-100%) ⬇️
pandas/core/categorical.py 0% <0%> (-100%) ⬇️
pandas/io/sas/sas_constants.py 0% <0%> (-100%) ⬇️
pandas/tseries/plotting.py 0% <0%> (-100%) ⬇️
pandas/tseries/converter.py 0% <0%> (-100%) ⬇️
pandas/io/formats/html.py 0% <0%> (-99.36%) ⬇️
pandas/core/groupby/categorical.py 0% <0%> (-95.46%) ⬇️
pandas/io/sas/sas7bdat.py 0% <0%> (-91.17%) ⬇️
pandas/io/sas/sas_xport.py 0% <0%> (-90.15%) ⬇️
... and 131 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 74a9ae3...3f28f1e. Read the comment docs.

@TomAugspurger TomAugspurger added the ExtensionArray Extending pandas with custom dtypes or arrays. label Mar 7, 2019
@TomAugspurger
Copy link
Contributor

TomAugspurger commented Mar 7, 2019 via email

@sighingnow sighingnow changed the title BUG: Add IntegerArray.size, fix #25580. BUG: Add ExtensionArray.size, fix #25580. Mar 7, 2019
Signed-off-by: HE, Tao <sighingnow@gmail.com>
@TomAugspurger
Copy link
Contributor

@sighingnow can you avoid force pushing? Makes it a bit easier to review incrementally. It'd be good to include a release note for the bug this is fixing too.

@jorisvandenbossche do you remember if we discussed a size property on EA? It's not standard on other arrays in the ecosystem (none of array.array, memoryview, pyarrow define it). Maybe just re-define IndexOpsMixin.size to be (len(self._values),)?

@sighingnow
Copy link
Contributor Author

Noted, thanks @TomAugspurger.

The size method should return an integer, rather than the shape (len(self._values),).

@TomAugspurger
Copy link
Contributor

Whoops, yes it should return an integer (I was mixing it up with shape).

That said, in the name of keeping a smaller API footprint, I'm leaning towards redefining IndexOpsMixin.size to use len(self._values). We'll need to update MultiIndex.size for this change, since that's the only 2-D index.

I believe that this will work for MultiIndex (have to be careful about unused levels)

def size(self):
    return np.prod(self.remove_unused_levels().levshape)

@sighingnow
Copy link
Contributor Author

sighingnow commented Mar 8, 2019

I agree to move redefine size in IndexOpsMixin to avoid adding a new API, but I think the size for MultiIndex above is not correct.

>>> a = pd.MultiIndex.from_arrays(np.array([[1,2,3], [4,5,6], [7,8,9]]))
>>> a.size
3
>>> np.prod(a.remove_unused_levels().levshape)
27

I just want to confirm whether current behavior of size for MultiIndex is correct? (I'm a newcomer for pandas.) Thanks!

Edit: if redefine size in IndexOpsMixin as len(self._values), the behavior for MultiIndex is consistent with current implementation.

@TomAugspurger
Copy link
Contributor

Oh, hmm that's surprising. But I don't think we can easily change it at this point... I guess that means that MultiIndex.size can continue to use IndexOpsMixin.size.

Signed-off-by: HE, Tao <sighingnow@gmail.com>
@sighingnow sighingnow changed the title BUG: Add ExtensionArray.size, fix #25580. BUG: Redefine IndexOpsMixin.size, fix #25580. Mar 8, 2019
@jorisvandenbossche
Copy link
Member

+1 on for now not adding size to ExtensionArray. It is always easy to add it later if there are good use cases for it.

The size of a MultiIndex is consistent with its shape and ndim, ànd with how it is represented as an array:

In [22]: a = pd.MultiIndex.from_arrays(np.array([[1,2,3], [4,5,6], [7,8,9]]))                                                                                                   

In [23]: a.shape                                                                                                                                                                
Out[23]: (3,)

In [24]: a.ndim                                                                                                                                                                 
Out[24]: 1

In [25]: a.size                                                                                                                                                                 
Out[25]: 3

In [26]: np.asarray(a)                                                                                                                                                          
Out[26]: array([(1, 4, 7), (2, 5, 8), (3, 6, 9)], dtype=object)

so a MultiIndex is regarded as a 1D object, not 2D (for sure can be confusing :-))

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also add a test for the actual bug in size? (because it was surfaced first through resample, but size itself is also public API which currently fails)

@@ -745,7 +745,7 @@ def nbytes(self):
"""
Return the number of bytes in the underlying data.
"""
return self._values.nbytes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the wrong property that is being changed? (nbytes, not size)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, will fix that soon.

@@ -31,6 +31,7 @@ Fixed Regressions
- Fixed regression in ``IntervalDtype`` construction where passing an incorrect string with 'Interval' as a prefix could result in a ``RecursionError``. (:issue:`25338`)
- Fixed regression in :class:`Categorical`, where constructing it from a categorical ``Series`` and an explicit ``categories=`` that differed from that in the ``Series`` created an invalid object which could trigger segfaults. (:issue:`25318`)
- Fixed pip installing from source into an environment without NumPy (:issue:`25193`)
- Fixed bug where :meth:`core.base.IndexOpsMixin.size` could not return correct result for :class:`api.extensions.ExtensionArray` (:issue:`25580`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's best to mention public API, so instead of core.base.IndexOpsMixin.size, I would use Series.size or Index.size.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@jreback
Copy link
Contributor

jreback commented Mar 8, 2019

I think we should add .size to ea generally

this has come a number of times (and to be honest it’s a trivial addition to the base class )

Signed-off-by: HE, Tao <sighingnow@gmail.com>
@jreback jreback added this to the 0.24.2 milestone Mar 10, 2019
@jreback jreback added the Bug label Mar 10, 2019
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this lgtm. small whatsnew change; ping on green.

@@ -31,6 +31,7 @@ Fixed Regressions
- Fixed regression in ``IntervalDtype`` construction where passing an incorrect string with 'Interval' as a prefix could result in a ``RecursionError``. (:issue:`25338`)
- Fixed regression in :class:`Categorical`, where constructing it from a categorical ``Series`` and an explicit ``categories=`` that differed from that in the ``Series`` created an invalid object which could trigger segfaults. (:issue:`25318`)
- Fixed pip installing from source into an environment without NumPy (:issue:`25193`)
- Fixed bug where :meth:`Index.size` could not return correct result for :class:`api.extensions.ExtensionArray` (:issue:`25580`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not correct, it raised previously ot 'returned an incorrect value'

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is also a bug fix and not a regression, can you move to the appopriate section (bug fixes numeric is ok)

Copy link
Contributor

@jreback jreback Mar 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually move to resampling bug fixes is good (you could mention that in the whatsnew note)

@jreback
Copy link
Contributor

jreback commented Mar 11, 2019

lgtm. though something failing.

@TomAugspurger
Copy link
Contributor

It's the same sparse failure that should be fixed on master.

@jorisvandenbossche jorisvandenbossche merged commit 6375549 into pandas-dev:master Mar 11, 2019
@jorisvandenbossche
Copy link
Member

@sighingnow thanks!

@jorisvandenbossche
Copy link
Member

MeeseeksDev doesn't seem to be picking this one up (and I also assume there will be a conflict in the whatsnew), so will do a manual backport

jorisvandenbossche pushed a commit to jorisvandenbossche/pandas that referenced this pull request Mar 11, 2019
…5584)

Signed-off-by: HE, Tao <sighingnow@gmail.com>
(cherry picked from commit 6375549)
thoo added a commit to thoo/pandas that referenced this pull request Mar 11, 2019
* upstream/master: (110 commits)
  DOC: hardcode contributors for 0.24.x releases (pandas-dev#25662)
  DOC: restore toctree maxdepth (pandas-dev#25134)
  BUG: Redefine IndexOpsMixin.size, fix pandas-dev#25580. (pandas-dev#25584)
  BUG: to_csv line endings with compression (pandas-dev#25625)
  DOC: file obj for to_csv must be newline='' (pandas-dev#25624)
  Suppress incorrect warning in nargsort for timezone-aware DatetimeIndex (pandas-dev#25629)
  TST: fix incorrect sparse test (now failing on scipy master) (pandas-dev#25653)
  CLN: Removed debugging code (pandas-dev#25647)
  DOC: require Return section only if return is not None nor commentary (pandas-dev#25008)
  DOC:Remove hard-coded examples from _flex_doc_SERIES (pandas-dev#24589) (pandas-dev#25524)
  TST: xref pandas-dev#25630 (pandas-dev#25643)
  BUG: Fix pandas-dev#25481 by fixing the error message in TypeError (pandas-dev#25540)
  Fixturize tests/frame/test_mutate_columns.py (pandas-dev#25642)
  Fixturize tests/frame/test_join.py (pandas-dev#25639)
  Fixturize tests/frame/test_combine_concat.py (pandas-dev#25634)
  Fixturize tests/frame/test_asof.py (pandas-dev#25628)
  BUG: Fix user-facing AssertionError with to_html (pandas-dev#25608) (pandas-dev#25620)
  DOC: resolve all GL03 docstring validation errors (pandas-dev#25525)
  TST: failing wheel building on PY2 and old numpy (pandas-dev#25631)
  DOC: Remove makePanel from docs (pandas-dev#25609) (pandas-dev#25612)
  ...
jorisvandenbossche added a commit that referenced this pull request Mar 11, 2019
Signed-off-by: HE, Tao <sighingnow@gmail.com>
(cherry picked from commit 6375549)
@sighingnow
Copy link
Contributor Author

@jorisvandenbossche many thanks for helping me get this PR merged into master! Sorry for the late response.

BTW, I'm wondering if there were conflicts between pull request and master, how should I resolve that? Should I merge the master into the pull request, or rebase the pull request on top of master?

@sighingnow sighingnow deleted the fix-25580 branch March 12, 2019 04:38
@WillAyd
Copy link
Member

WillAyd commented Mar 12, 2019

@sighingnow you should merge master into the branch

@sighingnow
Copy link
Contributor Author

Noted, thanks! @WillAyd

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug ExtensionArray Extending pandas with custom dtypes or arrays.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Int64 series resampling
5 participants