Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: implement TimedeltaArray/TimedeltaIIndex sum, median, std #28165

Merged
merged 9 commits into from
Oct 8, 2019

Conversation

jbrockmendel
Copy link
Member

The signatures match those in PandasArray.

Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a release note, and a couple test questions. Implementation looks good.

@pytest.mark.parametrize("skipna", [True, False])
def test_reductions_empty(self, name, skipna):
tdi = pd.TimedeltaIndex([])
arr = tdi._data
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_data -> .array?

pandas/tests/arrays/test_timedeltas.py Show resolved Hide resolved
@WillAyd WillAyd added Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Datetime Datetime data dtype labels Aug 27, 2019
@WillAyd WillAyd added this to the 1.0 milestone Aug 27, 2019
@@ -384,6 +386,49 @@ def astype(self, dtype, copy=True):
return self
return dtl.DatetimeLikeArrayMixin.astype(self, dtype, copy=copy)

def sum(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Annotations would be nice for new developments, especially where reasonably easy to add

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like mypy starts complaining about index.py if i add types here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's it saying?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$ mypy pandas/core/arrays/timedeltas.py
pandas/core/indexes/base.py:257: error: Missing return statement

pandas/tests/arrays/test_timedeltas.py Show resolved Hide resolved
@WillAyd
Copy link
Member

WillAyd commented Aug 28, 2019 via email

@jbrockmendel
Copy link
Member Author

If that’s it then just could just add return None at the end of that func

Looks like that func is Index.__new__ and the complaint is about the branch that calls cls._scalar_data_error(data), which raises

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this change TimedeltaIndex behavior at all? e.g. I think that these were implemented before? or were they not correct before?

assert isinstance(result, pd.Timedelta)
assert result == expected

result = arr.std(skipna=False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you leave these here on purpose (as this tests median)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like a typo, will fix

@jreback
Copy link
Contributor

jreback commented Sep 2, 2019

ahh I see you are closing an issue (xref my last comment), can you add a whatsnew note describing the change.

@TomAugspurger
Copy link
Contributor

Do any of these (median) need docstrings? I believe that mean will be inherited.

May need to add these to the Attributes section of the TimedeltaArray docstring, and add to the list in the API docs.

@jreback
Copy link
Contributor

jreback commented Oct 6, 2019

this got lost but I think was ok, can you rebase and we'll look again.

@jbrockmendel
Copy link
Member Author

rebased

@jreback jreback merged commit 3fa47de into pandas-dev:master Oct 8, 2019
@jreback
Copy link
Contributor

jreback commented Oct 8, 2019

thanks @jbrockmendel

I guess this could have a whatsnew note, can you in future / other PR.

@jbrockmendel jbrockmendel deleted the reds branch October 8, 2019 14:26
Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we follow the behaviour of numerical dtypes for empty arrays? (-> sum giving 0 instead of nan)

assert isinstance(result, pd.Timedelta)
assert result == expected

result = tdi.sum(min_count=1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The min_count was mainly added for the empty case, and for this it does not seem to work:

In [27]: tdi = pd.TimedeltaIndex([])                                                                                                                                                                               

In [28]: tdi.sum()                                                                                                                                                                                                 
Out[28]: NaT

In [29]: tdi.sum(min_count=1)                                                                                                                                                                                      
Out[29]: NaT

In [30]: tdi.sum(min_count=0)                                                                                                                                                                                      
Out[30]: NaT

If we follow the same behaviour as for numerical sum, the result should be 0 instead of NaT

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Datetime Datetime data dtype
Projects
None yet
Development

Successfully merging this pull request may close these issues.

np.sum(TimedeltaIndex) should return a Timedelta, not np.timedelta64
5 participants