Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC: Updated kurt docstring (for pandas sprint) #19999

Merged
merged 6 commits into from
Mar 7, 2018
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
50 changes: 47 additions & 3 deletions pandas/core/window.py
Original file line number Diff line number Diff line change
Expand Up @@ -899,7 +899,53 @@ def skew(self, **kwargs):
return self._apply('roll_skew', 'skew',
check_minp=_require_min_periods(3), **kwargs)

_shared_docs['kurt'] = """Unbiased %(name)s kurtosis"""
_shared_docs['kurt'] = dedent("""
Calculate unbiased %(name)s kurtosis.

This function uses Fisher's definition of kurtosis without bias.

Parameters
----------
kwargs : Under Review

Returns
-------
Series or DataFrame (matches input)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think the "matches input" is confusing. What is the input? I would think of input what I passed to the function, but that is not the case here.
Personally I would just leave it out. Or we should find a better wording (maybe "matches calling object", although I think that is also not very clear). Or explain it in more words in the explanation on the line below.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think of "Returned object type is determined by the caller of the %(name)s calculation"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that sounds good

Like-indexed object containing the result of function application

See Also
--------
pandas.Series.%(name)s
pandas.DataFrame.%(name)s
pandas.Series.kurtosis
pandas.DataFrame.kurtosis
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the one with %(name)s substituted not the same as the two lines above? (below the substituted ones)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The substituted name in this case is either rolling or expanding - does that answer your question?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes, that clarifies.
Another point, to avoid that the See also section becomes long, we could also put the related series/dataframe versions on a single line (like Series.kurtosis, DataFrame.kurtosis)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you are worried about it becoming to long is that purely within the source or on the web as well? I believe that the HTML output styles this to display it all inline

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, didn't know the html output put them on a single line anyhow. Not ideal IMO, but that's numpydoc behaviour.

scipy.stats.skew
scipy.stats.kurtosis

Notes
-----
A minimum of 4 periods is required for the rolling calculation.

Examples
--------
The below example will show a rolling calculation with a window size of
four matching the equivalent function call using `scipy.stats`.

>>> arr = [1, 2, 3, 4, 999]
>>> import scipy.stats
>>> print("{0:.6f}".format(scipy.stats.kurtosis(arr[:-1], bias=False)))
-1.200000
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this to satisfy the doctests?
If so, we should try to find another solution, as I think this makes it more complex to read (compared to simply scipy.stats.kurtosis(arr[:-1], bias=False))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. Let me mess around with iPython - I think we can use the %precision directive here behind the scenes

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, the annoying thing here is that scipy prints 1.99999999999, as otherwise 1.200... would have worked (using an ellipsis to match anything on that line)

Yeah, not sure there is a better solution in this case

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea I was hoping that setting np.print_options would help for SciPy but no luck. I kind of wonder if globally it wouldn't be preferable to just set this to 6 across all docstrings for consistency

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on setting it globally

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looked into it and I think that would unfortunately be pretty difficult for scalar values. np.set_printoptions may be worthwhile for arrays, but as far as this specific example is concerned the formatting of scalar values from what I can tell is left to interpreters like iPython. I looked at DocTests' documentation but didn't see anything there that would easily allow us to enforce that option during its testing

>>> print("{0:.6f}".format(scipy.stats.kurtosis(arr[1:], bias=False)))
3.999946
>>> df = pd.DataFrame(arr)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a Series is more natural? (as you don't have a column name)

>>> df.rolling(4).kurt()
0
0 NaN
1 NaN
2 NaN
3 -1.200000
4 3.999946
""")

def kurt(self, **kwargs):
return self._apply('roll_kurt', 'kurt',
Expand Down Expand Up @@ -1221,7 +1267,6 @@ def skew(self, **kwargs):
return super(Rolling, self).skew(**kwargs)

@Substitution(name='rolling')
@Appender(_doc_template)
@Appender(_shared_docs['kurt'])
def kurt(self, **kwargs):
return super(Rolling, self).kurt(**kwargs)
Expand Down Expand Up @@ -1461,7 +1506,6 @@ def skew(self, **kwargs):
return super(Expanding, self).skew(**kwargs)

@Substitution(name='expanding')
@Appender(_doc_template)
@Appender(_shared_docs['kurt'])
def kurt(self, **kwargs):
return super(Expanding, self).kurt(**kwargs)
Expand Down