Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC: fix SA05 errors in docstrings #57392

Closed
jordan-d-murphy opened this issue Feb 13, 2024 · 31 comments · Fixed by #57446
Closed

DOC: fix SA05 errors in docstrings #57392

jordan-d-murphy opened this issue Feb 13, 2024 · 31 comments · Fixed by #57446
Labels
CI Continuous Integration Docs good first issue

Comments

@jordan-d-murphy
Copy link
Contributor

Pandas has a script for validating docstrings:

pandas/ci/code_checks.sh

Lines 145 to 206 in 1d7aedc

MSG='Partially validate docstrings (SA05)' ; echo $MSG
$BASE_DIR/scripts/validate_docstrings.py --format=actions --errors=SA05 --ignore_functions \
pandas.DataFrame.agg\
pandas.DataFrame.aggregate\
pandas.DataFrame.boxplot\
pandas.PeriodIndex.asfreq\
pandas.arrays.ArrowStringArray\
pandas.arrays.StringArray\
pandas.core.groupby.DataFrameGroupBy.first\
pandas.core.groupby.DataFrameGroupBy.last\
pandas.core.groupby.SeriesGroupBy.first\
pandas.core.groupby.SeriesGroupBy.last\
pandas.core.resample.Resampler.first\
pandas.core.resample.Resampler.last\
pandas.core.window.ewm.ExponentialMovingWindow.corr\
pandas.core.window.ewm.ExponentialMovingWindow.cov\
pandas.core.window.ewm.ExponentialMovingWindow.mean\
pandas.core.window.ewm.ExponentialMovingWindow.std\
pandas.core.window.ewm.ExponentialMovingWindow.sum\
pandas.core.window.ewm.ExponentialMovingWindow.var\
pandas.core.window.expanding.Expanding.aggregate\
pandas.core.window.expanding.Expanding.apply\
pandas.core.window.expanding.Expanding.corr\
pandas.core.window.expanding.Expanding.count\
pandas.core.window.expanding.Expanding.cov\
pandas.core.window.expanding.Expanding.kurt\
pandas.core.window.expanding.Expanding.max\
pandas.core.window.expanding.Expanding.mean\
pandas.core.window.expanding.Expanding.median\
pandas.core.window.expanding.Expanding.min\
pandas.core.window.expanding.Expanding.quantile\
pandas.core.window.expanding.Expanding.rank\
pandas.core.window.expanding.Expanding.sem\
pandas.core.window.expanding.Expanding.skew\
pandas.core.window.expanding.Expanding.std\
pandas.core.window.expanding.Expanding.sum\
pandas.core.window.expanding.Expanding.var\
pandas.core.window.rolling.Rolling.aggregate\
pandas.core.window.rolling.Rolling.apply\
pandas.core.window.rolling.Rolling.corr\
pandas.core.window.rolling.Rolling.count\
pandas.core.window.rolling.Rolling.cov\
pandas.core.window.rolling.Rolling.kurt\
pandas.core.window.rolling.Rolling.max\
pandas.core.window.rolling.Rolling.mean\
pandas.core.window.rolling.Rolling.median\
pandas.core.window.rolling.Rolling.min\
pandas.core.window.rolling.Rolling.quantile\
pandas.core.window.rolling.Rolling.rank\
pandas.core.window.rolling.Rolling.sem\
pandas.core.window.rolling.Rolling.skew\
pandas.core.window.rolling.Rolling.std\
pandas.core.window.rolling.Rolling.sum\
pandas.core.window.rolling.Rolling.var\
pandas.core.window.rolling.Window.mean\
pandas.core.window.rolling.Window.std\
pandas.core.window.rolling.Window.sum\
pandas.core.window.rolling.Window.var\
pandas.plotting.bootstrap_plot\
pandas.plotting.boxplot\
pandas.plotting.radviz # There should be no backslash in the final line, please keep this comment in the last ignored function
RET=$(($RET + $?)) ; echo $MSG "DONE"

Currently, some methods fail the SA05 check.

The task here is:

  • take 2-4 methods
  • run: scripts/validate_docstrings.py --format=actions --errors=SA05 method-name
  • check if validation docstrings passes for those methods, and if it’s necessary fix the docstrings according to whatever error is reported
  • remove those methods from code_checks.sh
  • commit, push, open pull request
  • Please don't comment take as multiple people can work on this issue. You also don't need to ask for permission to work on this, just comment on which methods are you going to work.

If you're new contributor, please check the contributing guide

@jordan-d-murphy
Copy link
Contributor Author

I don't have permission to add them, but this could probably use some labels:

  • CI
  • Docs
  • good first issue

@jordan-d-murphy
Copy link
Contributor Author

Addresses #57352

@jordan-d-murphy
Copy link
Contributor Author

I'll take

pandas.DataFrame.agg
pandas.DataFrame.aggregate
pandas.DataFrame.boxplot

@williamking1221
Copy link
Contributor

I'll take

     pandas.PeriodIndex.asfreq
     pandas.arrays.ArrowStringArray
     pandas.arrays.StringArray

@Pistonamey
Copy link
Contributor

I'll take
pandas.core.groupby.DataFrameGroupBy.first pandas.core.groupby.DataFrameGroupBy.last

@jordan-d-murphy
Copy link
Contributor Author

jordan-d-murphy commented Feb 14, 2024

I found a small fix that resolved all of the ExponentialMovingWindow, Expanding, Rolling, and Window SA05 errors (and pandas.plotting.boxplot).

This fix takes care of the following:

pandas.core.window.ewm.ExponentialMovingWindow.corr
pandas.core.window.ewm.ExponentialMovingWindow.cov
pandas.core.window.ewm.ExponentialMovingWindow.mean
pandas.core.window.ewm.ExponentialMovingWindow.std
pandas.core.window.ewm.ExponentialMovingWindow.sum
pandas.core.window.ewm.ExponentialMovingWindow.var

pandas.core.window.expanding.Expanding.apply
pandas.core.window.expanding.Expanding.corr
pandas.core.window.expanding.Expanding.count
pandas.core.window.expanding.Expanding.cov
pandas.core.window.expanding.Expanding.kurt
pandas.core.window.expanding.Expanding.max
pandas.core.window.expanding.Expanding.mean
pandas.core.window.expanding.Expanding.median
pandas.core.window.expanding.Expanding.min
pandas.core.window.expanding.Expanding.quantile
pandas.core.window.expanding.Expanding.rank
pandas.core.window.expanding.Expanding.sem
pandas.core.window.expanding.Expanding.skew
pandas.core.window.expanding.Expanding.std
pandas.core.window.expanding.Expanding.sum
pandas.core.window.expanding.Expanding.var

pandas.core.window.rolling.Rolling.apply
pandas.core.window.rolling.Rolling.corr
pandas.core.window.rolling.Rolling.count
pandas.core.window.rolling.Rolling.cov
pandas.core.window.rolling.Rolling.kurt
pandas.core.window.rolling.Rolling.max
pandas.core.window.rolling.Rolling.mean
pandas.core.window.rolling.Rolling.median
pandas.core.window.rolling.Rolling.min
pandas.core.window.rolling.Rolling.quantile
pandas.core.window.rolling.Rolling.rank
pandas.core.window.rolling.Rolling.sem
pandas.core.window.rolling.Rolling.skew
pandas.core.window.rolling.Rolling.std
pandas.core.window.rolling.Rolling.sum
pandas.core.window.rolling.Rolling.var

pandas.core.window.rolling.Window.mean
pandas.core.window.rolling.Window.std
pandas.core.window.rolling.Window.sum
pandas.core.window.rolling.Window.var

pandas.plotting.boxplot

@jordan-d-murphy
Copy link
Contributor Author

jordan-d-murphy commented Feb 14, 2024

I think this is the updated list of what's left:

        pandas.core.groupby.SeriesGroupBy.first
        pandas.core.groupby.SeriesGroupBy.last
        pandas.core.resample.Resampler.first
        pandas.core.resample.Resampler.last
        pandas.core.window.expanding.Expanding.aggregate
        pandas.core.window.rolling.Rolling.aggregate
        pandas.plotting.bootstrap_plot
        pandas.plotting.radviz

@Pistonamey
Copy link
Contributor

Hey @jordan-d-murphy

I can take SeriesGroupBy : first and last once my previous PR gets merged.

@jordan-d-murphy
Copy link
Contributor Author

Awesome! thank you 😊

@Pistonamey
Copy link
Contributor

Hey @jordan-d-murphy
Do we need to do something before our changes are merged even if all the checks pass?
Somebody reviewed my changes for my above PR but neither merged them nor rejected them. Am I missing something ?

@jordan-d-murphy
Copy link
Contributor Author

jordan-d-murphy commented Feb 14, 2024

@Pistonamey Looks like your PR is merged now, can you check again?

Sometimes there might by multiple reviewers on a PR and the first reviewer might add their approval review but leave it for another reviewer to merge.

Looks like that was the case here
image

Great work! Thank you for helping out 🙂

@Pistonamey
Copy link
Contributor

@jordan-d-murphy

Got it, Thanks.

@williamking1221
Copy link
Contributor

I'll take

        pandas.core.resample.Resampler.first
        pandas.core.resample.Resampler.last

@williamking1221
Copy link
Contributor

I'll take

        pandas.core.resample.Resampler.first
        pandas.core.resample.Resampler.last

Seems like both are ok, even without changes. I'll just remove these two then.

@williamking1221
Copy link
Contributor

williamking1221 commented Feb 14, 2024

Hi @jordan-d-murphy,

For

        pandas.core.window.expanding.Expanding.aggregate
        pandas.core.window.rolling.Rolling.aggregate

There is a ES01 -- No extended summary found error. If there a suggested fix to this?
Thanks!

@jordan-d-murphy
Copy link
Contributor Author

jordan-d-murphy commented Feb 14, 2024

For the ES01 I have opened a PR to get these addressed. See #57359

I have a merge conflict to resolve in order to get that merged, and then I'll open an Issue to fix all ES01 errors. (Similar to this issue)

Hopefully that will happen later today when I get home from work. I need to do that from my personal laptop.

But feel free to fix it if you want to! The ultimate goal of All these issues I'm opening is to have correct docstrings for all the functions. I'm segmenting them out to give some structure to the process, but a fully correct docstring for each function is the end goal!

(Typing this from my phone is it's hard to explain the fix for ES01, but when I get back to my personal laptop I'll share more info on those fixes. For PRs opened under this issue it's okay if other errors show up, as long as the SA05 errors are resolved.)

@ooooo-create
Copy link
Contributor

I'll take

        pandas.plotting.bootstrap_plot
        pandas.plotting.radviz

@jordan-d-murphy
Copy link
Contributor Author

@williamking1221 #57440 for ESO1s

@williamking1221
Copy link
Contributor

williamking1221 commented Feb 15, 2024

Thanks @jordan-d-murphy!

I will pull from main, which as the ignore ES01 checks, and then raise a PR for

        pandas.core.window.expanding.Expanding.aggregate
        pandas.core.window.rolling.Rolling.aggregate

Do you have a link to a doc on resolving ES01 issues?

@williamking1221
Copy link
Contributor

I believe if this PR goes through, this issue can be closed.

@jordan-d-murphy
Copy link
Contributor Author

jordan-d-murphy commented Feb 16, 2024

@williamking1221 Yeah here's some info on the ES01s from what I understand.

I'll show two examples to illustrate:
Running the following scripts/validate_docstrings.py --format=actions --errors=ES01 pandas.Categorical.map outputs:

################################################################################
################################## Validation ##################################
################################################################################

Docstring for "pandas.Categorical.map" correct. :)

and running scripts/validate_docstrings.py --format=actions --errors=ES01 pandas.Categorical.__array__ outputs

################################################################################
################################## Validation ##################################
################################################################################

3 Errors found for `pandas.Categorical.__array__`:
	ES01	No extended summary found
	PR01	Parameters {'dtype'} not documented
	SA01	See Also section not found

If we look at the docstrings, we can see a difference:

pandas.Categorical.array (fails with one ES01 - see the higlighted single line summary)
Screenshot 2024-02-15 at 5 54 42 PM

pandas.Categorical.map (passes with no ES01 - See highlighted Extended Summary)
Screenshot 2024-02-15 at 5 54 00 PM

Adding an extended summary fixes this:
Screenshot 2024-02-15 at 5 58 41 PM

updated output:

################################################################################
################################## Validation ##################################
################################################################################

2 Errors found for `pandas.Categorical.__array__`:
	PR01	Parameters {'dtype'} not documented
	SA01	See Also section not found

I dont' know if that answers the question, but hopefully that's a start.

EDIT: I just want to be clear this is an example of HOW to resolve the issue, but call out that we shouldn't just start adding meaningless extended summaries in order to pass the checks.

@jordan-d-murphy
Copy link
Contributor Author

@williamking1221 thanks for opening the issue to close this out, I added one comment as a suggestion to ensure the integrity of the SA05 checks

@williamking1221
Copy link
Contributor

Hi @jordan-d-murphy

Thanks for this! Btw, is your comment on the PR I raised?

@jordan-d-murphy
Copy link
Contributor Author

yeah if you open your pr it should show as a review comment

@williamking1221
Copy link
Contributor

I don't see it, could you please double check if it went through?

Thanks!

@jordan-d-murphy
Copy link
Contributor Author

I tagged you can you see it now?

@williamking1221
Copy link
Contributor

Fixed it. Please take a look if this is what we want it to be.
Thanks @jordan-d-murphy!

@jordan-d-murphy
Copy link
Contributor Author

jordan-d-murphy commented Feb 16, 2024

@williamking1221 What we were talking about was adding SA05 here

pandas/ci/code_checks.sh

Lines 68 to 70 in 43209e7

MSG='Validate docstrings (EX01, EX03, EX04, GL01, GL02, GL03, GL04, GL05, GL06, GL07, GL09, GL10, PR03, PR04, PR05, PR06, PR08, PR09, PR10, RT01, RT02, RT04, RT05, SA02, SA03, SA04, SS01, SS02, SS03, SS04, SS05, SS06)' ; echo $MSG
$BASE_DIR/scripts/validate_docstrings.py --format=actions --errors=EX01,EX03,EX04,GL01,GL02,GL03,GL04,GL05,GL06,GL07,GL09,GL10,PR03,PR04,PR05,PR06,PR08,PR09,PR10,RT01,RT02,RT04,RT05,SA02,SA03,SA04,SS01,SS02,SS03,SS04,SS05,SS06
RET=$(($RET + $?)) ; echo $MSG "DONE"

when we remove the block here

pandas/ci/code_checks.sh

Lines 3185 to 3187 in 43209e7

MSG='Partially validate docstrings (SA05)' ; echo $MSG
$BASE_DIR/scripts/validate_docstrings.py --format=actions --errors=SA05
RET=$(($RET + $?)) ; echo $MSG "DONE"

@williamking1221
Copy link
Contributor

@williamking1221 What we were talking about was adding SA05 here

pandas/ci/code_checks.sh

Lines 68 to 70 in 43209e7

MSG='Validate docstrings (EX01, EX03, EX04, GL01, GL02, GL03, GL04, GL05, GL06, GL07, GL09, GL10, PR03, PR04, PR05, PR06, PR08, PR09, PR10, RT01, RT02, RT04, RT05, SA02, SA03, SA04, SS01, SS02, SS03, SS04, SS05, SS06)' ; echo $MSG
$BASE_DIR/scripts/validate_docstrings.py --format=actions --errors=EX01,EX03,EX04,GL01,GL02,GL03,GL04,GL05,GL06,GL07,GL09,GL10,PR03,PR04,PR05,PR06,PR08,PR09,PR10,RT01,RT02,RT04,RT05,SA02,SA03,SA04,SS01,SS02,SS03,SS04,SS05,SS06
RET=$(($RET + $?)) ; echo $MSG "DONE"

when we remove the block here

pandas/ci/code_checks.sh

Lines 3185 to 3187 in 43209e7

MSG='Partially validate docstrings (SA05)' ; echo $MSG
$BASE_DIR/scripts/validate_docstrings.py --format=actions --errors=SA05
RET=$(($RET + $?)) ; echo $MSG "DONE"

Ah gotcha! Thanks! Fixed it in recent commit.

@jordan-d-murphy
Copy link
Contributor Author

Opened DOC: Enforce Numpy Docstring Validation (Parent Issue) #58063 as a parent issue for fixing docstrings based on the refactoring in code_checks.sh

Feel free to swing by and help out! 🙂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI Continuous Integration Docs good first issue
Projects
None yet
5 participants