Add DOC003 to check docstring style mismatch (#207)
jsh9 authored Jan 13, 2025
1 parent 94039b2 commit 2e8af22
Showing 13 changed files with 413 additions and 23 deletions.
8 changes: 8 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,13 @@
# Change Log

## [0.6.0] - 2025-01-13

- Added
  - A new violation code, `DOC003`, to detect docstring style mismatch (when
    docstrings are written in a style different from the specified one)
- Full diff
  - https://github.com/jsh9/pydoclint/compare/0.5.19...0.6.0

## [0.5.19] - 2025-01-12

- Fixed
27 changes: 17 additions & 10 deletions docs/config_options.md
@@ -29,11 +29,12 @@ page:
- [16. `--treat-property-methods-as-class-attributes` (shortform: `-tpmaca`, default: `False`)](#16---treat-property-methods-as-class-attributes-shortform--tpmaca-default-false)
- [17. `--only-attrs-with-ClassVar-are-treated-as-class-attrs` (shortform: `-oawcv`, default: `False`)](#17---only-attrs-with-classvar-are-treated-as-class-attrs-shortform--oawcv-default-false)
- [18. `--should-document-star-arguments` (shortform: `-sdsa`, default: `True`)](#18---should-document-star-arguments-shortform--sdsa-default-true)
- [19. `--baseline`](#19---baseline)
- [20. `--generate-baseline` (default: `False`)](#20---generate-baseline-default-false)
- [21. `--auto-regenerate-baseline` (shortform: `-arb`, default: `True`)](#21---auto-regenerate-baseline-shortform--arb-default-true)
- [22. `--show-filenames-in-every-violation-message` (shortform: `-sfn`, default: `False`)](#22---show-filenames-in-every-violation-message-shortform--sfn-default-false)
- [23. `--config` (default: `pyproject.toml`)](#23---config-default-pyprojecttoml)
- [19. `--check-style-mismatch` (shortform: `-csm`, default: `False`)](#19---check-style-mismatch-shortform--csm-default-false)
- [20. `--baseline`](#20---baseline)
- [21. `--generate-baseline` (default: `False`)](#21---generate-baseline-default-false)
- [22. `--auto-regenerate-baseline` (shortform: `-arb`, default: `True`)](#22---auto-regenerate-baseline-shortform--arb-default-true)
- [23. `--show-filenames-in-every-violation-message` (shortform: `-sfn`, default: `False`)](#23---show-filenames-in-every-violation-message-shortform--sfn-default-false)
- [24. `--config` (default: `pyproject.toml`)](#24---config-default-pyprojecttoml)

<!--TOC-->

@@ -217,7 +218,13 @@ If True, "star arguments" (such as `*args`, `**kwargs`, `**props`, etc.)
in the function signature should be documented in the docstring. If False,
they should not appear in the docstring.

## 19. `--baseline`
## 19. `--check-style-mismatch` (shortform: `-csm`, default: `False`)

If True, check that the style specified in `--style` matches the detected
style of each docstring. If there is a mismatch, `DOC003` will be
reported. Setting this to False silences all `DOC003` violations.

## 20. `--baseline`

Baseline allows you to remember the current project state and then show only
new violations, ignoring old ones. This can be very useful when you'd like to
@@ -239,20 +246,20 @@ If `--generate-baseline` is not passed to _pydoclint_ (the default
is `False`), _pydoclint_ will read your baseline file, and ignore all
violations specified in that file.

## 20. `--generate-baseline` (default: `False`)
## 21. `--generate-baseline` (default: `False`)

Must be used together with the `--baseline` option. If `True`, generate the
baseline file that contains all current violations.

## 21. `--auto-regenerate-baseline` (shortform: `-arb`, default: `True`)
## 22. `--auto-regenerate-baseline` (shortform: `-arb`, default: `True`)

If it's set to True, _pydoclint_ will automatically regenerate the baseline
file every time you fix violations in the baseline and rerun _pydoclint_.

This saves you from having to manually regenerate the baseline file by setting
`--generate-baseline=True` and rerunning _pydoclint_.

## 22. `--show-filenames-in-every-violation-message` (shortform: `-sfn`, default: `False`)
## 23. `--show-filenames-in-every-violation-message` (shortform: `-sfn`, default: `False`)

If False, in the terminal the violation messages are grouped by file names:

@@ -286,7 +293,7 @@ This can be convenient if you would like to click on each violation message and
go to the corresponding line in your IDE. (Note: not all terminal apps offer
this functionality.)

## 23. `--config` (default: `pyproject.toml`)
## 24. `--config` (default: `pyproject.toml`)

The full path of the .toml config file that contains the config options. Note
that the command line options take precedence over the .toml file. Look at this
66 changes: 66 additions & 0 deletions docs/style_mismatch.md
@@ -0,0 +1,66 @@
# More about docstring style mismatch (`DOC003`)

This violation code warns you when _pydoclint_ thinks that the docstring
is written in a different style than the style you specified via the
`--style` config option.

## 1. How does _pydoclint_ detect the style of a docstring?

_pydoclint_ detects the style of a docstring with this very simple procedure:

- It attempts to parse the docstring in all 3 styles: numpy, Google, and Sphinx
- It then compares the "size" of the parsed docstring objects
- The "size" is a human-made metric to measure how "fully parsed" a docstring
  object is. For example, a docstring object with the return section is
  larger in "size" than one without the return section (all else being equal)
- The style that yields the largest "size" is considered the style of the
docstring
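The procedure above can be illustrated with a toy sketch. It uses neither _pydoclint_ nor `docstring_parser`: the hypothetical `toySize()` simply counts the section markers each style would recognize, which is an assumption standing in for the real "size" of a parsed docstring object. The rest (pick the style with the largest size, and flag a mismatch when the user-specified style's size is strictly smaller than the maximum) follows the rule in `parseDocstring()` from `pydoclint/utils/parse_docstring.py` in this commit.

```python
import re

STYLES = ('numpy', 'google', 'sphinx')


def toySize(docstring: str, style: str) -> int:
    """
    Count the section markers that `style` would recognize.

    A toy stand-in for pydoclint's "size" metric, NOT the real one.
    """
    if style == 'numpy':
        # A numpy section header is followed by a line of dashes.
        pattern = r'^\s*\w[\w ]*\n\s*-{3,}\s*$'
    elif style == 'google':
        # A Google section header ends with a colon.
        pattern = r'^\s*(?:Args|Returns|Yields|Raises):\s*$'
    elif style == 'sphinx':
        # Sphinx uses field lists such as ":param x:" and ":return:".
        pattern = r'^\s*:(?:param|type|returns?|rtype|raises?)\b'
    else:
        raise ValueError(f'Unknown style: {style}')
    return len(re.findall(pattern, docstring, flags=re.MULTILINE))


def docstringSizes(docstring: str) -> dict[str, int]:
    return {style: toySize(docstring, style) for style in STYLES}


def detectStyle(docstring: str) -> str:
    sizes = docstringSizes(docstring)
    # The style with the largest size wins; on a tie, max() keeps the
    # first style listed in STYLES.
    return max(sizes, key=sizes.__getitem__)


def isStyleMismatch(docstring: str, userSpecifiedStyle: str) -> bool:
    sizes = docstringSizes(docstring)
    # Mismatch if some other style yields a strictly larger size.
    return sizes[userSpecifiedStyle] < max(sizes.values())
```

Under this rule, a docstring with no recognizable sections (for example, a one-line summary) gives every style the same size, so it is never flagged.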

## 2. How accurate is this detection heuristic?

The authors of _pydoclint_ have manually tested this heuristic on
8 repositories written in all 3 styles (numpy, Google, and Sphinx),
and found the results satisfactory:

- Accuracy: 100%
- Precision: 100%
- Recall: 100%

However, we admit that 8 is too small a sample size to be statistically
representative. If you encounter any false positives or false negatives,
please don't hesitate to file an
issue [here](https://github.com/jsh9/pydoclint/issues).

## 3. Can I turn this on or off?

This style mismatch detection feature is _off_ by default.

To turn it on, set `--check-style-mismatch` (or `-csm`) to `True`, for
example `--check-style-mismatch=True`. Set it to `False` to turn it off again.

## 4. Is it much slower to parse a docstring in all 3 styles?

Not much. The authors of _pydoclint_ benchmarked several very large code
bases, and here are the results (as of 2025/01/12):

| | numpy | scikit-learn | Bokeh | Airflow |
| ---------------------------- | ----- | ------------ | ----- | ------- |
| Number of .py files | 581 | 929 | 1196 | 5004 |
| Run time with 1 style [sec] | 1.84 | 2.68 | 0.77 | 5.50 |
| Run time with 3 styles [sec] | 1.91 | 2.79 | 0.78 | 5.77 |
| Additional run time [sec] | 0.07 | 0.11 | 0.01 | 0.27 |
| Relative additional run time | 4% | 4% | 1% | 5% |

## 5. What violation code is associated with style mismatch?

`DOC003`: "Docstring style mismatch".

## 6. How to fix this violation code?

Check whether the docstring style is consistent with the style you
specified via the `--style` config option. If it is not, either rewrite
the docstring in the specified style, or specify the correct style via
`--style`.

Also, please note that specifying an incorrect docstring style may
mask other violations. So after you fix the docstring style, you may
need to fix other "new" (previously hidden) violations.
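As an illustration (the function names are hypothetical), suppose you run `pydoclint --style=numpy --check-style-mismatch=True` on the file below. Going by the detection heuristic described in section 1, `addGoogleStyle` would most likely be reported as `DOC003`, since its docstring parses more fully as Google style; `addNumpyStyle` shows the same docstring rewritten in numpy style. If your whole code base uses Google style, passing `--style=google` instead is the other way to resolve it.

```python
def addGoogleStyle(a: int, b: int) -> int:
    """Add two integers.

    Args:
        a (int): The first integer.
        b (int): The second integer.

    Returns:
        int: The sum of ``a`` and ``b``.
    """
    # Google-style docstring: likely flagged under --style=numpy.
    return a + b


def addNumpyStyle(a: int, b: int) -> int:
    """
    Add two integers.

    Parameters
    ----------
    a : int
        The first integer.
    b : int
        The second integer.

    Returns
    -------
    int
        The sum of ``a`` and ``b``.
    """
    # Same docstring rewritten in numpy style, matching --style=numpy.
    return a + b
```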
9 changes: 5 additions & 4 deletions docs/violation_codes.md
@@ -19,10 +19,11 @@

## 0. `DOC0xx`: Docstring parsing issues

| Code | Explanation |
| -------- | ---------------------------------------- |
| `DOC001` | Potential formatting errors in docstring |
| `DOC002` | Syntax error in the Python file |
| Code | Explanation |
| -------- | ---------------------------------------------------------------------------------------------- |
| `DOC001` | Potential formatting errors in docstring |
| `DOC002` | Syntax error in the Python file |
| `DOC003` | Docstring style mismatch ([explanation](https://jsh9.github.io/pydoclint/style_mismatch.html)) |

## 1. `DOC1xx`: Violations about input arguments

18 changes: 18 additions & 0 deletions pydoclint/flake8_entry.py
@@ -223,6 +223,18 @@ def add_options(cls, parser: Any) -> None: # noqa: D102
' appear in the docstring.'
),
)
parser.add_option(
'-csm',
'--check-style-mismatch',
action='store',
default='False',
parse_from_config=True,
help=(
'If True, check that style specified in --style matches the detected'
' style of the docstring. If there is a mismatch, DOC003 will be'
' reported. Setting this to False will silence all DOC003 violations.'
),
)

@classmethod
def parse_options(cls, options: Any) -> None: # noqa: D102
@@ -261,6 +273,7 @@ def parse_options(cls, options: Any) -> None: # noqa: D102
cls.should_document_star_arguments = (
options.should_document_star_arguments
)
cls.check_style_mismatch = options.check_style_mismatch
cls.style = options.style

def run(self) -> Generator[tuple[int, int, str, Any], None, None]:
@@ -342,6 +355,10 @@ def run(self) -> Generator[tuple[int, int, str, Any], None, None]:
'--should-document-star-arguments',
self.should_document_star_arguments,
)
checkStyleMismatch = self._bool(
'--check-style-mismatch',
self.check_style_mismatch,
)

if self.style not in {'numpy', 'google', 'sphinx'}:
raise ValueError(
@@ -372,6 +389,7 @@ def run(self) -> Generator[tuple[int, int, str, Any], None, None]:
treatPropertyMethodsAsClassAttributes
),
shouldDocumentStarArguments=shouldDocumentStarArguments,
checkStyleMismatch=checkStyleMismatch,
style=self.style,
)
v.visit(self._tree)
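In the flake8 plugin, `--check-style-mismatch` is declared with `action='store'` and `default='False'`, so its value arrives as a string and is converted in `run()` via `self._bool`. A plain `bool('False')` would not work, because any non-empty string is truthy. Below is a minimal, hypothetical stand-in for that conversion (`toBool` is an illustrative name), assuming only the literal strings `'True'` and `'False'` are accepted; the real `_bool` is not shown in this commit and may differ.

```python
def toBool(optionName: str, optionValue: str) -> bool:
    """
    Convert a flake8 string option such as 'True' or 'False' to a bool.

    Hypothetical stand-in for ``self._bool``; the real helper is not
    shown in this commit and may accept different spellings.
    """
    if optionValue not in {'True', 'False'}:
        raise ValueError(
            f'Invalid value for {optionName}: {optionValue!r}'
            " (expected 'True' or 'False')"
        )
    return optionValue == 'True'


# flake8 stores the option as a string, so the default arrives as 'False'.
checkStyleMismatch = toBool('--check-style-mismatch', 'False')
```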
18 changes: 18 additions & 0 deletions pydoclint/main.py
@@ -262,6 +262,18 @@ def validateStyleValue(
' appear in the docstring.'
),
)
@click.option(
'-csm',
'--check-style-mismatch',
type=bool,
show_default=True,
default=False,
help=(
'If True, check that style specified in --style matches the detected'
' style of the docstring. If there is a mismatch, DOC003 will be'
' reported. Setting this to False will silence all DOC003 violations.'
),
)
@click.option(
'--baseline',
type=click.Path(
@@ -365,6 +377,7 @@ def main( # noqa: C901
require_yield_section_when_yielding_nothing: bool,
only_attrs_with_classvar_are_treated_as_class_attrs: bool,
should_document_star_arguments: bool,
check_style_mismatch: bool,
generate_baseline: bool,
auto_regenerate_baseline: bool,
baseline: str,
@@ -465,6 +478,7 @@ def main( # noqa: C901
require_yield_section_when_yielding_nothing
),
shouldDocumentStarArguments=should_document_star_arguments,
checkStyleMismatch=check_style_mismatch,
)

if generate_baseline:
@@ -601,6 +615,7 @@ def _checkPaths(
requireReturnSectionWhenReturningNothing: bool = False,
requireYieldSectionWhenYieldingNothing: bool = False,
shouldDocumentStarArguments: bool = True,
checkStyleMismatch: bool = False,
quiet: bool = False,
exclude: str = '',
) -> dict[str, list[Violation]]:
@@ -661,6 +676,7 @@ def _checkPaths(
requireYieldSectionWhenYieldingNothing
),
shouldDocumentStarArguments=shouldDocumentStarArguments,
checkStyleMismatch=checkStyleMismatch,
)
allViolations[filename.as_posix()] = violationsInThisFile

@@ -686,6 +702,7 @@ def _checkFile(
requireReturnSectionWhenReturningNothing: bool = False,
requireYieldSectionWhenYieldingNothing: bool = False,
shouldDocumentStarArguments: bool = True,
checkStyleMismatch: bool = False,
) -> list[Violation]:
if not filename.is_file(): # sometimes folder names can end with `.py`
return []
@@ -741,6 +758,7 @@ def _checkFile(
requireYieldSectionWhenYieldingNothing
),
shouldDocumentStarArguments=shouldDocumentStarArguments,
checkStyleMismatch=checkStyleMismatch,
)
visitor.visit(tree)
return visitor.violations
2 changes: 2 additions & 0 deletions pydoclint/utils/doc.py
@@ -37,6 +37,8 @@ def __init__(self, docstring: str, style: str = 'numpy') -> None:
else:
self._raiseException()

self.docstringSize = self.parsed.size

def __repr__(self) -> str:
return pprint.pformat(self.__dict__, indent=2)

58 changes: 58 additions & 0 deletions pydoclint/utils/parse_docstring.py
@@ -0,0 +1,58 @@
from __future__ import annotations

from docstring_parser import ParseError

from pydoclint.utils.doc import Doc


def parseDocstring(
        docstring: str,
        userSpecifiedStyle: str,
) -> tuple[Doc, ParseError | None, bool]:
    """
    Parse the docstring in all 3 docstring styles (numpy, Google, and
    Sphinx), and return a tuple of: the docstring parsed in the
    user-specified style, the parsing exception for that style (or None),
    and whether another style yields a larger parsed "size" (i.e., a
    style mismatch).
    """
    docNumpy, excNumpy = parseDocstringInGivenStyle(docstring, 'numpy')
    docGoogle, excGoogle = parseDocstringInGivenStyle(docstring, 'google')
    docSphinx, excSphinx = parseDocstringInGivenStyle(docstring, 'sphinx')

    docstrings: dict[str, Doc] = {
        'numpy': docNumpy,
        'google': docGoogle,
        'sphinx': docSphinx,
    }
    docstringSizes: dict[str, int] = {
        'numpy': docNumpy.docstringSize,
        'google': docGoogle.docstringSize,
        'sphinx': docSphinx.docstringSize,
    }
    parsingExceptions: dict[str, ParseError | None] = {
        'numpy': excNumpy,
        'google': excGoogle,
        'sphinx': excSphinx,
    }
    # Whichever style has the largest docstring size, we think that it is
    # the actual style that the docstring is written in.
    maxDocstringSize = max(docstringSizes.values())
    styleMismatch: bool = docstringSizes[userSpecifiedStyle] < maxDocstringSize
    return (
        docstrings[userSpecifiedStyle],
        parsingExceptions[userSpecifiedStyle],
        styleMismatch,
    )


def parseDocstringInGivenStyle(
        docstring: str,
        style: str,
) -> tuple[Doc, ParseError | None]:
    """
    Parse the docstring and return the content of the doc. If parsing
    fails, return an empty doc together with the exception.
    """
    exception: ParseError | None = None
    try:
        doc: Doc = Doc(docstring=docstring, style=style)
    except ParseError as exc:
        doc = Doc(docstring='', style=style)
        exception = exc

    return doc, exception
4 changes: 4 additions & 0 deletions pydoclint/utils/violation.py
@@ -8,6 +8,10 @@
VIOLATION_CODES = types.MappingProxyType({
1: 'Potential formatting errors in docstring. Error message:',
2: 'Syntax errors; cannot parse this Python file. Error message:',
3: ( # noqa: PAR001
'Docstring style mismatch. (Please read more at'
' https://jsh9.github.io/pydoclint/style_mismatch.html ).'
),

101: 'Docstring contains fewer arguments than in function signature.',
102: 'Docstring contains more arguments than in function signature.',
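`VIOLATION_CODES` is a `types.MappingProxyType`, i.e., a read-only view of a dict, so the table cannot be modified at run time. The sketch below reproduces a few of its entries and adds a hypothetical `formatViolation()` helper showing how the numeric code `3` corresponds to the `DOC003` name used in the docs (zero-padded to three digits). It is an illustration only, not _pydoclint_'s own formatting code.

```python
import types

# A few entries copied from VIOLATION_CODES; the real mapping has more.
VIOLATION_CODES = types.MappingProxyType({
    1: 'Potential formatting errors in docstring. Error message:',
    2: 'Syntax errors; cannot parse this Python file. Error message:',
    3: (
        'Docstring style mismatch. (Please read more at'
        ' https://jsh9.github.io/pydoclint/style_mismatch.html ).'
    ),
    101: 'Docstring contains fewer arguments than in function signature.',
    102: 'Docstring contains more arguments than in function signature.',
})


def formatViolation(code: int) -> str:
    """Hypothetical helper: render a code as e.g. 'DOC003: <message>'."""
    # Zero-pad to 3 digits, so 3 -> '003' while 101 stays '101'.
    return f'DOC{code:03d}: {VIOLATION_CODES[code]}'
```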