Add DOC003 to check docstring style mismatch (#207)
Showing 13 changed files with 413 additions and 23 deletions.
@@ -0,0 +1,66 @@
# More about docstring style mismatch (`DOC003`)

This violation code warns you when _pydoclint_ thinks that the docstring
is written in a different style than the style you specified via the
`--style` config option.

## 1. How does _pydoclint_ detect the style of a docstring?

_pydoclint_ detects the style of a docstring with this very simple procedure:

- It attempts to parse the docstring in all 3 styles: numpy, Google, and Sphinx
- It then compares the "size" of the parsed docstring objects
  - The "size" is a human-made metric to measure how "fully parsed" a docstring
    object is. For example, a docstring object without the return section is
    smaller in "size" than that with the return section (all others being equal)
- The style that yields the largest "size" is considered the style of the
  docstring
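
The procedure above can be illustrated with a small, self-contained sketch. Note that `SECTION_PATTERNS`, `toy_size`, and `detect_style` are hypothetical names, and the regex-based "size" is only a crude stand-in: _pydoclint_ computes its metric on parsed docstring objects, not with regular expressions.

```python
import re

# Hypothetical stand-ins for "parse in style X and measure how much was
# recognized". Each pattern matches section markers typical of one style.
SECTION_PATTERNS = {
    "numpy": re.compile(
        r"^[ \t]*(?:Parameters|Returns|Raises|Yields)[ \t]*\n[ \t]*-{3,}[ \t]*$",
        re.MULTILINE,
    ),
    "google": re.compile(
        r"^[ \t]*(?:Args|Returns|Raises|Yields):[ \t]*$", re.MULTILINE
    ),
    "sphinx": re.compile(
        r"^[ \t]*:(?:param|returns?|raises|rtype)\b", re.MULTILINE
    ),
}


def toy_size(docstring: str, style: str) -> int:
    """Count how many section markers of the given style are recognized."""
    return len(SECTION_PATTERNS[style].findall(docstring))


def detect_style(docstring: str) -> tuple[str, dict[str, int]]:
    """Return the style with the largest toy size, plus all three sizes."""
    sizes = {style: toy_size(docstring, style) for style in SECTION_PATTERNS}
    # On a tie, max() returns the first style in dict order (numpy here).
    return max(sizes, key=sizes.get), sizes


GOOGLE_DOC = """Add two numbers.

Args:
    a (int): First number.
    b (int): Second number.

Returns:
    int: The sum.
"""

NUMPY_DOC = """Add two numbers.

Parameters
----------
a : int
    First number.
b : int
    Second number.

Returns
-------
int
    The sum.
"""

SPHINX_DOC = """Add two numbers.

:param a: First number.
:param b: Second number.
:return: The sum.
:rtype: int
"""

for doc in (GOOGLE_DOC, NUMPY_DOC, SPHINX_DOC):
    print(detect_style(doc))
```

The point of the sketch is the comparison step: each docstring is measured under all three styles, and the style that recognizes the most structure wins.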

## 2. How accurate is this detection heuristic?

The authors of _pydoclint_ have manually tested this heuristic in
8 repositories written in all 3 styles (numpy, Google, and Sphinx),
and have found this heuristic to be satisfactory:

- Accuracy: 100%
- Precision: 100%
- Recall: 100%

However, we admit that 8 is too small a sample size to be statistically
representative. If you encounter any false positives or false negatives,
please don't hesitate to file an
issue [here](https://github.com/jsh9/pydoclint/issues).

## 3. Can I turn this off?

Actually, this style mismatch detection feature is _off_ by default.

You can turn this feature on by setting `--check-style-mismatch` (or `-csm`) to `True`,
for example `--check-style-mismatch=True`.
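
As a rough illustration, the sketch below assembles a command line that enables the check. The helper name `build_pydoclint_command` and the path `src/` are hypothetical, and the sketch assumes that _pydoclint_ takes the paths to check as positional arguments; only the `--style` and `--check-style-mismatch` options come from the text above.

```python
import shlex


def build_pydoclint_command(style: str, paths: list[str]) -> list[str]:
    """Build an argument list that turns on style mismatch detection."""
    return [
        "pydoclint",
        f"--style={style}",
        "--check-style-mismatch=True",  # the feature is off unless set
        *paths,
    ]


# "src/" is a placeholder for whatever directory you want to check.
command = build_pydoclint_command("numpy", ["src/"])
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run` or pasted into a shell once the placeholder path is replaced.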

## 4. Is it much slower to parse a docstring in all 3 styles?

It is not. The authors of _pydoclint_ benchmarked some very large code bases, and
parsing in all 3 styles added at most about 5% to the run time. Here are the
results (as of 2025/01/12):

|                              | numpy | scikit-learn | Bokeh | Airflow |
| ---------------------------- | ----- | ------------ | ----- | ------- |
| Number of .py files          | 581   | 929          | 1196  | 5004    |
| Run time with 1 style [sec]  | 1.84  | 2.68         | 0.77  | 5.50    |
| Run time with 3 styles [sec] | 1.91  | 2.79         | 0.78  | 5.77    |
| Additional run time [sec]    | 0.07  | 0.11         | 0.01  | 0.27    |
| Relative additional run time | 4%    | 4%           | 1%    | 5%      |

## 5. What violation code is associated with style mismatch?

`DOC003`: "Docstring style mismatch".

## 6. How to fix this violation code?

Check whether the docstring style is consistent with
what you specified via the `--style` config option. If it is not, either
rewrite your docstring, or specify the correct style via `--style`.

Also, please note that specifying an incorrect docstring style may
mask other violations. So after you fix the docstring style, you may
need to fix other "new" (previously hidden) violations.
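
To make the fix concrete, here is a hypothetical before/after pair, assuming a project configured with `--style=numpy` and the style mismatch check enabled. The function names are made up for illustration. We would expect the first docstring (Google style) to be the kind that triggers `DOC003`, and the second (numpy style) to resolve it; the exact output of _pydoclint_ is not shown here because it has not been verified.

```python
def add_google(a: int, b: int) -> int:
    """Add two numbers.

    Args:
        a (int): First number.
        b (int): Second number.

    Returns:
        int: The sum.
    """
    # Google-style docstring: inconsistent with --style=numpy.
    return a + b


def add_numpy(a: int, b: int) -> int:
    """
    Add two numbers.

    Parameters
    ----------
    a : int
        First number.
    b : int
        Second number.

    Returns
    -------
    int
        The sum.
    """
    # Same content rewritten in numpy style: consistent with --style=numpy.
    return a + b
```

If the Google style is actually the one the project wants, the alternative fix is to leave the docstring alone and pass `--style=google` instead.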

@@ -0,0 +1,58 @@
from __future__ import annotations

from docstring_parser import ParseError

from pydoclint.utils.doc import Doc


def parseDocstring(
    docstring: str,
    userSpecifiedStyle: str,
) -> tuple[Doc, ParseError | None, bool]:
    """
    Parse docstring in all 3 docstring styles and return the one that
    is parsed with the most likely style.
    """
    docNumpy, excNumpy = parseDocstringInGivenStyle(docstring, 'numpy')
    docGoogle, excGoogle = parseDocstringInGivenStyle(docstring, 'google')
    docSphinx, excSphinx = parseDocstringInGivenStyle(docstring, 'sphinx')

    docstrings: dict[str, Doc] = {
        'numpy': docNumpy,
        'google': docGoogle,
        'sphinx': docSphinx,
    }
    docstringSizes: dict[str, int] = {
        'numpy': docNumpy.docstringSize,
        'google': docGoogle.docstringSize,
        'sphinx': docSphinx.docstringSize,
    }
    parsingExceptions: dict[str, ParseError | None] = {
        'numpy': excNumpy,
        'google': excGoogle,
        'sphinx': excSphinx,
    }
    # Whichever style has the largest docstring size, we think that it is
    # the actual style that the docstring is written in.
    maxDocstringSize = max(docstringSizes.values())
    styleMismatch: bool = docstringSizes[userSpecifiedStyle] < maxDocstringSize
    return (
        docstrings[userSpecifiedStyle],
        parsingExceptions[userSpecifiedStyle],
        styleMismatch,
    )


def parseDocstringInGivenStyle(
    docstring: str,
    style: str,
) -> tuple[Doc, ParseError | None]:
    """Parse the docstring and return the content of the doc."""
    exception: ParseError | None = None
    try:
        doc: Doc = Doc(docstring=docstring, style=style)
    except ParseError as exc:
        doc = Doc(docstring='', style=style)
        exception = exc

    return doc, exception
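
The mismatch decision in `parseDocstring` can be isolated in a few lines. The sketch below uses plain integers in place of `Doc.docstringSize`; the helper name `is_style_mismatch` and the sample sizes are hypothetical. It shows one consequence of the strict `<` comparison: when the user-specified style ties for the largest size, no mismatch is reported.

```python
from __future__ import annotations


def is_style_mismatch(sizes: dict[str, int], user_style: str) -> bool:
    """Mirror the comparison above: mismatch only if strictly smaller."""
    return sizes[user_style] < max(sizes.values())


# Made-up sizes: numpy parsing recognized the most structure.
sizes = {'numpy': 5, 'google': 2, 'sphinx': 2}
print(is_style_mismatch(sizes, 'google'))  # user style is smaller than the max
print(is_style_mismatch(sizes, 'numpy'))   # user style is the max

# Made-up tie: the user-specified style shares the largest size.
tied = {'numpy': 3, 'google': 3, 'sphinx': 1}
print(is_style_mismatch(tied, 'google'))

# A docstring with no recognizable sections: all sizes equal.
empty = {'numpy': 0, 'google': 0, 'sphinx': 0}
print(is_style_mismatch(empty, 'sphinx'))
```

Under this comparison, a docstring that parses equally well in every style (for example, a one-line summary) is never flagged, whichever style the user specified.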