Skip to content
This repository has been archived by the owner on May 16, 2023. It is now read-only.

feat: linting pipeline to improve documentation quality #215

Merged

Conversation

dreh23
Copy link

@dreh23 dreh23 commented Jun 9, 2020

This Makefile provides a target for linting markdown coded
documentation files with commonmark. It utilizes a npm
package. There are several targets like "check" which includes
subtasks: linting and dead link checking.

This PR is more a proposal for a proper documentation workflow.
We could add spellchecking and other checks and a CI context
to enforce good quality. This can also be done with github
actions of course.

Signed-off-by: Johannes Amorosa johannes.amorosa@endocode.com

@dreh23 dreh23 changed the title fix/feat: Demonstartion of markdown linting fix/feat: Demonstration of markdown linting Jun 9, 2020
@dreh23
Copy link
Author

dreh23 commented Jun 9, 2020

If there is a general interest in this I can redo this PR to be more tidy.

@dreh23 dreh23 force-pushed the fix/improve-markdown-formatting branch from cf75073 to 2df3b5b Compare June 9, 2020 14:25
@tkowark
Copy link
Member

tkowark commented Jun 10, 2020

Dear @dreh23 , thank you very much for this demonstration. We very much like the idea of being able to:

  • Do linting
  • Check for broken links
  • Perform spell checking

in the markdown files. Would be amazing, if you could redo the PR to cover these use cases. Also, let us know what needs to be done from our side. CircleCI is being used in most of the projects, but Github Actions would work for us, as well.

Thanks again!
TK

@jschleus
Copy link

Just a small supplement:

As described in the issue cwa-app-ios#149 there exists on fossies.org for all CWA sub-projects (repositories) a regularly updated misspelling report using the tool codespell.

So e.g. (see #182) for the "cwa-documentation" project you may look at the URL
  https://fossies.org/linux/test/cwa-documentation-master.tar.gz/codespell.html

By the way, some nice approaches and suggestions regarding the use of codespell via Github Actions you may find in the codespell issue #1499 and the corresponding actions-codespell repository.

@dreh23
Copy link
Author

dreh23 commented Jun 15, 2020

@jschleus I have a working npm setup for spellchecking utilizing mdspell. I did not include this here because I needed to research for multi language support (which should be easy).
Id rather go for a javascript option - which causes less dependencies since the rest is also JS. Although we should use containers I believe an offline testing should behave exactly the same as in the ci and less dependencies is more. I have a new PR coming in the next days.

@jschleus
Copy link

@dreh23 Sounds good. A appropriate test object would be the new markdown file backend-infrastructure-architecture.md with currently three obvious misspellings ("useanother" -> "use another", "currect" -> "current" and "documen t" -> "document").

@dreh23 dreh23 force-pushed the fix/improve-markdown-formatting branch 11 times, most recently from 207fe9e to e5ad315 Compare June 17, 2020 20:57
@dreh23
Copy link
Author

dreh23 commented Jun 17, 2020

I rebased and force-pushed some commits for testing the CI. The PR now can:

  • spellcheck en-us or en-uk (Please give me feedback)
  • check for broken links
  • linting of markdown

There are some questions I will raise as comments inline. I would kindly ask for review, so I can fix outstanding issues. Once we are satisfied I can squash and tidy up commit message and/or branch name.

Please see the CI in action:

https://github.com/dreh23/cwa-documentation/runs/782141632?check_suite_focus=true

PS:

I included the spellchecker and this works great for single language files. Multilanguage is not supported right now, but this can be solved with different Makefile targets. This has some implications we would need to address:

  1. I believe we should clean up the translations into several folders to something like this:
translations
├── german
│   ├── cwa-risk-assessment.de.md
│   ├── LANGUAGE_STYLE.de.md
│   ├── pruefsteine.de.md
│   ├── README.de.md
│   └── scoping_document.de.md

The corresponding makefile targets are then:

  make spellcheck-en #check everything but not translations
  make spellcheck-de #check just translations --> german
  1. Files that contain both languages are not that easy to parse and will need to be fixed. Although lonely words like "Gesundheitsamt" should be part of the English dictionary and can be added in the override file (See .spelling and the INSTALL.md file for explanation)

  2. While we at it, can we have a guideline on file naming convention?

package.json Show resolved Hide resolved
package.json Show resolved Hide resolved
INSTALL.md Show resolved Hide resolved
@dreh23 dreh23 force-pushed the fix/improve-markdown-formatting branch 2 times, most recently from ec1c220 to 2d86801 Compare June 17, 2020 21:38
@dreh23
Copy link
Author

dreh23 commented Jun 29, 2020

I rebased, added a trailing spaces detector which unfortunately does not work on github actions (but offline) and other small improvements.

make detect-whitespaces

@dreh23 dreh23 changed the title fix/feat: Demonstration of markdown linting feat: linting pipeline to improve documentation quality Jun 29, 2020
@dreh23 dreh23 force-pushed the fix/improve-markdown-formatting branch 2 times, most recently from 50e0690 to 31eb503 Compare June 29, 2020 19:09
@dreh23
Copy link
Author

dreh23 commented Jun 29, 2020

I implemented #351 and bumped all npm packages. Additionally smaller corrections on language.

@dreh23 dreh23 force-pushed the fix/improve-markdown-formatting branch from 31eb503 to aee40e0 Compare June 29, 2020 19:15
@tkowark
Copy link
Member

tkowark commented Jun 30, 2020

I've tried the makefile today, but to no avail. I guess this might have something to do with zsh being my SHELL on mac. As a suggestion, why not use the package.json to consolidate the scripts:

{
  "name": "docs",
  "version": "1.0.0",
  "description": "A central repository for documentation",
  "main": "README.md",
  "dependencies": {
    "alex": "^8.1.1",
    "markdown-link-check": "^3.8.1",
    "markdown-spellcheck": "^1.3.1",
    "markdownlint": "^0.20.4",
    "markdownlint-cli": "^0.23.1",
    "npm-run-all": "^4.1.5"
  },
  "devDependencies": {},
  "scripts": {
    "test": "run-s markdownlint checklinks spellcheck format-spelling detect-inconsiderate-language",
    "spellcheck": "mdspell '**/*.md' --en-us -t -n -a --report '!**/node_modules/**/*.md' '!**/.github/**/*.md' '!**/translations/**/*.md'",
    "markdownlint": "markdownlint '**/*.md' --ignore node_modules",
    "checklinks": "find . -not -path \"*node_modules*\" -not -path \"*.github*\" -name \"*.md\" | xargs -n 1 markdown-link-check",
    "detect-inconsiderate-language": "alex",
    "format-spelling": "sort < .spelling | sort | uniq | tee .spelling.tmp > /dev/null && mv .spelling.tmp .spelling",
    "detect-whitespaces": "find . -not -path \"*node_modules*\" -not -path \"*.github*\" -name \"*.md\" | xargs -n 1 grep -c -H -n -i -o '[[:blank:]]$$' | tee /dev/tty | cut -d ':' -f2- | paste -sd+ | bc | grep -P 0",
    "clean": "rm -rf node_modules"
  },
  "repository": {
    "type": "git",
    "url": "git@github.com:corona-warn-app/cwa-documentation.git"
  },
  "keywords": [
    "docs"
  ],
  "author": "Johannes Amorosa",
  "license": "Apache 2.0"
}

Unfortunately, the detect-whitespace did not work. -P seems to be an invalid option for grep.

@dreh23
Copy link
Author

dreh23 commented Jul 1, 2020

Sure why not. However I am quite unfamiliar with npm test, how could I run a single test, so we have atomic return codes for the CI?.
Additionally If you give me the error message on the Makefile I can try to debug it, I am using zsh as well so it is probably a non-posix grep option I could maybe replace with something that works on OSX.

@tkowark
Copy link
Member

tkowark commented Jul 1, 2020

In the ci, we could also run the commands sequentially, e.g., then npm run-script spellcheck, npm run-script checklinks, etc.

For grep I just get:

usage: grep [-abcDEFGHhIiJLlmnOoqRSsUVvwxZ] [-A num] [-B num] [-C[num]]
	[-e pattern] [-f file] [--binary-files=value] [--color=when]
	[--context[=num]] [--directories=action] [--label] [--line-buffered]
	[--null] [pattern] [file ...]

@dreh23
Copy link
Author

dreh23 commented Jul 1, 2020

Could you try again removing the flag "-P"

@tkowark
Copy link
Member

tkowark commented Jul 1, 2020

Now I'm running into issues with paste:

> find . -not -path "*node_modules*" -not -path "*.github*" -name "*.md" | xargs -n 1 grep -c -H -n -i -o '[[:blank:]]$$' | tee /dev/tty | cut -d ':' -f2- | paste -sd+ | bc | grep 0

usage: paste [-s] [-d delimiters] file ...

@dreh23 dreh23 force-pushed the fix/improve-markdown-formatting branch from aee40e0 to 0029197 Compare July 1, 2020 11:03
@dreh23
Copy link
Author

dreh23 commented Jul 1, 2020

I will remove the white space detection for now. We can have a follow up PR. I would rather like to get this merged, soon.

@tkowark
Copy link
Member

tkowark commented Jul 1, 2020

Yes, makes sense. And thanks again for all your work on this!

This Makefile provides several targets for linting
documents. It utilizes several npm packages.
Functionality includes:

* Spellcheck
* Linting of markdown
* Check for broken links
* Sorting of dictionary file
* Detect inconsidered language

All targets (excluding the sorting of the dict file) are part
of the Github Action pipeline and will fail if quality
standards are not met.

Signed-off-by: Johannes Amorosa <johannes.amorosa@endocode.com>
@dreh23 dreh23 force-pushed the fix/improve-markdown-formatting branch from 0029197 to ff34c16 Compare July 1, 2020 11:19
@dreh23
Copy link
Author

dreh23 commented Jul 1, 2020

Please review again. Latest CI run: https://github.com/dreh23/cwa-documentation/runs/826304712

@dreh23
Copy link
Author

dreh23 commented Jul 1, 2020

Created a follow-up ticket #355

@jschleus
Copy link

jschleus commented Jul 2, 2020

By the way, although in the meantime nearly superfluous there exists now on the Fossies server a regularly updated small page showing a summary of the (very few) still found codespell errors in the diverse CWA projects: https://fossies.org/linux/test/cwa_codespell_status.html

@dreh23 If there is still an interest on the "detect-whitespaces" (maybe better named "detect-whitespaces-at-line-end" or similar) I want to made some small remarks to the above mentioned code line that makes some problems.

Although I'm not very familiar with Makefiles and not familiar with JSON, I make only some general comments regarding the used basic command line.

At a first glance I assume that the a little bit tricky command line probably should show all not excluded markdown files together with their number of lines that contain a space or a tab at the line end (achieved by "tee /dev/tty") and after that a line containing just the sum of all found such lines (onto the standard output).

  1. The option "-P" (interpret the pattern as a Perl-compatible regular expression) seems
    meaningless since the pattern is not a regular expression (just "0") so that option could generally be omitted.

  2. It is unclear to me what the purpose of that "grep" command should be. It will only show the number of lines if the number is "0" or contain a "0" (like for e.g. 10 or 102). Maybe it should suppress that sum line if no lines with a space or a tab at the line end was found but than a commend like
    grep -v '^0$'
    should be used.

  3. To avoid the occured "paste" problem on some systems a separation of the options may be helpful:
    paste -s -d '+'

  4. The doubled $$ within the pattern of the first grep seems not correct (already fixed in the follow-up ticket As a linter target a trailing whitespace detector would be useful #355).

  5. The options "-n -i -o" for the first grep seem superfluous.

Additionally, if the command find finds no matching file the grep command outputs something like
(standard input):0
If using GNU xargs the option "--no-run-if-empty" (shortly "-r") helps to avoid calling grep in that case (for other xargs versions maybe "-I {}" could helpful, but untested).

And more enerally, if the file names may contain newlines or other
types of white space also the use of the find "-print0" option may be meaningful.

So an improved basic version may be (nearly untested)

find . -not -path "*node_modules*" -not -path "*.github*" -name "*.md" \
  | xargs -n 1 grep -c -H  '[[:blank:]]$' \
  | tee /dev/tty | cut -d ':' -f2-  | paste -s -d '+' | bc

These are just a few hopefully correct comments, but I may also be completely wrong.

@dreh23
Copy link
Author

dreh23 commented Jul 3, 2020

@jschleus Thanks for your input!

The fossies page is rather nice, one downside I see its an external service.

The tee command doesn't work in the CI and the whole line is anyway ugly (Perl is calling from the last century) - we may have to see if there is a more elegant approach (with npm). Some of the complexity of the command was about the return value that has to be right (because of the CI). But I agree it had several flaws, because it was done hastily from my side.

However the trailing space issue is now excluded from this PR. I will test your command line foo and then we can move the discussion to #355 .

If there is no more objections/bug/design flaws I would kindly ask for a merge and we can pick up missing things from there. I believe this PR will drastically improve from the situation before.

@tkowark
Copy link
Member

tkowark commented Jul 3, 2020

Hi @dreh23 , yes, I will merge this soon, ideally later today or during the weekend, depending on other tasks coming in (still have to go through the solution architecture changes....)

But just as a heads up, I will then do some changes afterwards:

  • Remove the makefile in favour of a npm run-script only solution (just one place to change the comments)
  • Adapt the action and documentation accordingly

As you've already put so much effort in the PR (again, THANK YOU!!), I would just do these minor changes by myself, so you don't have to touch things again.

@dreh23
Copy link
Author

dreh23 commented Jul 3, 2020

Great! You're welcome. If you need me testing, please add me as a reviewer.

@tkowark tkowark merged commit 02caed1 into corona-warn-app:master Jul 5, 2020
@tkowark tkowark mentioned this pull request Jul 5, 2020
@tkowark
Copy link
Member

tkowark commented Jul 5, 2020

Finally found the time this weekend to merge this PR. Thanks again for all your great work! I will take some time this week to go through the spelling and linting errors. But for starters, I already fixed a broken link to the Apple documentation 😄

@dreh23 dreh23 deleted the fix/improve-markdown-formatting branch July 6, 2020 14:00
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants