Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Re-integrate Sphinx's linkcheck into CI configuration #12932

Open
wants to merge 19 commits into
base: main
Choose a base branch
from

Conversation

ladislav987
Copy link

Bring Back Link Checks for Documentation (#12474)

Description:
This pull request addresses the need to re-enable link checks as detailed in issue #12474. The following changes were implemented:

  1. Updated Sphinx Version
  • The latest Sphinx version was added in both requirements.txt and tox.ini to ensure compatibility and functionality with the new configurations.
  1. Enhanced Sphinx Configuration in conf.py
  • Adjusted linkcheck_workers from 5 to 20 to improve check efficiency.

  • Added parameters to enhance link check stability:

    • linkcheck_timeout = 30
    • linkcheck_retries = 2
    • linkcheck_anchors = False
    • linkcheck_rate_limit_timeout = 2.0
    • linkcheck_delay = 2.0
  • Added commented-out code (optional) to exclude specific directories to avoid check interruptions:

# Exclude documents that are known to cause issues
# linkcheck_exclude_documents = [
#     'old_versions/*',
#     'deprecated_features/*',
# ]
  • Expanded linkcheck_ignore to avoid unnecessary checks on frequently referenced and stable links:
r"https://github\.com/sponsors/.*",
r"https://pypi\.org/project/pytest.*",
r"https://github\.com/pytest-dev/pytest/issues/.*",

Without r"https://github\.com/sponsors/.*", I get lot "Too Many Requests for url". It seems to be related to missing credentials or tokens for GitHub.

  1. Updated Tox Commands
  • Modified the Sphinx build command in tox.ini:

    • Changed sphinx-build -W -q --keep-going -b linkcheck . _build to sphinx-build -W -q --keep-going -b linkcheck -j auto . _build for using sphinx parallel build option to speed up the process.
  • Added commented-out (optional) code to facilitate output logging if needed:

; /bin/sh -c "sphinx-build -W -q --keep-going -b linkcheck -j auto . _build > linkcheck_output.txt 2>&1"
;allowlist_externals =
;    /bin/sh

@Pierre-Sassoulas Pierre-Sassoulas added the skip news used on prs to opt out of the changelog requirement label Nov 1, 2024
Copy link
Member

@Pierre-Sassoulas Pierre-Sassoulas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR ! I have some suggestions, I think the dead code comments should be removed before merging too. Also what makes the sphinx upgrade mandatory ?

doc/en/conf.py Outdated
Comment on lines 146 to 148
r"https://github\.com/sponsors/.*",
r"https://pypi\.org/project/pytest.*",
r"https://github\.com/pytest-dev/pytest/issues/.*",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes in theory the dot should be escaped in a regex, as it's not a dot, it's "any character", but it works and is simpler imo. (like in the two links above). Or we can make the two links above consistent :)

Suggested change
r"https://github\.com/sponsors/.*",
r"https://pypi\.org/project/pytest.*",
r"https://github\.com/pytest-dev/pytest/issues/.*",
r"https://github.com/sponsors/.*",
r"https://pypi.org/project/pytest.*",

I think we should modify the existing regex line 144 /145 so they accept link to a comment, or remove them entirely in favor of the simpler one you added.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I simplified it to:

linkcheck_ignore = [
    "https://blogs.msdn.microsoft.com/bharry/2017/06/28/testing-in-a-cloud-delivery-cadence/",
    "http://pythontesting.net/framework/pytest-introduction/",
    r"https://pypi\.org/project/pytest.*",
    r"https://github\.com/sponsors/.*",
    r"https://github\.com/pytest-dev/pytest/issues/.*",
    r"https://github\.com/pytest-dev/pytest/pull/.*",
]

tox.ini Outdated Show resolved Hide resolved
@ladislav987
Copy link
Author

Thank you for the PR ! I have some suggestions, I think the dead code comments should be removed before merging too. Also what makes the sphinx upgrade mandatory ?

I deleted commented line in code.

The update of Sphinx from version 7 to version 8.1.3 was due to better configuration and performance Improvements in the new version.

Copy link
Member

@Pierre-Sassoulas Pierre-Sassoulas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay. Thank you for the PR, the changes look good.

I would expect some links to have to be fixed, isn't there any to fix ? (There was a list of broken link in the original issue.) Also reading the issue I understood that it should be something periodical and automated like pre-commit or dependabot but I'm not sure who this is addressed here.

AUTHORS Outdated Show resolved Hide resolved
r"https://github.com/pytest-dev/pytest/issues/\d+",
r"https://github.com/pytest-dev/pytest/pull/\d+",
r"https://pypi\.org/project/pytest.*",
r"https://github\.com/sponsors/.*",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this necessary? I don't remember such URLs not working in other projects.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    r"https://github.com/pytest-dev/pytest/issues/\d+",
    r"https://github.com/pytest-dev/pytest/pull/\d+",

this is from the original version. But you're right, this doesn't trigger any issues with link.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without r"https://github\.com/sponsors/.*", checking will trigger issues:
WARNING: broken link: https://github.com/sponsors/xxx (429 Client Error: Too Many Requests for url: https://github.com/xxx)

It seems to be related to missing credentials or tokens for GitHub but I'm not 100% sure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With xxx specifically, I suppose? How about with real existing usernames?

@@ -141,10 +141,18 @@
linkcheck_ignore = [
"https://blogs.msdn.microsoft.com/bharry/2017/06/28/testing-in-a-cloud-delivery-cadence/",
"http://pythontesting.net/framework/pytest-introduction/",
r"https://github.com/pytest-dev/pytest/issues/\d+",
r"https://github.com/pytest-dev/pytest/pull/\d+",
r"https://pypi\.org/project/pytest.*",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This matches any project page starting with pytest. Like pytesty. Why? I don't think I've seen any problems with this host in linkcheck in the past.

r"https://github.com/pytest-dev/pytest/pull/\d+",
r"https://pypi\.org/project/pytest.*",
r"https://github\.com/sponsors/.*",
r"https://github\.com/pytest-dev/pytest/issues/.*",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be more accurate. Such a wildcard would also silence obviously broken URLs like https://github.com/pytest-dev/pytest/pull/zombie that should produce errors otherwise.

@@ -2,7 +2,8 @@
pluggy>=1.5.0
pygments-pytest>=2.3.0
sphinx-removed-in>=0.2.0
sphinx>=7
sphinx>=8.1.3
sphinx_issues
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sphinx-issues is already listed below

Suggested change
sphinx_issues

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FIxed

tox.ini Outdated
Comment on lines 129 to 130
deps =
-r{toxinidir}/doc/en/requirements.txt
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you keep such formatting changes out?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FIxed

@@ -143,7 +144,7 @@ passenv =
deps =
PyYAML
regendoc>=0.8.1
sphinx
sphinx>=8.1.3
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this should instead just reference the requirements file or the base section value...

Copy link
Member

@webknjaz webknjaz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking time to look into the issue. It looks like this PR mass-ignores a lot of URLs that should instead be validated.

Instead, we should make every ignore regex very precise so we don't suppress too many things and hide problems in doing so.

Additionally, I'd like to ask you to add code comments justifying every single ignore entry you're adding or changing. Here's an example: https://github.com/cherrypy/cheroot/blob/a9c1fe4/docs/conf.py#L124-L125. So far, it looks like some of those are not justified and shouldn't be there, but it's hard to be sure because the explanation is mysteriously missing.

@webknjaz
Copy link
Member

Also reading the issue I understood that it should be something periodical and automated like pre-commit or dependabot but I'm not sure who this is addressed here.

This can't be run in pre-commit or dependabot. Instead, it should be integrated into GHA.

@ladislav987 it's necessary to add a separate GHA workflow with a job calling tox -e docs-checklinks, following the structure of other GHA workflows. But it'd need to have an additional trigger for scheduled runs, not just push and pull_request.

@webknjaz webknjaz changed the title Chvastas/bring back link checks Re-integrate Sphinx's linkcheck into CI configuration Nov 11, 2024
Comment on lines +150 to +155
linkcheck_workers = 20
linkcheck_timeout = 30
linkcheck_retries = 2
linkcheck_anchors = False
linkcheck_rate_limit_timeout = 2.0
linkcheck_delay = 2.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the defaults of these?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
skip news used on prs to opt out of the changelog requirement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants