Check that links work before deploying #538

Open
penelopeysm opened this issue Oct 13, 2024 · 10 comments

Comments

@penelopeysm
Member

Since our internal links now all use meta variables, I'm not sure if there's any software out there that can check link validity at the source code stage. (We could probably write one, but that feels a bit excessive)

So instead we should probably run some sort of HTML link checker on the Quarto output itself, perhaps https://github.com/filiph/linkcheck
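
As a rough illustration of what checking the rendered output involves (this is not how linkcheck itself is implemented), the sketch below walks a directory of generated HTML and reports relative links whose target file is missing. The function name `broken_internal_links` and the directory layout are assumptions made for the example; it skips external links and does not validate `#fragment` anchors.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import unquote, urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in one HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def broken_internal_links(site_dir):
    """Return (page, href) pairs whose internal target is missing on disk."""
    site_dir = Path(site_dir)
    broken = []
    for page in sorted(site_dir.rglob("*.html")):
        collector = LinkCollector()
        collector.feed(page.read_text(encoding="utf-8"))
        for href in collector.hrefs:
            parsed = urlparse(href)
            # Skip external links (https:, mailto:, //host/...) and
            # same-page anchors such as "#section".
            if parsed.scheme or parsed.netloc or not parsed.path:
                continue
            path = unquote(parsed.path)
            if path.startswith("/"):
                target = site_dir / path.lstrip("/")  # site-root-relative link
            else:
                target = page.parent / path  # link relative to this page
            if target.is_dir():
                target = target / "index.html"
            if not target.exists():
                broken.append((page.relative_to(site_dir).as_posix(), href))
    return broken
```

Calling `broken_internal_links("_site")` after a render would list the offending pages, assuming the rendered site lives in a directory called `_site`.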

@shravanngoswamii
Member

Even if we run a link check, our broken links will still return a 404 page. Should these be considered broken links, or is it acceptable since they return some HTML content? Also, it would be better to run a link check directly on our Quarto markdown pages, so we don’t have to wait for the full site to render just to get link check approval.

@penelopeysm
Member Author

I totally agree about running it on the Quarto docs being preferable, but I'm just not sure about it from a practical point of view -- do we have something that is capable of checking relative links between documents? I do think we could put something together to do that, but I'm not sure it's really worth the time.

I don't immediately see how we would get 404s! Could you explain?

@beingPro007
Collaborator

@penelopeysm I raised this earlier in #530 (comment); maybe this is possible if we think it through. I agree it may add some overhead, but it may solve the issue.

@penelopeysm
Member Author

@beingPro007, feel free to open a PR. I'll mention quickly that the workflow you suggested in #530 (comment) probably needs to be tweaked:

  1. If you want to run this on a fresh workflow, you need to check out the gh-pages branch as that is where the docs are located.
  2. It would be much better to add the link checking into the existing preview and publish workflows, rather than to make a new workflow.

@shravanngoswamii
Member

> do we have something that is capable of checking relative links between documents? I do think we could put something together to do that, but I'm not sure it's really worth the time.

I am also not sure of this!

> I don't immediately see how we would get 404s! Could you explain?

I don't know how link checkers work; I'm just guessing that they request the link and, if they don't get any HTML content back, consider that link broken. In our case, a request for a broken link will still return the HTML content of the 404 error page, like this one: https://turinglang.org/broken

@shravanngoswamii
Member

IMO, it would be much better to make a reusable workflow that checks links in HTML, md, mdx, and other markup languages!

@penelopeysm
Member Author

penelopeysm commented Oct 23, 2024

> if they do not get any HTML content from it then the particular link is considered broken

It's true that you will still get HTML content, but the response code will still be a 404, and that should count as a failure:

$ curl -I https://turinglang.org/nope
HTTP/2 404
server: GitHub.com
content-type: text/html; charset=utf-8
access-control-allow-origin: *
etag: "66fd83a7-7010"
x-proxy-cache: MISS
x-github-request-id: D3B0:34BBDA:3A59292:3B14866:6718E69B
accept-ranges: bytes
date: Wed, 23 Oct 2024 12:06:46 GMT
via: 1.1 varnish
age: 58
x-served-by: cache-lhr-egll1980035-LHR
x-cache: HIT
x-cache-hits: 1
x-timer: S1729685206.999750,VS0,VE5
vary: Accept-Encoding
x-fastly-request-id: 7ab19db7f227bfceaf2d9e270894f6c9c79a8192
content-length: 28688
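
The same point can be made programmatically. Below is a minimal sketch, using only the Python standard library, of a check that looks at the status code rather than the response body. The helper names `link_status` and `is_broken` are made up for this example, and real link checkers may apply different rules (redirect handling, retries, and so on).

```python
import urllib.error
import urllib.request


def link_status(url, timeout=10.0):
    """Return the HTTP status code for url, using a HEAD request like `curl -I`."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx responses; the status code is still
        # available on the exception object.
        return error.code


def is_broken(status):
    """Treat any 4xx or 5xx status as broken, whatever the body contains."""
    return status >= 400
```

With these helpers, `is_broken(link_status("https://turinglang.org/nope"))` would be expected to report a failure as long as the server keeps answering with a 404, even though the response carries an HTML page.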

@shravanngoswamii
Member

I wasn't aware of that, thanks for the clarification!

@shravanngoswamii
Member

I’ve been experimenting with some link checkers, and while it’s straightforward to run them on our generated HTML, it might not be the best approach for us as developers.

I found tcort/markdown-link-check, which works well for Quarto documents and correctly checks relative links. However, I'm encountering issues with the meta and var shortcodes used in our links. @penelopeysm, do you have any suggestions for how we can address this?

  • One option is to write a small extension/filter for Quarto to handle these meta and var shortcodes, but it might not be worth the effort since these shortcodes aren’t primarily intended for links.
  • Alternatively, we could create a composite action using tcort/markdown-link-check and manually handle the meta and var shortcodes with a shell script or another scripting language that works easily across Windows, Mac, and Linux so users can also run our link checker locally!
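
To give a feel for the second option, here is a minimal sketch of a preprocessing step that expands `{{< meta key >}}` and `{{< var key >}}` shortcodes before the text is handed to a Markdown link checker. The function name `expand_shortcodes` and the flat dictionaries are assumptions for illustration; in practice the values would have to be loaded from the project's Quarto metadata and variables files, and nested (dotted) keys are not handled here.

```python
import re

# Matches Quarto shortcodes of the form {{< meta key >}} or {{< var key >}}.
SHORTCODE = re.compile(r"\{\{<\s*(meta|var)\s+([\w.\-]+)\s*>\}\}")


def expand_shortcodes(text, meta, variables):
    """Replace meta/var shortcodes with their values.

    Returns the expanded text and a list of shortcodes that could not be
    resolved; unresolved shortcodes are left in the text unchanged.
    """
    unresolved = []

    def substitute(match):
        kind, key = match.group(1), match.group(2)
        table = meta if kind == "meta" else variables
        if key in table:
            return str(table[key])
        unresolved.append(f"{kind} {key}")
        return match.group(0)

    return SHORTCODE.sub(substitute, text), unresolved
```

The expanded text could then be written to a temporary file and checked as usual, and a non-empty `unresolved` list could itself be treated as a failure, so that a misspelled key does not slip through unnoticed.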

@penelopeysm
Member Author

I think that's pretty much what I meant when I said I wasn't sure whether it was worth our time.

  1. Personally, I think the best reward / effort ratio is to just run the link checker on the generated HTML. It means that it is difficult to check it locally (you have to render the pages first) but at least you can see the output from CI and correct things as needed after that.

  2. One other option is to just skip checking the meta and var shortcodes, which will make it easy to do locally, but you'll never know when one of those is broken, which I'm not comfortable with.

  3. Or as you say, we could write something ourselves. You're welcome to try either of those approaches (I think a custom shell / Julia script would be the best!) but that's not something I'd personally sink time into.
