Skip to content

get rid of some false negatives in rustdoc::broken_intra_doc_links #132748

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

lolbinarycat
Copy link
Contributor

@lolbinarycat lolbinarycat commented Nov 7, 2024

rustdoc will not try to do intra-doc linking if the "path" of a link looks too much like a "real url".

however, only inline links ([text](url)) can actually contain a url, other types of links (reference links, shortcut links) contain a reference which is later resolved to an actual url.

the "path" in this case cannot be a url, and therefore it should not be skipped due to looking like a url.

fixes #54191

to minimize the number of false positives that will be introduced, the following heuristic is used:

If there's no backticks, be lenient revert to old behavior.
This is to prevent churn by linting on stuff that isn't meant to be a link.
only shortcut links have simple enough syntax that they
are likely to be written accidentlly, collapsed and reference links
need 4 metachars, and reference links will not usually use
backticks in the reference name.
therefore, only shortcut syntax gets the lenient behavior.
here's a truth table for how link kinds that cannot be urls are handled:

is shortcut link not shortcut link
has backtick never ignore never ignore
no backtick ignore if url-like never ignore

@rustbot
Copy link
Collaborator

rustbot commented Nov 7, 2024

r? @notriddle

rustbot has assigned @notriddle.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue. labels Nov 7, 2024
@lolbinarycat lolbinarycat force-pushed the rustdoc-intra-doc-link-warn-more-54191 branch from df3d0f6 to 8ae6586 Compare November 7, 2024 21:41
@rust-log-analyzer

This comment has been minimized.

Copy link
Contributor

@notriddle notriddle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a good idea, but I have found one problem. Consider a sample like this:

//@ check-pass
#![no_std]
#![deny(rustdoc::broken_intra_doc_links)]

// regression test for https://github.com/rust-lang/rust/issues/54191
// false positives

//! This is not an intra-doc link: [`foobar`]
//!
//! [`foobar`]: /index.html

That test case fails, because it thinks /index.html is an intra-doc link. You're right that certain kinds of links aren't allowed to be URLs—that's the Unknown type links.

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 7, 2024
@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@lolbinarycat
Copy link
Contributor Author

welp, turns out the standard library itself contains several of these false negatives!

@lolbinarycat
Copy link
Contributor Author

...or perhaps there's some deeper bug with intra-doc link generation?

there seems to be some false positives containing spaces, even though there is a matching reference definition...

I guess we can just re-enable unconditionally ignoring links with spaces?

@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@lolbinarycat
Copy link
Contributor Author

Ok, this breaks a lot of ui tests. This might actually have a pretty big impact, maybe we should do a crater run or something..?

@lolbinarycat lolbinarycat added S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Nov 8, 2024
@notriddle
Copy link
Contributor

It's looks fine to me. Here's the PRs where each of these test cases were added:

test case PR
disambiguator-endswith-named-suffix.rs #127016
weird-syntax.rs #112014
redundant_explicit_links-utf8.rs #115070

The last two are both contrived ICE tests, and all of them are intended to be parsed as intra-doc links. Making it so that they warn seems fine to me.

r? @GuillaumeGomez do you also think this is a suitable bug fix?

@rustbot rustbot assigned GuillaumeGomez and unassigned notriddle Nov 8, 2024
@notriddle notriddle self-assigned this Nov 8, 2024
@GuillaumeGomez
Copy link
Member

I think so. I'm not completely satisfied with the current impact though. Intra-doc links should not be triggered on items that contain characters which can't be in an ident or a path, like Clone\(\). Maybe a different warning in these cases?

@lolbinarycat
Copy link
Contributor Author

@GuillaumeGomez what about just amending the current warning to account for other common mistakes?

honestly makes me wish we had --explain for lints, putting a full help message for each one could get a bit verbose.

regardless, i think that should be a separate issue, since the lint output isn't the most helpful in general (it just recommends escaping the brackets, and doesn't mention other common issues, such as misspelled link references)

@GuillaumeGomez
Copy link
Member

what about just amending the current warning to account for other common mistakes?

Do you have an example of before/after of what you have in mind by any chance?

@lolbinarycat
Copy link
Contributor Author

Do you have an example of before/after of what you have in mind by any chance?

well, the most obvious one is some sort of "consider adding a reference: [whatever]: //some-url.example/", but i also think trying to catch typos by comparing the edit distance of the missing reference to that of existing references could be handy.

and perhaps we should also mention surrounding code by backticks, since code snippets should generally be in code blocks instead of individually escaping chars.

@GuillaumeGomez
Copy link
Member

well, the most obvious one is some sort of "consider adding a reference: [whatever]: //some-url.example/", but i also think trying to catch typos by comparing the edit distance of the missing reference to that of existing references could be handy.

Sounds good to me. The distance between two items could be nice too (as a follow-up?).

@GuillaumeGomez
Copy link
Member

Then let's go, thanks everyone!

@bors r+ rollup

@bors
Copy link
Collaborator

bors commented Jul 23, 2025

📌 Commit 2d38996 has been approved by GuillaumeGomez

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 23, 2025
fmease added a commit to fmease/rust that referenced this pull request Jul 23, 2025
…k-warn-more-54191, r=GuillaumeGomez

get rid of some false negatives in rustdoc::broken_intra_doc_links

rustdoc will not try to do intra-doc linking if the "path" of a link looks too much like a "real url".

however, only inline links (`[text](url)`) can actually contain a url, other types of links (reference links, shortcut links) contain a *reference* which is later resolved to an actual url.

the "path" in this case cannot be a url, and therefore it should not be skipped due to looking like a url.

fixes rust-lang#54191

to minimize the number of false positives that will be introduced, the following heuristic is used:

If there's no backticks, be lenient revert to old behavior.
This is to prevent churn by linting on stuff that isn't meant to be a link.
only shortcut links have simple enough syntax that they
are likely to be written accidentlly, collapsed and reference links
need 4 metachars, and reference links will not usually use
backticks in the reference name.
therefore, only shortcut syntax gets the lenient behavior.
here's a truth table for how link kinds that cannot be urls are handled:

|              |  is shortcut link  | not shortcut link |
|--------------|--------------------|-------------------|
| has backtick |    never ignore    |    never ignore   |
| no backtick  | ignore if url-like |    never ignore   |
fmease added a commit to fmease/rust that referenced this pull request Jul 23, 2025
…k-warn-more-54191, r=GuillaumeGomez

get rid of some false negatives in rustdoc::broken_intra_doc_links

rustdoc will not try to do intra-doc linking if the "path" of a link looks too much like a "real url".

however, only inline links (`[text](url)`) can actually contain a url, other types of links (reference links, shortcut links) contain a *reference* which is later resolved to an actual url.

the "path" in this case cannot be a url, and therefore it should not be skipped due to looking like a url.

fixes rust-lang#54191

to minimize the number of false positives that will be introduced, the following heuristic is used:

If there's no backticks, be lenient revert to old behavior.
This is to prevent churn by linting on stuff that isn't meant to be a link.
only shortcut links have simple enough syntax that they
are likely to be written accidentlly, collapsed and reference links
need 4 metachars, and reference links will not usually use
backticks in the reference name.
therefore, only shortcut syntax gets the lenient behavior.
here's a truth table for how link kinds that cannot be urls are handled:

|              |  is shortcut link  | not shortcut link |
|--------------|--------------------|-------------------|
| has backtick |    never ignore    |    never ignore   |
| no backtick  | ignore if url-like |    never ignore   |
fmease added a commit to fmease/rust that referenced this pull request Jul 24, 2025
…k-warn-more-54191, r=GuillaumeGomez

get rid of some false negatives in rustdoc::broken_intra_doc_links

rustdoc will not try to do intra-doc linking if the "path" of a link looks too much like a "real url".

however, only inline links (`[text](url)`) can actually contain a url, other types of links (reference links, shortcut links) contain a *reference* which is later resolved to an actual url.

the "path" in this case cannot be a url, and therefore it should not be skipped due to looking like a url.

fixes rust-lang#54191

to minimize the number of false positives that will be introduced, the following heuristic is used:

If there's no backticks, be lenient revert to old behavior.
This is to prevent churn by linting on stuff that isn't meant to be a link.
only shortcut links have simple enough syntax that they
are likely to be written accidentlly, collapsed and reference links
need 4 metachars, and reference links will not usually use
backticks in the reference name.
therefore, only shortcut syntax gets the lenient behavior.
here's a truth table for how link kinds that cannot be urls are handled:

|              |  is shortcut link  | not shortcut link |
|--------------|--------------------|-------------------|
| has backtick |    never ignore    |    never ignore   |
| no backtick  | ignore if url-like |    never ignore   |
bors added a commit that referenced this pull request Jul 24, 2025
Rollup of 15 pull requests

Successful merges:

 - #132748 (get rid of some false negatives in rustdoc::broken_intra_doc_links)
 - #143374 (Unquerify extern_mod_stmt_cnum.)
 - #143838 (std: net: uefi: Add support to query connection data)
 - #144014 (don't link to the nightly version of the Edition Guide in stable lints)
 - #144094 (Ensure we codegen the main fn)
 - #144218 (Use serde for target spec json deserialize)
 - #144221 (generate elf symbol version in raw-dylib)
 - #144240 (Add more test case to check if the false note related to sealed trait suppressed)
 - #144247 (coretests/num: use ldexp instead of hard-coding a power of 2)
 - #144276 (Use less HIR in check_private_in_public.)
 - #144317 (pass build.npm from bootstrap to tidy and use it for npm install)
 - #144320 (rustdoc: avoid allocating a temp String for aliases in search index)
 - #144334 (rustc_resolve: get rid of unused rustdoc::span_of_fragments_with_expansion)
 - #144335 (Don't suggest assoc ty bound on non-angle-bracketed problematic assoc ty binding)
 - #144358 (Stop using the old `validate_attr` logic for stability attributes)

r? `@ghost`
`@rustbot` modify labels: rollup
@fmease
Copy link
Member

fmease commented Jul 24, 2025

#144384 (comment)
@bors r-

@bors bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Jul 24, 2025
@fmease
Copy link
Member

fmease commented Jul 24, 2025

@bors try jobs=dist-x86_64-linux-alt

@rust-bors
Copy link

rust-bors bot commented Jul 24, 2025

⌛ Trying commit 2d38996 with merge 0419140

To cancel the try build, run the command @bors try cancel.

rust-bors bot added a commit that referenced this pull request Jul 24, 2025
…-54191, r=<try>

get rid of some false negatives in rustdoc::broken_intra_doc_links

try-job: dist-x86_64-linux-alt
@lolbinarycat
Copy link
Contributor Author

lolbinarycat commented Jul 24, 2025

weirdly the erroring code doesn't seem to exist in master or nightly source.

EDIT: nvm there's multiple comments referencing README.md in that file, i was looking at the wrong one.

@rust-log-analyzer

This comment has been minimized.

@rust-bors
Copy link

rust-bors bot commented Jul 24, 2025

💔 Test failed (CI). Failed jobs:

lolbinarycat and others added 7 commits July 24, 2025 11:17
rustdoc will not try to do intra-doc linking if the "path"
of a link looks too much like a "real url".

however, only inline links ([text](url)) can actually contain
a url, other types of links (reference links, shortcut links)
contain a *reference* which is later resolved to an actual url.

the "path" in this case cannot be a url, and therefore it should
not be skipped due to looking like a url.

Co-authored-by: Michael Howell <michael@notriddle.com>
this is in an effort to reduce the amount of code churn caused by
this lint triggering on text that was never meant to be a link.

a more principled hierustic for ignoring lints is not possible
without extensive changes, due to the lint emitting code
being so far away from the link collecting code,
and the fact that only the link collecting code
has access to details about how the link appears in the
unnormalized markdown.
collapsed links and reference links have a pretty particular syntax,
it seems unlikely they would show up on accident.

Co-authored-by: León Orell Valerian Liehr <me@fmease.dev>
@lolbinarycat lolbinarycat force-pushed the rustdoc-intra-doc-link-warn-more-54191 branch from 2d38996 to a73d7e3 Compare July 24, 2025 17:38
@rustbot
Copy link
Collaborator

rustbot commented Jul 24, 2025

Some changes occurred to MIR optimizations

cc @rust-lang/wg-mir-opt

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-rustdoc-json Area: Rustdoc JSON backend S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

rustdoc does not warn about broken links if they contain . or []
8 participants