-
Notifications
You must be signed in to change notification settings - Fork 237
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP]: support “standard” footnotes #544
Conversation
Starting with the existing footnotes test suite, write tests for an implementation which matches the behavior of footnotes from other major implementors, including e.g. markdown-it and GitHub-Flavored Markdown. These "standard" footnotes: - are block items which behave like lists, and may have arbitrary blocks nested within them - include back-reference links, using the now-standard `↩︎` character, which include `aria-label="back to content"` and `role="doc-backlink"` to support better screen-reader feedback - appear in an ordered list at the end of the document, rather than inline in the flow of the text, which list is preceded by an `hr` Additionally, these footnote links and back-link are disambiguated from other possible in-document links (e.g. heading links) by incorporating `fn-` at the start of the link name. Finally, I fixed the descriptive text of the existing footnotes' spec for the multiple-paragraphs case, because it does not in fact support multiple paragraphs!
This can and should be bikeshedded!
e966033
to
6b4305c
Compare
- add an options flag to the bitflags - treat `enable-footnotes` and `enable-standard-footnotes` as mutually exclusive, including an error message and the usage message if they are both passed - to keep things clear, don't shadow the name `opts` in `main`: distinguish between the CLI options and the program options
f46d675
to
7a9c4f5
Compare
An observation here – @marcusklaas @raphlinus I’d be curious to get your thoughts on this: The current implementation of The body of the footnote is not a problem: as long as the updated parsing I'm working on correctly generates events within the block, the writer will emit them and then emit the footnote definition closing tag appropriately. It's only an issue for the open and close of the definition tag. (As an aside, and related to #517, it's actually very helpful to have the name of the footnote in the closing tag for the more-standard footnotes design, as it makes it easy to generate the back-reference link.) For now, to keep making forward progress, I'm going to simply update the main invocation to take some kind of configuration here, but it's definitely a thing we'd need to resolve before merging (once I get it working of course). Edit: actually, I'm supplying new |
As a bonus, fix the clippy errors and warnings in build.rs: - use `write_all` instead of `write` - use `unwrap_or_else` instead of `expect` (avoids allocating the string, and while we don't really care that much in the build script, doing less is always nice)
6f7147c
to
c4a1fae
Compare
Hmm. It occurs to me as I'm a good chunk of the way through implementing the output that in fact the key is to just avoid emitting those events! That is, instead of emitting
Where the return Event::Html(r##" <a href="#fn-<name>-ref" class="footnote-backref" aria-label="back to content" role="doc-backlink">↩︎</a>"##) The only remaining issue is making the anchors serve as back-link targets as well. I'm continuing forward with the configuration approach for now, but will reevaluate as I make progress on the parser side. |
Ultimately, it may be possible to work around this, but at present I'm not clear on that, especially without *other* breaking changes. This preserves the existing public API and simply adds handling for this scenario. One option, which *would* be a breaking change but would also address a long-standing request (pulldown-cmark#142) is to simply add support for back-links in *all* footnote types. In that case, there would once again be no need for configuration. However, that may need to be an *option*, since it's otherwise a breaking change for all existing users (including any who may be adding those manually!).
914a364
to
bf9b2e3
Compare
- For now, reuse the existing implementation from `firstpass`, which makes debugging to see the flow of the code viable. It's also a more-or-less reasonable starting point for the new behavior, which should have the same basic rules for starting (`parse_footnote`) but then will need a more sophisticated notion of when to terminate the body, in line with the design of list items. - Add a `handle_standard_footnotes` pass within the parser iterator, which will be responsible for creating a data structure representing the set of footnote definitions, which can be iterated over once the rest of the parser has been exhausted.
6313256
to
0fbad75
Compare
It would be nice to avoid missing a case. (I expect the compiler will inline this if appropriate, but can come back to that later if there's actually a reason to explicitly inline it).
Here's a question: what should we emit for this? What up my young youngs.[^1] ATP theme.[^1] Neato bandito.[^1]
[^1]: A silly song. I actually really like the decision GitHub has made here: "naming" (by number) the return link: Rendered markup, stripped of GH-specific bits: <p dir="auto">What up my young youngs.<sup><a href="#user-content-fn-1" id="user-content-fnref-1" aria-describedby="footnote-label">1</a></sup> ATP theme.<sup><a href="#user-content-fn-1" id="user-content-fnref-1-2" aria-describedby="footnote-label">1</a></sup></p>
<section class="footnotes"><h2 id="footnote-label" class="sr-only">Footnotes</h2>
<ol dir="auto">
<li id="user-content-fn-1">
<p dir="auto">A silly song. <a href="#user-content-fnref-1" class="data-footnote-backref" aria-label="Back to content">↩</a> <a href="#user-content-fnref-1-2" class="data-footnote-backref" aria-label="Back to content">↩<sup>2</sup></a></p>
</li>
</ol>
</section> markdown-it names the reference itself but not the return link, which is quite unhelpful IMO. Potentially we could do both: if there are multiple references to the same link, match the name of the reference and the return value so that people can keep track of it. |
Hey Chris, big thanks for creating this pull request and starting the work on giving the footnotes a facelift. I have gone through most of your posts here and I think many of your considerations are sound. I am very happy to see that you've familiarized yourself with previous discussions we have had, so you're not missing anything that has already been tried or ruled out. I don't have time right now to go through the changes you've made so far, but I wanted to let you know that we are very excited to have someone work on this, so that we can bring the footnotes implementation in line with other parsers. I hope to get to it in the next few days! |
No worries on the delay; as you can tell I have significant delaying factors myself. One of the things that I noticed when working through and evaluating next steps is that we would basically have to rewrite the tree every time we encounter a footnote definition, carrying around the extra state to support it. I think that's doable, but it's going to be a non-trivial amount of work to do it and get it right (and not mess up everything else 😂), so it'll likely be my next investment day (Dec. 10th) before I keep moving on this. |
When you mention rewriting the tree, is it for the purpose of doing backreferences? That does sound computationally expensive.. The way I had in mind to do backreferences is to basically cache all events, put them in a vec and capture all references to footnotes. Then, go back to the footnotes in the event vec (maybe with an index?) and add the backrefs there. It's also not the best from a performance standpoint, but it may be quicker. But it does force us to consume all events in one go... Now that I think of it, indeed, updating the tree in place may be preferable. Especially if we can be a bit smart about looking up the footnote definitions. I am curious what you'll come up with! |
Yeah, there are two specific things which make it particularly tricky:
It may be possible to simply push definitions into another data structure, and to also keep track of the order in which references are emitted, and that would be preferable in terms of costs I think—but it wasn't totally obvious to me how I would pull that off. I’ll be pushing at this again on my investment time this Friday, so I’ll see how far I get with one or the other approaches. In sum, though, “standard” Markdown footnotes are the kind of thing which more or less require you to make a full pass for them, as far as I can tell, and there's not quite enough information from the first pass to do it as things stand. |
Add one callback each for: - footnotes references without a corresponding definition - footnote definitions which are entirely unreferenced For both, introduce corresponding types (which will also be useful later when actually doing the tree rewriting). Given the additional complexity this requires, instead of adding yet more `new_with_*` variations, introduce a `ParserBuilder` which allows a user to set up a `Parser` instance like so: let parser = ParserBuilder(text) .with_options(some_options). .with_broken_link_callback(a_broken_link_callback) .with_broken_footnote_callback(a_broken_footnote_callback) .build(); (This can also work for the existing cases, but they are left as is for this change.) To enable making the builder robust, extract a new type alias, `BrokenLinkCallbackFn`, representing the type which is wrapped in an `Option` by the existing `BrokenLinkCallback` type, and do the same for the two callbacks allowed.
Emitting "standard" footnotes requires rewriting the tree by moving the body of each definition to an appropriately-ordered list at the *end* of the tree, rather than inline in the body of the document. Accordingly, introduce two new data structures in the parser module: - `ParsedFootnotes`: a `struct` the parser can push definitions into as it traverses the tree and emits items initially. When encountering a footnote definition or a footnote reference, it pushes them into the appropriate field on the `ParsedFootnotes` type: - a `HashMap` for the definitions, where the key is the definition name and the value its body - a `Vec` of references, so that the order which footnote references appear in the body of the text is maintained so that they can be numbered appropriately and - `EmittableFootnotes`: a `struct` with three fields: - a `Vec` of footnote definitions used in the text, consisting of: - the `Vec<Item>` making up the body of the definition - the corresponding `Vec` of back-reference names, which can be used to generate back-reference links - a `Vec` of undefined references, which can be normalized, warned on, or any other option using the callback introduced in the previous commit - a `HashMap` of definition names and bodies for unused footnote definitions, which can likewise be This provides all the information needed to correctly emit footnotes definitions in the correct order and with the correct structure at the end of the parse tree. (It does *not* include enough information to emit the correct numbers for footnote reference text, but I *believe* the existing parser infrastructure already handles that.)
All right, after a looooong hiatus I have picked this back up, and I wanted to note a couple significant things that I did in the course of this evening's work. (These are pulled directly from the commit messages, but surfaced here for easier discussion.)
I don't know whether this burst of energy will continue in the week ahead, but I'm hoping so: I would really like to get this across the line sometime soon so that I can actually use this library both for my own projects and in the context of Zola! |
Oooh, happy to see more progress on this. I will try to take some time this week to have a first look at the code! |
- Rewrite it as a `for` loop, which is a bit clearer than a `fold` in this case. - Actually store indices in the `used_idx`: the fact that it wasn't mutable before was a clue that something was wrong!
Okay, I took some time before work this morning and saw how far I could get with the data structures I have, and the answer is: they have all the data we need (:tada:) but none of the existing infrastructure around the |
There will necessarily be two "passes", but the second pass merely (!) needs to append items to the tree and update it for continued iteration.
This isn't really *preferable*, but is probably roughly necessary as a stopgap? It's mildly annoying that as a consumer you would just emit an empty string, but as it currently stands the parser (quite reasonably!) has no way to say "nothing to see here"; it needs to actually move the cursor in the tree. So maybe that needs to be a loop instead?
What's the status on this? I would love to see this get addressed. Right now I'm seriously missing the use case described by the last 2 points listed in the OP (having the "back to text" link and collecting them all at the bottom of the text). |
Unfortunately, it's rather involved and I haven't had a sustained block to push on it. If someone else picks it up and runs with it I will not be sad. I still hope to come back to it eventually but realistically it could be months, years, or never. |
|
Resolves pulldown-cmark#20 Resolves pulldown-cmark#530 This change is similar to, but a more limited change than, <pulldown-cmark#544>. It changes the syntax, but does not touch the generated HTML or event API. Motivation ---------- This commit is written with usage in mdBook, rustdoc, and docs.rs in mind. * Having a standard to follow, or at least a public test suite in [cmark-gfm] [^c], makes it easier to distinguish bugs from features. * It makes sense to commit to following GitHub's behavior specifically, because mdBook chapters and docs.rs README files are often viewed in GitHub preview windows, so any divergence will be very annoying. * If mdBook and docs.rs are going to use this syntax, then rustdoc should, too. * Having both footnote syntaxes use the same API and rendering makes it more feasible for rustdoc to change the syntax over an [edition]. To introduce a syntax change in a new edition of Rust, we must make rustdoc warn anyone who writes code that will have its meaning change. To do it, run the parser twice in lockstep (with `ENABLE_FOOTNOTES` on one parser, and `ENABLE_GFM_FOOTNOTES` on the other), and warn if they diverge. * Alternatively, run a Crater build with this same code to check if this actually causes widespread breakage. * In particular, using tree rewriting to push the footnotes to the end is not as useful as it sounds, since that's not enough to exactly copy the way GitHub renders footnotes. To do that, you also need to sort the footnotes by the order in which they are *referenced*, not the order in which they are defined. This type of tree rewriting is also a waste of time if you want "margin note" rendering instead of putting them all at the end. [cmark-gfm]: https://github.com/github/cmark-gfm/blob/1e230827a584ebc9938c3eadc5059c55ef3c9abf/test/extensions.txt#L702 [edition]: https://doc.rust-lang.org/edition-guide/editions/index.html [^c]: cmark-gfm is under the MIT license, so incorporating parts of its test suite into pulldown-cmark should be fine.
Resolves pulldown-cmark#20 Resolves pulldown-cmark#530 This change is similar to, but a more limited change than, <pulldown-cmark#544>. It changes the syntax, but does not touch the generated HTML or event API. Motivation ---------- This commit is written with usage in mdBook, rustdoc, and docs.rs in mind. * Having a standard to follow, or at least a public test suite in [cmark-gfm][^c], makes it easier to distinguish bugs from features. * It makes sense to commit to following GitHub's behavior specifically, because mdBook chapters and docs.rs README files are often viewed in GitHub preview windows, so any divergence will be very annoying. * If mdBook and docs.rs are going to use this syntax, then rustdoc should, too. * Having both footnote syntaxes use the same API and rendering makes it more feasible for rustdoc to change the syntax over an [edition]. To introduce a syntax change in a new edition of Rust, we must make rustdoc warn anyone who writes code that will have its meaning change. To do it, run the parser twice in lockstep (with `ENABLE_FOOTNOTES` on one parser, and `ENABLE_GFM_FOOTNOTES` on the other), and warn if they diverge. * Alternatively, run a Crater build with this same code to check if this actually causes widespread breakage. * In particular, using tree rewriting to push the footnotes to the end is not as useful as it sounds, since that's not enough to exactly copy the way GitHub renders footnotes. To do that, you also need to sort the footnotes by the order in which they are *referenced*, not the order in which they are defined. This type of tree rewriting is also a waste of time if you want "margin note" rendering instead of putting them all at the end. [cmark-gfm]: https://github.com/github/cmark-gfm/blob/1e230827a584ebc9938c3eadc5059c55ef3c9abf/test/extensions.txt#L702 [edition]: https://doc.rust-lang.org/edition-guide/editions/index.html [^c]: cmark-gfm is under the MIT license, so incorporating parts of its test suite into pulldown-cmark should be fine.
Resolves pulldown-cmark#20 Resolves pulldown-cmark#530 This change is similar to, but a more limited change than, <pulldown-cmark#544>. It changes the syntax, but does not touch the generated HTML or event API. Motivation ---------- This commit is written with usage in mdBook, rustdoc, and docs.rs in mind. * Having a standard to follow, or at least a public test suite in [cmark-gfm] [^c], makes it easier to distinguish bugs from features. * It makes sense to commit to following GitHub's behavior specifically, because mdBook chapters and docs.rs README files are often viewed in GitHub preview windows, so any divergence will be very annoying. * If mdBook and docs.rs are going to use this syntax, then rustdoc should, too. * Having both footnote syntaxes use the same API and rendering makes it more feasible for rustdoc to change the syntax over an [edition]. To introduce a syntax change in a new edition of Rust, we must make rustdoc warn anyone who writes code that will have its meaning change. To do it, run the parser twice in lockstep (with `ENABLE_FOOTNOTES` on one parser, and `ENABLE_GFM_FOOTNOTES` on the other), and warn if they diverge. * Alternatively, run a Crater build with this same code to check if this actually causes widespread breakage. * In particular, using tree rewriting to push the footnotes to the end is not as useful as it sounds, since that's not enough to exactly copy the way GitHub renders footnotes. To do that, you also need to sort the footnotes by the order in which they are *referenced*, not the order in which they are defined. This type of tree rewriting is also a waste of time if you want "margin note" rendering instead of putting them all at the end. [cmark-gfm]: https://github.com/github/cmark-gfm/blob/1e230827a584ebc9938c3eadc5059c55ef3c9abf/test/extensions.txt#L702 [edition]: https://doc.rust-lang.org/edition-guide/editions/index.html [^c]: cmark-gfm is under the MIT license, so incorporating parts of its test suite into pulldown-cmark should be fine.
Resolves pulldown-cmark#20 Resolves pulldown-cmark#530 This change is similar to, but a more limited change than, <pulldown-cmark#544>. It changes the syntax, but does not touch the generated HTML or event API. Motivation ---------- This commit is written with usage in mdBook, rustdoc, and docs.rs in mind. * Having a standard to follow, or at least a public test suite in [cmark-gfm] [^c], makes it easier to distinguish bugs from features. * It makes sense to commit to following GitHub's behavior specifically, because mdBook chapters and docs.rs README files are often viewed in GitHub preview windows, so any divergence will be very annoying. * If mdBook and docs.rs are going to use this syntax, then rustdoc should, too. * Having both footnote syntaxes use the same API and rendering makes it more feasible for rustdoc to change the syntax over an [edition]. To introduce a syntax change in a new edition of Rust, we must make rustdoc warn anyone who writes code that will have its meaning change. To do it, run the parser twice in lockstep (with `ENABLE_FOOTNOTES` on one parser, and `ENABLE_GFM_FOOTNOTES` on the other), and warn if they diverge. * Alternatively, run a Crater build with this same code to check if this actually causes widespread breakage. * In particular, using tree rewriting to push the footnotes to the end is not as useful as it sounds, since that's not enough to exactly copy the way GitHub renders footnotes. To do that, you also need to sort the footnotes by the order in which they are *referenced*, not the order in which they are defined. This type of tree rewriting is also a waste of time if you want "margin note" rendering instead of putting them all at the end. [cmark-gfm]: https://github.com/github/cmark-gfm/blob/1e230827a584ebc9938c3eadc5059c55ef3c9abf/test/extensions.txt#L702 [edition]: https://doc.rust-lang.org/edition-guide/editions/index.html [^c]: cmark-gfm is under the MIT license, so incorporating parts of its test suite into pulldown-cmark should be fine.
Resolves pulldown-cmark#20 Resolves pulldown-cmark#530 This change is similar to, but a more limited change than, <pulldown-cmark#544>. It changes the syntax, but does not touch the generated HTML or event API. Motivation ---------- This commit is written with usage in mdBook, rustdoc, and docs.rs in mind. * Having a standard to follow, or at least a public test suite in [cmark-gfm] [^c], makes it easier to distinguish bugs from features. * It makes sense to commit to following GitHub's behavior specifically, because mdBook chapters and docs.rs README files are often viewed in GitHub preview windows, so any divergence will be very annoying. * If mdBook and docs.rs are going to use this syntax, then rustdoc should, too. * Having both footnote syntaxes use the same API and rendering makes it more feasible for rustdoc to change the syntax over an [edition]. To introduce a syntax change in a new edition of Rust, we must make rustdoc warn anyone who writes code that will have its meaning change. To do it, run the parser twice in lockstep (with `ENABLE_FOOTNOTES` on one parser, and `ENABLE_GFM_FOOTNOTES` on the other), and warn if they diverge. * Alternatively, run a Crater build with this same code to check if this actually causes widespread breakage. * In particular, using tree rewriting to push the footnotes to the end is not as useful as it sounds, since that's not enough to exactly copy the way GitHub renders footnotes. To do that, you also need to sort the footnotes by the order in which they are *referenced*, not the order in which they are defined. This type of tree rewriting is also a waste of time if you want "margin note" rendering instead of putting them all at the end. [cmark-gfm]: https://github.com/github/cmark-gfm/blob/1e230827a584ebc9938c3eadc5059c55ef3c9abf/test/extensions.txt#L702 [edition]: https://doc.rust-lang.org/edition-guide/editions/index.html [^c]: cmark-gfm is under the MIT license, so incorporating parts of its test suite into pulldown-cmark should be fine.
Resolves pulldown-cmark#20 Resolves pulldown-cmark#530 Resolves pulldown-cmark#623 This change is similar to, but a more limited change than, <pulldown-cmark#544>. It changes the syntax, but does not touch the generated HTML or event API. Motivation ---------- This commit is written with usage in mdBook, rustdoc, and docs.rs in mind. * Having a standard to follow, or at least a public test suite in [cmark-gfm] [^c], makes it easier to distinguish bugs from features. * It makes sense to commit to following GitHub's behavior specifically, because mdBook chapters and docs.rs README files are often viewed in GitHub preview windows, so any divergence will be very annoying. * If mdBook and docs.rs are going to use this syntax, then rustdoc should, too. * Having both footnote syntaxes use the same API and rendering makes it more feasible for rustdoc to change the syntax over an [edition]. To introduce a syntax change in a new edition of Rust, we must make rustdoc warn anyone who writes code that will have its meaning change. To do it, run the parser twice in lockstep (with `ENABLE_FOOTNOTES` on one parser, and `ENABLE_GFM_FOOTNOTES` on the other), and warn if they diverge. * Alternatively, run a Crater build with this same code to check if this actually causes widespread breakage. * In particular, using tree rewriting to push the footnotes to the end is not as useful as it sounds, since that's not enough to exactly copy the way GitHub renders footnotes. To do that, you also need to sort the footnotes by the order in which they are *referenced*, not the order in which they are defined. This type of tree rewriting is also a waste of time if you want "margin note" rendering instead of putting them all at the end. [cmark-gfm]: https://github.com/github/cmark-gfm/blob/1e230827a584ebc9938c3eadc5059c55ef3c9abf/test/extensions.txt#L702 [edition]: https://doc.rust-lang.org/edition-guide/editions/index.html [^c]: cmark-gfm is under the MIT license, so incorporating parts of its test suite into pulldown-cmark should be fine.
Resolves pulldown-cmark#20 Resolves pulldown-cmark#530 Resolves pulldown-cmark#623 This change is similar to, but a more limited change than, <pulldown-cmark#544>. It changes the syntax, but does not touch the generated HTML or event API. Motivation ---------- This commit is written with usage in mdBook, rustdoc, and docs.rs in mind. * Having a standard to follow, or at least a public test suite in [cmark-gfm] [^c], makes it easier to distinguish bugs from features. * It makes sense to commit to following GitHub's behavior specifically, because mdBook chapters and docs.rs README files are often viewed in GitHub preview windows, so any divergence will be very annoying. * If mdBook and docs.rs are going to use this syntax, then rustdoc should, too. * Having both footnote syntaxes use the same API and rendering makes it more feasible for rustdoc to change the syntax over an [edition]. To introduce a syntax change in a new edition of Rust, we must make rustdoc warn anyone who writes code that will have its meaning change. To do it, run the parser twice in lockstep (with `ENABLE_FOOTNOTES` on one parser, and `ENABLE_GFM_FOOTNOTES` on the other), and warn if they diverge. * Alternatively, run a Crater build with this same code to check if this actually causes widespread breakage. * In particular, using tree rewriting to push the footnotes to the end is not as useful as it sounds, since that's not enough to exactly copy the way GitHub renders footnotes. To do that, you also need to sort the footnotes by the order in which they are *referenced*, not the order in which they are defined. This type of tree rewriting is also a waste of time if you want "margin note" rendering instead of putting them all at the end. [cmark-gfm]: https://github.com/github/cmark-gfm/blob/1e230827a584ebc9938c3eadc5059c55ef3c9abf/test/extensions.txt#L702 [edition]: https://doc.rust-lang.org/edition-guide/editions/index.html [^c]: cmark-gfm is under the MIT license, so incorporating parts of its test suite into pulldown-cmark should be fine.
It would be nice to have specified here the differences with #654, thanks. |
I think the very short answer there @Martin1887 is: just use @notriddle's stuff. I think everything I was trying to do here can be built on top of those changes, and this can be treated as superseded by that. As I noted over on #654, the approach I was taking here ends up being more constraining than what we actually want or need, and while it was well-motivated in one sense, I ended up coupling together a couple of concerns here in terms of the structure of a footnote definition and how it gets emitted. As noted above, I think the emit pass has to “collect” information as it goes—
@marcusklaas suggested keeping track of it in a @notriddle's approach makes that perfectly possible, though, and the difference ends up being all in the Net: I am closing this in favor of that completed work, with a huge sigh of relief because I wasn't sure I was ever going to finish it and was sure that I was going to throw away a huge chunk of it in favor of doing something more along the lines of #654! |
(I'll update this PR description over time as I add to it. For now, it's just the description of the first commit: tests!)
Introduce tests for standard footnotes
Starting with the existing footnotes test suite, write tests for an implementation which matches the behavior of footnotes from other major implementors, including e.g. markdown-it and GitHub-Flavored Markdown. These "standard" footnotes:
↩︎
character, which includearia-label="back to content"
androle="doc-backlink"
to support better screen-reader feedbackhr
Additionally, these footnote links and back-link are disambiguated from other possible in-document links (e.g. heading links) by incorporating
fn-
at the start of the link name.Finally, I fixed the descriptive text of the existing footnotes' spec for the multiple-paragraphs case, because it does not in fact support multiple paragraphs!