Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP]: support “standard” footnotes #544

Closed
wants to merge 22 commits into from

Conversation

chriskrycho
Copy link

@chriskrycho chriskrycho commented Nov 19, 2021

(I'll update this PR description over time as I add to it. For now, it's just the description of the first commit: tests!)


Introduce tests for standard footnotes

Starting with the existing footnotes test suite, write tests for an implementation which matches the behavior of footnotes from other major implementors, including e.g. markdown-it and GitHub-Flavored Markdown. These "standard" footnotes:

  • are block items which behave like lists, and may have arbitrary blocks nested within them
  • include back-reference links, using the now-standard ↩︎ character, which include aria-label="back to content" and role="doc-backlink" to support better screen-reader feedback
  • appear in an ordered list at the end of the document, rather than inline in the flow of the text, which list is preceded by an hr

Additionally, these footnote links and back-link are disambiguated from other possible in-document links (e.g. heading links) by incorporating fn- at the start of the link name.

Finally, I fixed the descriptive text of the existing footnotes' spec for the multiple-paragraphs case, because it does not in fact support multiple paragraphs!

Starting with the existing footnotes test suite, write tests for an
implementation which matches the behavior of footnotes from other major
implementors, including e.g. markdown-it and GitHub-Flavored Markdown.
These "standard" footnotes:

- are block items which behave like lists, and may have arbitrary blocks
  nested within them
- include back-reference links, using the now-standard `↩︎` character,
  which include `aria-label="back to content"` and `role="doc-backlink"`
  to support better screen-reader feedback
- appear in an ordered list at the end of the document, rather than
  inline in the flow of the text, which list is preceded by an `hr`

Additionally, these footnote links and back-link are disambiguated from
other possible in-document links (e.g. heading links) by incorporating
`fn-` at the start of the link name.

Finally, I fixed the descriptive text of the existing footnotes' spec
for the multiple-paragraphs case, because it does not in fact support
multiple paragraphs!
@chriskrycho chriskrycho marked this pull request as draft November 19, 2021 17:22
- add an options flag to the bitflags
- treat `enable-footnotes` and `enable-standard-footnotes` as mutually
  exclusive, including an error message and the usage message if they
  are both passed
- to keep things clear, don't shadow the name `opts` in `main`:
  distinguish between the CLI options and the program options
@chriskrycho
Copy link
Author

chriskrycho commented Nov 19, 2021

An observation here – @marcusklaas @raphlinus I’d be curious to get your thoughts on this:

The current implementation of HtmlWriter::run effectively hard-codes the implementation of footnote output, and indeed all output—it more or less assumes that that output is non-conditional, and certainly not configurable. I also know that configuration of the output has been generally viewed as non-desirable in the past, and I would broadly agree with that design: people can implement their own custom writers in general, and that's fine. However, in the case of the alternative (more standard) footnote design, FootnoteDefinitions must have a different HTML representation than they do in the current output, and that can only be specified via a configuration choice as the design of pulldown-cmark stands today.

The body of the footnote is not a problem: as long as the updated parsing I'm working on correctly generates events within the block, the writer will emit them and then emit the footnote definition closing tag appropriately. It's only an issue for the open and close of the definition tag.

(As an aside, and related to #517, it's actually very helpful to have the name of the footnote in the closing tag for the more-standard footnotes design, as it makes it easy to generate the back-reference link.)

For now, to keep making forward progress, I'm going to simply update the main invocation to take some kind of configuration here, but it's definitely a thing we'd need to resolve before merging (once I get it working of course). Edit: actually, I'm supplying new with_options variants and adding the field in a way that's backwards-compatible, while changing the invocation from within the main binary.

As a bonus, fix the clippy errors and warnings in build.rs:

- use `write_all` instead of `write`
- use `unwrap_or_else` instead of `expect` (avoids allocating the
  string, and while we don't really care that much in the build script,
  doing less is always nice)
@chriskrycho
Copy link
Author

chriskrycho commented Nov 19, 2021

Hmm. It occurs to me as I'm a good chunk of the way through implementing the output that in fact the key is to just avoid emitting those events! That is, instead of emitting FootnoteDefinition, the parser should simply emit the sequence of bog-standard HTML events. For my own easy reference, esp. since it may be this time next month before I continue on (much less finish!) this 😬:

  • Event::Rule
  • Event::Html("<section class=\"footnotes\">)
  • Event::Start(Tag::List(1))
  • Event::Start(Tag::Item)
  • (body of the item, usually but not always a paragraph...)
    • if it ends with a paragraph:
      • the return <a>
      • Event::End(Tag::Paragraph)
    • otherwise, just the return <a> (see below)
  • Event::End(Tag::Item)
  • Event::End(Tag::List)
  • Event::Html("</section>")

Where the return <a> is roughly:

Event::Html(r##" <a href="#fn-<name>-ref" class="footnote-backref" aria-label="back to content" role="doc-backlink">↩︎</a>"##)

The only remaining issue is making the anchors serve as back-link targets as well. I'm continuing forward with the configuration approach for now, but will reevaluate as I make progress on the parser side.

Ultimately, it may be possible to work around this, but at present I'm
not clear on that, especially without *other* breaking changes. This
preserves the existing public API and simply adds handling for this
scenario.

One option, which *would* be a breaking change but would also address a
long-standing request (pulldown-cmark#142) is to simply add support for back-links in
*all* footnote types. In that case, there would once again be no need
for configuration. However, that may need to be an *option*, since it's
otherwise a breaking change for all existing users (including any who
may be adding those manually!).
- For now, reuse the existing implementation from `firstpass`, which
  makes debugging to see the flow of the code viable. It's also a
  more-or-less reasonable starting point for the new behavior, which
  should have the same basic rules for starting (`parse_footnote`) but
  then will need a more sophisticated notion of when to terminate the
  body, in line with the design of list items.

- Add a `handle_standard_footnotes` pass within the parser iterator,
  which will be responsible for creating a data structure representing
  the set of footnote definitions, which can be iterated over once the
  rest of the parser has been exhausted.
It would be nice to avoid missing a case. (I expect the compiler will
inline this if appropriate, but can come back to that later if there's
actually a reason to explicitly inline it).
@chriskrycho
Copy link
Author

Here's a question: what should we emit for this?

What up my young youngs.[^1] ATP theme.[^1] Neato bandito.[^1]

[^1]: A silly song.

I actually really like the decision GitHub has made here: "naming" (by number) the return link:

Screen Shot 2021-11-20 at 13 05 03

Rendered markup, stripped of GH-specific bits:

<p dir="auto">What up my young youngs.<sup><a href="#user-content-fn-1" id="user-content-fnref-1" aria-describedby="footnote-label">1</a></sup> ATP theme.<sup><a href="#user-content-fn-1" id="user-content-fnref-1-2" aria-describedby="footnote-label">1</a></sup></p>
<section class="footnotes"><h2 id="footnote-label" class="sr-only">Footnotes</h2>
<ol dir="auto">
<li id="user-content-fn-1">
<p dir="auto">A silly song. <a href="#user-content-fnref-1" class="data-footnote-backref" aria-label="Back to content"></a> <a href="#user-content-fnref-1-2" class="data-footnote-backref" aria-label="Back to content"><sup>2</sup></a></p>
</li>
</ol>
</section>

markdown-it names the reference itself but not the return link, which is quite unhelpful IMO. Potentially we could do both: if there are multiple references to the same link, match the name of the reference and the return value so that people can keep track of it.

@marcusklaas
Copy link
Collaborator

Hey Chris, big thanks for creating this pull request and starting the work on giving the footnotes a facelift. I have gone through most of your posts here and I think many of your considerations are sound. I am very happy to see that you've familiarized yourself with previous discussions we have had, so you're not missing anything that has already been tried or ruled out.

I don't have time right now to go through the changes you've made so far, but I wanted to let you know that we are very excited to have someone work on this, so that we can bring the footnotes implementation in line with other parsers. I hope to get to it in the next few days!

@chriskrycho
Copy link
Author

No worries on the delay; as you can tell I have significant delaying factors myself. One of the things that I noticed when working through and evaluating next steps is that we would basically have to rewrite the tree every time we encounter a footnote definition, carrying around the extra state to support it. I think that's doable, but it's going to be a non-trivial amount of work to do it and get it right (and not mess up everything else 😂), so it'll likely be my next investment day (Dec. 10th) before I keep moving on this.

@marcusklaas
Copy link
Collaborator

When you mention rewriting the tree, is it for the purpose of doing backreferences? That does sound computationally expensive..

The way I had in mind to do backreferences is to basically cache all events, put them in a vec and capture all references to footnotes. Then, go back to the footnotes in the event vec (maybe with an index?) and add the backrefs there. It's also not the best from a performance standpoint, but it may be quicker. But it does force us to consume all events in one go...

Now that I think of it, indeed, updating the tree in place may be preferable. Especially if we can be a bit smart about looking up the footnote definitions. I am curious what you'll come up with!

@chriskrycho
Copy link
Author

Yeah, there are two specific things which make it particularly tricky:

  1. Backreferences, for the reasons you covered.
  2. The existing approach wants to just emit the stream of events as it builds them from the tree, which is lovely for consumers, but that means that somehow you have to build up a stream of extra events to handle at the end, consisting of footnote definitions which will be emitted as list items which also track reference order.

It may be possible to simply push definitions into another data structure, and to also keep track of the order in which references are emitted, and that would be preferable in terms of costs I think—but it wasn't totally obvious to me how I would pull that off. I’ll be pushing at this again on my investment time this Friday, so I’ll see how far I get with one or the other approaches.

In sum, though, “standard” Markdown footnotes are the kind of thing which more or less require you to make a full pass for them, as far as I can tell, and there's not quite enough information from the first pass to do it as things stand.

Add one callback each for:

- footnotes references without a corresponding definition
- footnote definitions which are entirely unreferenced

For both, introduce corresponding types (which will also be useful
later when actually doing the tree rewriting).

Given the additional complexity this requires, instead of adding yet
more `new_with_*` variations, introduce a `ParserBuilder` which allows
a user to set up a `Parser` instance like so:

    let parser = ParserBuilder(text)
        .with_options(some_options).
        .with_broken_link_callback(a_broken_link_callback)
        .with_broken_footnote_callback(a_broken_footnote_callback)
        .build();

(This can also work for the existing cases, but they are left as is for
this change.)

To enable making the builder robust, extract a new type alias,
`BrokenLinkCallbackFn`, representing the type which is wrapped in an
`Option` by the existing `BrokenLinkCallback` type, and do the same for
the two callbacks allowed.
Emitting "standard" footnotes requires rewriting the tree by moving the
body of each definition to an appropriately-ordered list at the *end*
of the tree, rather than inline in the body of the document.
Accordingly, introduce two new data structures in the parser module:

- `ParsedFootnotes`: a `struct` the parser can push definitions into as
  it traverses the tree and emits items initially. When encountering a
  footnote definition or a footnote reference, it pushes them into the
  appropriate field on the `ParsedFootnotes` type:

    - a `HashMap` for the definitions, where the key is the definition
      name and the value its body

    - a `Vec` of references, so that the order which footnote references
      appear in the body of the text is maintained so that they can be
      numbered appropriately and

- `EmittableFootnotes`: a `struct` with three fields:

    - a `Vec` of footnote definitions used in the text, consisting of:

        - the `Vec<Item>` making up the body of the definition

        - the corresponding `Vec` of back-reference names, which can be
          used to generate back-reference links

    - a `Vec` of undefined references, which can be normalized, warned
      on, or any other option using the callback introduced in the
      previous commit

    - a `HashMap` of definition names and bodies for unused footnote
      definitions, which can likewise be

This provides all the information needed to correctly emit footnotes
definitions in the correct order and with the correct structure at the
end of the parse tree. (It does *not* include enough information to
emit the correct numbers for footnote reference text, but I *believe*
the existing parser infrastructure already handles that.)
@chriskrycho
Copy link
Author

All right, after a looooong hiatus I have picked this back up, and I wanted to note a couple significant things that I did in the course of this evening's work. (These are pulled directly from the commit messages, but surfaced here for easier discussion.)


  1. I introduced callbacks to handle broken footnotes. I think we need this minimally to be able to provide useful feedback when there are unused definitions or undefined references.

    Given the additional complexity this requires, instead of adding yet more new_with_* variations, introduce a ParserBuilder which allows a user to set up a Parser instance like so:

    let parser = ParserBuilder(text)
        .with_options(some_options).
        .with_broken_link_callback(a_broken_link_callback)
        .with_broken_footnote_callback(a_broken_footnote_callback)
        .build();

    (This can also work for the existing cases, but they are left as is for this change.)

    To enable making the builder robust, I extracted a new type alias, BrokenLinkCallbackFn, representing the type which is wrapped in an Option by the existing BrokenLinkCallback type, and did the same for the two new callbacks allowed.

    Obviously this marks a relatively significant change to the public API, but I think it's reasonably well-warranted, and it is totally backwards compatible.

  2. Emitting "standard" footnotes requires rewriting the tree by moving the body of each definition to an appropriately-ordered list at the end of the tree, rather than inline in the body of the document. Accordingly, I introduced two new data structures in the parser module:

    • ParsedFootnotes: a struct the parser can push definitions into as it traverses the tree and emits items initially. When encountering a footnote definition or a footnote reference, it pushes them into the appropriate field on the ParsedFootnotes type:

      • a HashMap for the definitions, where the key is the definition name and the value its body

      • a Vec of references, so that the order which footnote references appear in the body of the text is maintained so that they can be numbered appropriately and

    • EmittableFootnotes: a struct with three fields:

      • a Vec of footnote definitions used in the text, consisting of:

        • the Vec<Item> making up the body of the definition

        • the corresponding Vec of back-reference names, which can be used to generate back-reference links

      • a Vec of undefined references, which can be normalized, warned on, or any other option using the callback introduced in the previous commit

      • a HashMap of definition names and bodies for unused footnote definitions, which can likewise be

    This provides all the information needed to correctly emit footnotes definitions in the correct order and with the correct structure at the end of the parse tree. (It does not include enough information to emit the correct numbers for footnote reference text, but I believe the existing parser infrastructure already handles that?)

    Now, the big thing to note here is the data structures. I tried to make this as lean as possible, using only moves and using a HashMap<CowStr<'a>, usize> to index into the list of used footnotes while building up the list to emit, but I would very much welcome feedback on that point.


I don't know whether this burst of energy will continue in the week ahead, but I'm hoping so: I would really like to get this across the line sometime soon so that I can actually use this library both for my own projects and in the context of Zola!

@marcusklaas
Copy link
Collaborator

Oooh, happy to see more progress on this. I will try to take some time this week to have a first look at the code!

- Rewrite it as a `for` loop, which is a bit clearer than a `fold` in
  this case.
- Actually store indices in the `used_idx`: the fact that it wasn't
  mutable before was a clue that something was wrong!
@chriskrycho
Copy link
Author

Okay, I took some time before work this morning and saw how far I could get with the data structures I have, and the answer is: they have all the data we need (:tada:) but none of the existing infrastructure around the Iterator for Parser is ready for this short of actually rewriting the end of the tree. Unfortunately, I do not yet see a clear path to doing that, because that also involves rewriting the allocations. (I’m sure it’s possible, I just don’t yet have enough of the context loaded to see where and how I need to do that.) Also, there are a bunch of ownership issues I need to chase through around the iterator lifetimes, but that should be relatively tractable.

There will necessarily be two "passes", but the second pass merely (!)
needs to append items to the tree and update it for continued iteration.
This isn't really *preferable*, but is probably roughly necessary as a
stopgap? It's mildly annoying that as a consumer you would just emit an
empty string, but as it currently stands the parser (quite reasonably!)
has no way to say "nothing to see here"; it needs to actually move the
cursor in the tree. So maybe that needs to be a loop instead?
@glhrmv
Copy link

glhrmv commented Oct 7, 2022

What's the status on this? I would love to see this get addressed. Right now I'm seriously missing the use case described by the last 2 points listed in the OP (having the "back to text" link and collecting them all at the bottom of the text).

@chriskrycho
Copy link
Author

chriskrycho commented Oct 7, 2022

Unfortunately, it's rather involved and I haven't had a sustained block to push on it. If someone else picks it up and runs with it I will not be sad. I still hope to come back to it eventually but realistically it could be months, years, or never.

@alerque
Copy link
Contributor

alerque commented Oct 8, 2022

s/radically/realistically/

notriddle added a commit to notriddle/pulldown-cmark that referenced this pull request Jun 1, 2023
Resolves pulldown-cmark#20

Resolves pulldown-cmark#530

This change is similar to, but a more limited change than,
 <pulldown-cmark#544>. It changes
the syntax, but does not touch the generated HTML or event API.

Motivation
----------

This commit is written with usage in mdBook, rustdoc, and docs.rs
in mind.

* Having a standard to follow, or at least a public test suite in
  [cmark-gfm] [^c], makes it easier to distinguish bugs from features.
* It makes sense to commit to following GitHub's behavior specifically,
  because mdBook chapters and docs.rs README files are often viewed in
  GitHub preview windows, so any divergence will be very annoying.
* If mdBook and docs.rs are going to use this syntax, then rustdoc
  should, too.
* Having both footnote syntaxes use the same API and rendering makes it
  more feasible for rustdoc to change the syntax over an [edition].
  To introduce a syntax change in a new edition of Rust, we must make
  rustdoc warn anyone who writes code that will have its meaning change.
  To do it, run the parser twice in lockstep (with `ENABLE_FOOTNOTES`
  on one parser, and `ENABLE_GFM_FOOTNOTES` on the other), and warn if
  they diverge.
  * Alternatively, run a Crater build with this same code to check if
    this actually causes widespread breakage.
* In particular, using tree rewriting to push the footnotes to the end
  is not as useful as it sounds, since that's not enough to exactly
  copy the way GitHub renders footnotes. To do that, you also need to
  sort the footnotes by the order in which they are *referenced*, not
  the order in which they are defined. This type of tree rewriting is
  also a waste of time if you want "margin note" rendering instead of
  putting them all at the end.

[cmark-gfm]: https://github.com/github/cmark-gfm/blob/1e230827a584ebc9938c3eadc5059c55ef3c9abf/test/extensions.txt#L702
[edition]: https://doc.rust-lang.org/edition-guide/editions/index.html

[^c]: cmark-gfm is under the MIT license, so incorporating parts of its
    test suite into pulldown-cmark should be fine.
notriddle added a commit to notriddle/pulldown-cmark that referenced this pull request Jun 1, 2023
Resolves pulldown-cmark#20

Resolves pulldown-cmark#530

This change is similar to, but a more limited change than,
 <pulldown-cmark#544>. It changes
the syntax, but does not touch the generated HTML or event API.

Motivation
----------

This commit is written with usage in mdBook, rustdoc, and docs.rs
in mind.

* Having a standard to follow, or at least a public test suite in
  [cmark-gfm][^c], makes it easier to distinguish bugs from features.
* It makes sense to commit to following GitHub's behavior specifically,
  because mdBook chapters and docs.rs README files are often viewed in
  GitHub preview windows, so any divergence will be very annoying.
* If mdBook and docs.rs are going to use this syntax, then rustdoc
  should, too.
* Having both footnote syntaxes use the same API and rendering makes it
  more feasible for rustdoc to change the syntax over an [edition].
  To introduce a syntax change in a new edition of Rust, we must make
  rustdoc warn anyone who writes code that will have its meaning change.
  To do it, run the parser twice in lockstep (with `ENABLE_FOOTNOTES`
  on one parser, and `ENABLE_GFM_FOOTNOTES` on the other), and warn if
  they diverge.
  * Alternatively, run a Crater build with this same code to check if
    this actually causes widespread breakage.
* In particular, using tree rewriting to push the footnotes to the end
  is not as useful as it sounds, since that's not enough to exactly
  copy the way GitHub renders footnotes. To do that, you also need to
  sort the footnotes by the order in which they are *referenced*, not
  the order in which they are defined. This type of tree rewriting is
  also a waste of time if you want "margin note" rendering instead of
  putting them all at the end.

[cmark-gfm]: https://github.com/github/cmark-gfm/blob/1e230827a584ebc9938c3eadc5059c55ef3c9abf/test/extensions.txt#L702
[edition]: https://doc.rust-lang.org/edition-guide/editions/index.html

[^c]: cmark-gfm is under the MIT license, so incorporating parts of its
    test suite into pulldown-cmark should be fine.
notriddle added a commit to notriddle/pulldown-cmark that referenced this pull request Jun 1, 2023
Resolves pulldown-cmark#20

Resolves pulldown-cmark#530

This change is similar to, but a more limited change than,
 <pulldown-cmark#544>. It changes
the syntax, but does not touch the generated HTML or event API.

Motivation
----------

This commit is written with usage in mdBook, rustdoc, and docs.rs
in mind.

* Having a standard to follow, or at least a public test suite in
  [cmark-gfm] [^c], makes it easier to distinguish bugs from features.
* It makes sense to commit to following GitHub's behavior specifically,
  because mdBook chapters and docs.rs README files are often viewed in
  GitHub preview windows, so any divergence will be very annoying.
* If mdBook and docs.rs are going to use this syntax, then rustdoc
  should, too.
* Having both footnote syntaxes use the same API and rendering makes it
  more feasible for rustdoc to change the syntax over an [edition].
  To introduce a syntax change in a new edition of Rust, we must make
  rustdoc warn anyone who writes code that will have its meaning change.
  To do it, run the parser twice in lockstep (with `ENABLE_FOOTNOTES`
  on one parser, and `ENABLE_GFM_FOOTNOTES` on the other), and warn if
  they diverge.
  * Alternatively, run a Crater build with this same code to check if
    this actually causes widespread breakage.
* In particular, using tree rewriting to push the footnotes to the end
  is not as useful as it sounds, since that's not enough to exactly
  copy the way GitHub renders footnotes. To do that, you also need to
  sort the footnotes by the order in which they are *referenced*, not
  the order in which they are defined. This type of tree rewriting is
  also a waste of time if you want "margin note" rendering instead of
  putting them all at the end.

[cmark-gfm]: https://github.com/github/cmark-gfm/blob/1e230827a584ebc9938c3eadc5059c55ef3c9abf/test/extensions.txt#L702
[edition]: https://doc.rust-lang.org/edition-guide/editions/index.html

[^c]: cmark-gfm is under the MIT license, so incorporating parts of its
    test suite into pulldown-cmark should be fine.
notriddle added a commit to notriddle/pulldown-cmark that referenced this pull request Jun 1, 2023
Resolves pulldown-cmark#20

Resolves pulldown-cmark#530

This change is similar to, but a more limited change than,
 <pulldown-cmark#544>. It changes
the syntax, but does not touch the generated HTML or event API.

Motivation
----------

This commit is written with usage in mdBook, rustdoc, and docs.rs
in mind.

* Having a standard to follow, or at least a public test suite in
  [cmark-gfm] [^c], makes it easier to distinguish bugs from features.
* It makes sense to commit to following GitHub's behavior specifically,
  because mdBook chapters and docs.rs README files are often viewed in
  GitHub preview windows, so any divergence will be very annoying.
* If mdBook and docs.rs are going to use this syntax, then rustdoc
  should, too.
* Having both footnote syntaxes use the same API and rendering makes it
  more feasible for rustdoc to change the syntax over an [edition].
  To introduce a syntax change in a new edition of Rust, we must make
  rustdoc warn anyone who writes code that will have its meaning change.
  To do it, run the parser twice in lockstep (with `ENABLE_FOOTNOTES`
  on one parser, and `ENABLE_GFM_FOOTNOTES` on the other), and warn if
  they diverge.
  * Alternatively, run a Crater build with this same code to check if
    this actually causes widespread breakage.
* In particular, using tree rewriting to push the footnotes to the end
  is not as useful as it sounds, since that's not enough to exactly
  copy the way GitHub renders footnotes. To do that, you also need to
  sort the footnotes by the order in which they are *referenced*, not
  the order in which they are defined. This type of tree rewriting is
  also a waste of time if you want "margin note" rendering instead of
  putting them all at the end.

[cmark-gfm]: https://github.com/github/cmark-gfm/blob/1e230827a584ebc9938c3eadc5059c55ef3c9abf/test/extensions.txt#L702
[edition]: https://doc.rust-lang.org/edition-guide/editions/index.html

[^c]: cmark-gfm is under the MIT license, so incorporating parts of its
    test suite into pulldown-cmark should be fine.
notriddle added a commit to notriddle/pulldown-cmark that referenced this pull request Jun 1, 2023
Resolves pulldown-cmark#20

Resolves pulldown-cmark#530

This change is similar to, but a more limited change than,
 <pulldown-cmark#544>. It changes
the syntax, but does not touch the generated HTML or event API.

Motivation
----------

This commit is written with usage in mdBook, rustdoc, and docs.rs
in mind.

* Having a standard to follow, or at least a public test suite in
  [cmark-gfm] [^c], makes it easier to distinguish bugs from features.
* It makes sense to commit to following GitHub's behavior specifically,
  because mdBook chapters and docs.rs README files are often viewed in
  GitHub preview windows, so any divergence will be very annoying.
* If mdBook and docs.rs are going to use this syntax, then rustdoc
  should, too.
* Having both footnote syntaxes use the same API and rendering makes it
  more feasible for rustdoc to change the syntax over an [edition].
  To introduce a syntax change in a new edition of Rust, we must make
  rustdoc warn anyone who writes code that will have its meaning change.
  To do it, run the parser twice in lockstep (with `ENABLE_FOOTNOTES`
  on one parser, and `ENABLE_GFM_FOOTNOTES` on the other), and warn if
  they diverge.
  * Alternatively, run a Crater build with this same code to check if
    this actually causes widespread breakage.
* In particular, using tree rewriting to push the footnotes to the end
  is not as useful as it sounds, since that's not enough to exactly
  copy the way GitHub renders footnotes. To do that, you also need to
  sort the footnotes by the order in which they are *referenced*, not
  the order in which they are defined. This type of tree rewriting is
  also a waste of time if you want "margin note" rendering instead of
  putting them all at the end.

[cmark-gfm]: https://github.com/github/cmark-gfm/blob/1e230827a584ebc9938c3eadc5059c55ef3c9abf/test/extensions.txt#L702
[edition]: https://doc.rust-lang.org/edition-guide/editions/index.html

[^c]: cmark-gfm is under the MIT license, so incorporating parts of its
    test suite into pulldown-cmark should be fine.
notriddle added a commit to notriddle/pulldown-cmark that referenced this pull request Jun 2, 2023
Resolves pulldown-cmark#20

Resolves pulldown-cmark#530

Resolves pulldown-cmark#623

This change is similar to, but a more limited change than,
 <pulldown-cmark#544>. It changes
the syntax, but does not touch the generated HTML or event API.

Motivation
----------

This commit is written with usage in mdBook, rustdoc, and docs.rs
in mind.

* Having a standard to follow, or at least a public test suite in
  [cmark-gfm] [^c], makes it easier to distinguish bugs from features.
* It makes sense to commit to following GitHub's behavior specifically,
  because mdBook chapters and docs.rs README files are often viewed in
  GitHub preview windows, so any divergence will be very annoying.
* If mdBook and docs.rs are going to use this syntax, then rustdoc
  should, too.
* Having both footnote syntaxes use the same API and rendering makes it
  more feasible for rustdoc to change the syntax over an [edition].
  To introduce a syntax change in a new edition of Rust, we must make
  rustdoc warn anyone who writes code that will have its meaning change.
  To do it, run the parser twice in lockstep (with `ENABLE_FOOTNOTES`
  on one parser, and `ENABLE_GFM_FOOTNOTES` on the other), and warn if
  they diverge.
  * Alternatively, run a Crater build with this same code to check if
    this actually causes widespread breakage.
* In particular, using tree rewriting to push the footnotes to the end
  is not as useful as it sounds, since that's not enough to exactly
  copy the way GitHub renders footnotes. To do that, you also need to
  sort the footnotes by the order in which they are *referenced*, not
  the order in which they are defined. This type of tree rewriting is
  also a waste of time if you want "margin note" rendering instead of
  putting them all at the end.

[cmark-gfm]: https://github.com/github/cmark-gfm/blob/1e230827a584ebc9938c3eadc5059c55ef3c9abf/test/extensions.txt#L702
[edition]: https://doc.rust-lang.org/edition-guide/editions/index.html

[^c]: cmark-gfm is under the MIT license, so incorporating parts of its
    test suite into pulldown-cmark should be fine.
notriddle added a commit to notriddle/pulldown-cmark that referenced this pull request Jun 2, 2023
Resolves pulldown-cmark#20

Resolves pulldown-cmark#530

Resolves pulldown-cmark#623

This change is similar to, but a more limited change than,
 <pulldown-cmark#544>. It changes
the syntax, but does not touch the generated HTML or event API.

Motivation
----------

This commit is written with usage in mdBook, rustdoc, and docs.rs
in mind.

* Having a standard to follow, or at least a public test suite in
  [cmark-gfm] [^c], makes it easier to distinguish bugs from features.
* It makes sense to commit to following GitHub's behavior specifically,
  because mdBook chapters and docs.rs README files are often viewed in
  GitHub preview windows, so any divergence will be very annoying.
* If mdBook and docs.rs are going to use this syntax, then rustdoc
  should, too.
* Having both footnote syntaxes use the same API and rendering makes it
  more feasible for rustdoc to change the syntax over an [edition].
  To introduce a syntax change in a new edition of Rust, we must make
  rustdoc warn anyone who writes code that will have its meaning change.
  To do it, run the parser twice in lockstep (with `ENABLE_FOOTNOTES`
  on one parser, and `ENABLE_GFM_FOOTNOTES` on the other), and warn if
  they diverge.
  * Alternatively, run a Crater build with this same code to check if
    this actually causes widespread breakage.
* In particular, using tree rewriting to push the footnotes to the end
  is not as useful as it sounds, since that's not enough to exactly
  copy the way GitHub renders footnotes. To do that, you also need to
  sort the footnotes by the order in which they are *referenced*, not
  the order in which they are defined. This type of tree rewriting is
  also a waste of time if you want "margin note" rendering instead of
  putting them all at the end.

[cmark-gfm]: https://github.com/github/cmark-gfm/blob/1e230827a584ebc9938c3eadc5059c55ef3c9abf/test/extensions.txt#L702
[edition]: https://doc.rust-lang.org/edition-guide/editions/index.html

[^c]: cmark-gfm is under the MIT license, so incorporating parts of its
    test suite into pulldown-cmark should be fine.
@Martin1887
Copy link
Collaborator

It would be nice to have specified here the differences with #654, thanks.

@chriskrycho
Copy link
Author

I think the very short answer there @Martin1887 is: just use @notriddle's stuff. I think everything I was trying to do here can be built on top of those changes, and this can be treated as superseded by that. As I noted over on #654, the approach I was taking here ends up being more constraining than what we actually want or need, and while it was well-motivated in one sense, I ended up coupling together a couple of concerns here in terms of the structure of a footnote definition and how it gets emitted.

As noted above, I think the emit pass has to “collect” information as it goes—

Yeah, there are two specific things which make it particularly tricky:

  1. Backreferences, for the reasons you covered.
  2. The existing approach wants to just emit the stream of events as it builds them from the tree, which is lovely for consumers, but that means that somehow you have to build up a stream of extra events to handle at the end, consisting of footnote definitions which will be emitted as list items which also track reference order.

It may be possible to simply push definitions into another data structure, and to also keep track of the order in which references are emitted, and that would be preferable in terms of costs I think—but it wasn't totally obvious to me how I would pull that off. I’ll be pushing at this again on my investment time this Friday, so I’ll see how far I get with one or the other approaches.

In sum, though, “standard” Markdown footnotes are the kind of thing which more or less require you to make a full pass for them, as far as I can tell, and there's not quite enough information from the first pass to do it as things stand.

@marcusklaas suggested keeping track of it in a Vec or similar, and after further stumbling attempts both here and with markdown-rs, I actually think that exact strategy will work, as long as we handle broken footnote links exactly the same way we do broken links in general; otherwise you end up with broken in-document links for cases where you have a footnote reference but no corresponding definition.

@notriddle's approach makes that perfectly possible, though, and the difference ends up being all in the push_html()/write_html() side of things, which will (I think!) want to add support for this capability so it's easy for people to get the behavior without reimplementing it themselves.

Net: I am closing this in favor of that completed work, with a huge sigh of relief because I wasn't sure I was ever going to finish it and was sure that I was going to throw away a huge chunk of it in favor of doing something more along the lines of #654!

@chriskrycho chriskrycho closed this Jun 6, 2023
@chriskrycho chriskrycho deleted the footnotes branch June 6, 2023 15:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Footnote back-references Document footnote syntax Footnote definition does not end
5 participants