Fancy lists, task list and bunch of pandoc extensions #36

Merged
merged 21 commits into from
Aug 4, 2022

Omikhleia
Contributor

Greetings,
The SILE typesetter has been using (its own local version of) lunamark as one of the available options to process Markdown. Besides being based on an old version, it was in pretty broken shape... As part of an attempt to revive "native Markdown support" in SILE with decent extensions, I am proposing a few changes to lunamark, which I am therefore relaying upstream here.

Concretely, this PR contains the following:

  • The very first commit is a very small change so that a writer can override the text-only output and process the AST ropes on its own. Rationale: SILE currently uses it to directly hook its custom writer onto its own AST (and thus avoids outputting to text and re-parsing all of it...).
  • Subsequent commits consist of a bunch of extensions to the reader:
    • Under the task_list option
      • GitHub-Flavored Markdown's task lists as first-class citizens.
    • Under the fancy_lists option
      • Similar to the Pandoc option of the same name: list enumerators may be Roman numerals or letters (in addition to the standard digits), and a closing parenthesis may be used as the delimiter (in addition to the period), e.g. besides the normal 1., one can use i., a., a), etc.
      • Obviously, it plays well too when the startnum option is enabled
      • Writers may then honor it (or not), but at least these lists can be processed (even if rendered as usual lists).
    • Under the pandoc_extensions option:
      • Strikethrough: ~~deleted~~
      • Subscripts and superscripts, e.g. respectively H~2~O, 2^10^
      • Inline spans with attributes: [text]{.class .other key=value key2="value2" ... }
      • Block divs with attributes, via fenced colon blocks: ::: {.class key=value ... } ...
      • Attributes are also accepted on images (![](image.png){ key=value... }) and on fenced code blocks (when the corresponding option for them is enabled)
      • Raw inlines and raw blocks: {=xxx ...} after a simple inline code (i.e. backticks) or as "extended" infostring on a fenced code block (idem, when those are enabled) -- content skipped by default, but writers can use them to pass-through direct code to their underlying formatter.
  • Along the way, I also extended the HTML and HTML5 writers to support some of these features. (Something similar could possibly be done for other writers, but I don't know their targets well enough...).
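
As an illustration of the fancy_lists idea above, here is a minimal plain-Lua sketch that classifies a list marker by numbering style and delimiter. It is not the PR's LPeg grammar: `classify_enumerator` and the style names it returns are hypothetical, and single letters that are also Roman digits (such as `i` or `c`) are simply treated as Roman here, which is a simplification.

```lua
-- Hypothetical helper, not lunamark's API: classify a list marker such as
-- "1.", "iv)", or "a." into a numbering style and a delimiter.
local function classify_enumerator(marker)
  -- Split the marker into its alphanumeric body and its delimiter.
  local body, delim = marker:match("^(%w+)([.)])$")
  if not body then return nil end
  local style
  if body:match("^%d+$") then
    style = "Decimal"
  elseif body:match("^[ivxlcdm]+$") then
    style = "LowerRoman"   -- simplification: "i." and "c." count as Roman
  elseif body:match("^[IVXLCDM]+$") then
    style = "UpperRoman"
  elseif body:match("^%l$") then
    style = "LowerAlpha"
  elseif body:match("^%u$") then
    style = "UpperAlpha"
  else
    return nil             -- e.g. "ab." is not a valid enumerator
  end
  return style, (delim == ".") and "Period" or "OneParen"
end
```

A writer that does not care about the style can ignore both return values and render a regular ordered list, which matches the "honor it (or not)" behaviour described above.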

There are other features I'd possibly want to add at a later point, but it would be really great if this PR is considered and eventually makes it in (in that form or another). Moreover, that would also help SILE remove its local copy/fork of the library and use the official package again! 👍

Thanks in advance for your time and consideration.

Reference: sile-typesetter/sile#1481
(Just for the record. It is (mostly) the same thing, plus the in-progress changes for SILE itself...)

@Omikhleia
Contributor Author

N.B. Extension-related features currently left out for separate consideration... so you know what could be my wish-list after all the above stuff... In order of importance (from my viewpoint):

  • Attributes on headers, e.g.
    # My chapter {-}
    ## Ma section {.unnumbered custom-style=Fancy lang=fr}
    # Some chapter {#alternate-hash-for-links}
    
  • Line blocks (= Pandoc's line_blocks)
    | Typically for some verse
    |  With indentation support
    |   As shown here
    
  • Some tables.. see Content slicing, table support, and miscellaneous fixes #31
  • Example lists (Pandoc's example_lists), a.k.a. continuous list with (@)
  • Nesting fenced divs as in Pandoc. The current proposed implementation doesn't support correct nesting as with Pandoc's fenced_divs option. Actually I didn't know these could even be nested, I always used the most basic form such as
    ::: {lang=fr}
    > Some quote in French
    > Lorem
    :::
    
    So I went for that low-hanging fruit which already helps covering a lot, but a better implementation would eventually be welcome.
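
To make the nesting problem concrete: in Pandoc's fenced_divs, an opening fence carries attributes while a closing fence is a bare run of colons, so a parser has to count openers to find the matching close. The sketch below shows that idea in plain Lua. It is not the PR's implementation; `find_div_end` is a hypothetical name, and it assumes `lines[start]` is an opening fence.

```lua
-- Hypothetical sketch, not lunamark's implementation: return the index of the
-- line that closes the fenced div opened at lines[start], counting nested
-- openers so an inner ":::" is not mistaken for the outer closing fence.
local function find_div_end(lines, start)
  local depth = 0
  for i = start, #lines do
    -- A fence is three or more colons; `rest` is whatever follows them.
    local rest = lines[i]:match("^:::+%s*(.-)%s*$")
    if rest then
      if rest ~= "" then
        depth = depth + 1        -- fence with attributes: opens a div
      else
        depth = depth - 1        -- bare fence: closes the innermost div
        if depth == 0 then return i end
      end
    end
  end
  return nil                     -- no matching closing fence
end
```

With a single level of nesting the counter never exceeds 1, which is the basic case the current proposal already covers.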

@Witiko
Collaborator

Witiko commented Jul 18, 2022

  • Attributes are also accepted on images (![](image.png){ key=value... }) and on fenced code blocks (when the corresponding option for them is enabled)

@Omikhleia This is an impressive list of syntax extensions with tens of man-hours behind it, no doubt. Do I understand correctly that this PR implements the image part of Pandoc's link_attributes extension but not the link part?

@Omikhleia
Contributor Author

@Witiko

Do I understand correctly that this PR implements the image part of Pandoc's link_attributes extension but not the link part?

That's correct, the #id linking hash is not here yet -- the parser only supports classes and key-value pairs for now, in that order.

The hash part could easily be added, though. That will also be needed anyway for header attributes (as in my wish list above)... But again, I first went for the easy-to-reach "low-hanging fruits", in that case the ability to pass e.g. a width and height to the processor, to size the images accordingly. I will want eventually to have a look at linking, but my cross-reference package proposal for SILE is still a pending PR ;-)
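
For illustration, the "classes first, then key-value pairs" order described above could be sketched in plain Lua as follows. This is not the PR's LPeg grammar: `parse_attributes` is a hypothetical name, and the exact set of characters allowed in names and values is an assumption.

```lua
-- Hypothetical sketch, not the PR's LPeg grammar: parse an attribute string
-- such as '{.class key=value key2="value 2"}' into a list of classes and a
-- table of key-value pairs. Classes must come before key-value pairs.
local function parse_attributes(s)
  local inner = s:match("^{%s*(.-)%s*}$")
  if not inner then return nil end
  local classes, attrs = {}, {}
  local pos = 1
  -- Leading classes: ".name" items separated by whitespace.
  while true do
    local _, e, class = inner:find("^%.([%w_-]+)%s*", pos)
    if not e then break end
    classes[#classes + 1] = class
    pos = e + 1
  end
  -- Then key=value pairs; values may be wrapped in double quotes.
  while pos <= #inner do
    local _, e, key, value = inner:find('^([%w_-]+)="([^"]*)"%s*', pos)
    if not e then
      _, e, key, value = inner:find('^([%w_-]+)=([^%s"]+)%s*', pos)
    end
    if not e then return nil end  -- anything else, e.g. "#id", is rejected
    attrs[key] = value
    pos = e + 1
  end
  return classes, attrs
end
```

Supporting the `#id` form would mean accepting one more item shape in the first loop, which is in line with the remark that the hash part could easily be added.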

(Besides, the library has not seen patches and releases for some time now... I have no idea if @jgm still maintains it?)

@jgm
Owner

jgm commented Jul 18, 2022

I'll defer to @Witiko who is the most active contributor.

@Witiko
Collaborator

Witiko commented Jul 18, 2022

@jgm Not a problem. I will want to port these changes to witiko/markdown, so a code review is the logical first step.

@Witiko
Collaborator

Witiko commented Jul 18, 2022

That's correct, the #id linking hash is not here yet -- the parser only supports classes and key-value pairs for now, in that order.

My question was mainly about attributes on links (i.e. [a hypertext link](https://hypertext.link){.some-class}) as opposed to images. A quick glance through the code indicates that links are not supported. However, I will do a full code review on Thursday, so we can discuss it then.

@Witiko Witiko self-requested a review July 21, 2022 12:58
@Witiko Witiko force-pushed the improvements-from-sile branch from 886b14c to da808d1 Compare July 25, 2022 22:41
@Witiko Witiko force-pushed the improvements-from-sile branch from da808d1 to d8ec95b Compare July 25, 2022 22:46
Collaborator

@Witiko Witiko left a comment

Looks good to me, thanks again for all your hard work. In 886b14c, I added style checking to the CI and in d8ec95b, I fixed all warnings emitted by the style checker. This should help us with quality control in future pull requests.

@jgm I also added lunamark-0.6-1.rockspec and updated the changelog. After the merge, can you please update the "not yet released" entry in the changelog, tag the current commit as 0.6.0, and publish version 0.6.0 to Luarocks?

@Witiko Witiko linked an issue Jul 25, 2022 that may be closed by this pull request
Collaborator

@Witiko Witiko left a comment

Fixed a number of style issues as I am porting the bracketed_spans syntax extension in Witiko#207. Let's collate these into a separate large pull request sometime later.

Comment on lines +1123 to +1125
larsers.Span = ( parsers.between(parsers.Inline, parsers.lbracket,
parsers.rbracket) ) * ( parsers.attributes )
/ writer.span

Suggested change
- larsers.Span = ( parsers.between(parsers.Inline, parsers.lbracket,
-                parsers.rbracket) ) * ( parsers.attributes )
-              / writer.span
+ larsers.Span = parsers.between(parsers.Inline, parsers.lbracket, parsers.rbracket)
+              * parsers.attributes
+              / writer.span
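
The parentheses can be dropped because Lua gives `*` and `/` the same precedence and evaluates them left to right, while a function call binds tighter than both. The sketch below checks this without LPeg: `node` is a hypothetical stand-in for a pattern that only records how the operators grouped it, and the string `"writer.span"` stands in for the writer function.

```lua
-- Stand-in for LPeg patterns (an assumption for illustration): each value
-- only records, as a string, how the overloaded operators grouped it.
local Node = {}
local function node(repr) return setmetatable({ repr = repr }, Node) end
Node.__mul = function(a, b) return node("(" .. a.repr .. " * " .. b.repr .. ")") end
Node.__div = function(a, f) return node("(" .. a.repr .. " / " .. f .. ")") end

local between, attributes = node("between"), node("attributes")

-- Grouping as written in the PR, with explicit parentheses.
local original = ( between ) * ( attributes )
               / "writer.span"

-- Grouping after the suggested change, relying on operator precedence.
local suggested = between
                * attributes
                / "writer.span"

-- Both forms build the same expression tree.
assert(original.repr == suggested.repr)
```

In both forms the sequence is built first and the capture function is applied to the whole sequence, so the suggestion is purely stylistic.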

Comment on lines +1129 to +1130
larsers.Subscript = ( parsers.between(larsers.subsuperscripttext, parsers.tilde, parsers.tilde) )
/ writer.subscript

Suggested change
- larsers.Subscript = ( parsers.between(larsers.subsuperscripttext, parsers.tilde, parsers.tilde) )
-                   / writer.subscript
+ larsers.Subscript = parsers.between(larsers.subsuperscripttext, parsers.tilde, parsers.tilde)
+                   / writer.subscript

Comment on lines +1132 to +1133
larsers.Superscript = ( parsers.between(larsers.subsuperscripttext, parsers.circumflex, parsers.circumflex) )
/ writer.superscript

Suggested change
- larsers.Superscript = ( parsers.between(larsers.subsuperscripttext, parsers.circumflex, parsers.circumflex) )
-                     / writer.superscript
+ larsers.Superscript = parsers.between(larsers.subsuperscripttext, parsers.circumflex, parsers.circumflex)
+                     / writer.superscript
