GFM autolink extension (www., https?:// parts): links don’t work when after bracket #278

Open
wooorm opened this issue Sep 2, 2022 · 0 comments

wooorm commented Sep 2, 2022

Problem

Consider:

x www.example.com

[ www.example.com

![ www.example.com

[^ www.example.com

[] www.example.com

---

x https://example.com

[ https://example.com

![ https://example.com

[^ https://example.com

[] https://example.com

---

x contact@example.com

[ contact@example.com

![ contact@example.com

[^ contact@example.com

[] contact@example.com

This is currently rendered as:

x www.example.com

[ www.example.com

![ www.example.com

[^ www.example.com

[] www.example.com


x https://example.com

[ https://example.com

![ https://example.com

[^ https://example.com

[] https://example.com


x contact@example.com

[ contact@example.com

![ contact@example.com

[^ contact@example.com

[] contact@example.com


The reason is that, for performance, GitHub uses two algorithms for its autolink extension: www. and https?:// are handled during parsing, while emails are handled during postprocessing.

One solution

One solution to this problem is to do everything during postprocessing (just like the new mailto and xmpp protocols).
However, postprocessing has its own problem: it does not take character escapes or character references into account:

contact\@example.com

contact@example.com

contact@example.com

Yields:

contact@example.com

contact@example.com

contact@example.com


These are examples of someone trying to prevent an email from being linked, using methods that work in the rest of markdown, but GFM ignores that.
A similar problem exists for math on GitHub: some users are unhappy about it, and it leads to weird and unintuitive ways to escape it.
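To make the escaping problem concrete, here is a minimal Python sketch (not the cmark-gfm code). It assumes a simplified inline pass that resolves backslash escapes and character references into plain text, followed by a postprocessing pass that scans that text for emails. The names `inline_text` and `postprocess_emails`, both regular expressions, and the `&#64;` sample are assumptions made for illustration.

```python
import html
import re

# Hypothetical, simplified stand-ins for illustration only.
# ASCII punctuation that a backslash may escape in markdown.
BACKSLASH_ESCAPE = re.compile(r"\\([!-/:-@\[-`{-~])")
# A deliberately loose email pattern, not the GFM grammar.
EMAIL = re.compile(r"[A-Za-z0-9._+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def inline_text(source):
    """Resolve backslash escapes, then character references, into plain text.

    Simplification: a real parser treats the two independently.
    """
    return html.unescape(BACKSLASH_ESCAPE.sub(r"\1", source))


def postprocess_emails(text):
    """Scan already-resolved text for emails, as a postprocessing pass would."""
    return EMAIL.findall(text)


for sample in (r"contact\@example.com", "contact&#64;example.com"):
    # By the time postprocessing runs, the escape is gone, so the email matches.
    print(sample, "->", postprocess_emails(inline_text(sample)))
```

Because the postprocessing pass only sees the resolved text, it cannot tell an escaped `@` from a literal one, which is why the author's attempt to prevent linking has no effect.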

A better solution

I think it should be possible to either:

  • use a different node type than CMARK_NODE_LINK in extensions/autolink.c
  • add a field to CMARK_NODE_LINK, to differentiate extension links from “normal” links

Then, update the extension to not exit when in a bracket.

Finally, when compiling, output just the URL text (not wrapped in a link) for an extension autolink that is already inside a link.
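The second option (a field on the link node) and the compile step could look roughly like the following Python sketch. It is a model of the idea, not cmark-gfm's C data structures: the `Node` class, the `from_extension` field, and the `render` function are all hypothetical names.

```python
import html
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                     # "text" or "link"
    literal: str = ""             # content of a "text" node
    url: str = ""                 # destination of a "link" node
    from_extension: bool = False  # hypothetical flag: created by the autolink extension
    children: list = field(default_factory=list)


def render(node, inside_link=False):
    """Compile a node to HTML, unwrapping extension autolinks nested in a link."""
    if node.kind == "text":
        return html.escape(node.literal)
    if node.kind == "link":
        inner = "".join(render(child, inside_link=True) for child in node.children)
        if node.from_extension and inside_link:
            # Already inside a link: output just the URL text.
            return inner
        return '<a href="{}">{}</a>'.format(html.escape(node.url), inner)
    raise ValueError("unknown node kind: " + node.kind)


auto = Node("link", url="http://www.example.com", from_extension=True,
            children=[Node("text", "www.example.com")])
outer = Node("link", url="www.example.org", children=[Node("text", "a "), auto])
print(render(auto))
print(render(outer))
```

With the flag in place, the extension no longer needs to bail out inside brackets: whether the autolink ends up as a real link is decided at compile time, once it is known whether the brackets formed a link.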

This solution does come with an additional problem, but it can be mitigated.
Because URL parsing is so loose, it matches ](xxx). For example:

a www.example.com](www.example.org) b

[ a www.example.com](www.example.org) b

Yields:

a www.example.com](www.example.org) b

a www.example.com b

This can be mitigated: when the URL scanner sees ], it should stop if the next character is ( (or [).
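A minimal Python sketch of that mitigation, assuming a loose scanner that otherwise runs until whitespace or `<`. The function name `autolink_end` is hypothetical, and the trailing-punctuation trimming that a real implementation performs is left out.

```python
def autolink_end(text, start):
    """Return the index just past a loosely matched autolink starting at `start`."""
    i = start
    while i < len(text) and not text[i].isspace() and text[i] != "<":
        # Mitigation: `](` or `][` most likely closes a surrounding link,
        # so the autolink must not swallow it.
        if text[i] == "]" and i + 1 < len(text) and text[i + 1] in "([":
            break
        i += 1
    return i


source = "a www.example.com](www.example.org) b"
start = source.index("www.")
print(source[start:autolink_end(source, start)])
```

For the input above the scan stops at the `]`, so the autolink covers only `www.example.com` and the surrounding link syntax stays intact.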
