Fix autolink literals in link labels #6
Awesome! Thanks for working on this! I do believe the issue is a bit more complex (PS: props for figuring out how this works, because it isn’t really documented yet):
@wooorm I added a test for
and it does seem to work as expected. I kind of expected that https://github.com/micromark/micromark/blob/5a90585c1ba8414f05888329265440c1aa112311/lib/tokenize/label-end.mjs#L42 intended to splice away the event for the

I see what you mean though, a potential problem would be:
So let’s try how GitHub does it:

compared to:
This is blowing my mind! I knew GitHub was weird, but this is weird!!!
I would guess it is for the same reason as for this PR behaving like that. :-) I guess if you want to support the "is a nice char" thingy anyway, then I could have a stab at it. What do you say?
It gets weirder:

Link start.

[https://a.com [http://a.com [www.a.com

Link start and label end.

[https://a.com] [http://a.com] [www.a.com] [a@b.c]

Link label with reference.

Link label with resource.

Autolink literal after link.

But, as a gist (probably as a readme), it’s turned into: https://gist.github.com/wooorm/8199cc9e70e37e01b8470293c119fb28
Alright, so I’m thinking about this, and it’s going to get a bit complex:

First, it’s not just plain links, it’s also images, and perhaps even

Second, we don’t know if something is a

Perhaps it’s just impossible to know in this project, so maybe it could be added in micromark?
Have been digging in some more. I’m now under the assumption that cmark-gfm is used by comments (and probably issues, PRs, and releases), where the naïve behavior of this PR is used, through these lines: https://github.com/github/cmark-gfm/blob/85d895289c5ab67f988ca659493a64abb5fec7b4/src/inlines.c

Which are used here:

But the actual behavior on Gists (and probably readmes and such) uses a different compiler and is more complex, as stated before. It seems that autolink literals are handled after parsing, probably similar to mentions (see https://github.com/remarkjs/remark-github), because for example in

That does make it harder to handle the “final character reference” case described in the spec:

www.example.com/b&](#)
www.example.com/b](#)&
[www.example.com/b&](#)
[www.example.com/b](#)&
^-- which is behavior that I don’t get 🤷‍♂️
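
To make the “done after parsing” idea concrete, here is a minimal sketch of a post-parse pass over an mdast-like tree. It is not GitHub’s code and not the API of any micromark or remark package: the function name `linkifyLiterals`, the simplified `urlPattern`, and the `http://` prefix added to `www.` matches are all assumptions for illustration. It only shows why a pass that works on text nodes never sees brackets that already became part of a link.

```javascript
// Hypothetical, simplified pattern: only `http(s)://` and `www.` prefixes,
// with none of the trailing-punctuation or character-reference rules of GFM.
const urlPattern = /(?:https?:\/\/|www\.)[^\s<\]]+/g

// Walk an mdast-like tree and turn URL-looking runs in text nodes into links.
// Existing `link` nodes are returned untouched, so links never nest.
function linkifyLiterals(node) {
  if (node.type === 'link' || !Array.isArray(node.children)) return node

  const children = []

  for (const child of node.children) {
    if (child.type !== 'text') {
      children.push(linkifyLiterals(child))
      continue
    }

    let last = 0

    for (const match of child.value.matchAll(urlPattern)) {
      const raw = match[0]

      // Keep the plain text that precedes the match.
      if (match.index > last) {
        children.push({type: 'text', value: child.value.slice(last, match.index)})
      }

      children.push({
        type: 'link',
        // Assumption: `www.` literals get an `http://` scheme.
        url: raw.startsWith('www.') ? 'http://' + raw : raw,
        children: [{type: 'text', value: raw}]
      })
      last = match.index + raw.length
    }

    // Keep the plain text that follows the last match.
    if (last < child.value.length) {
      children.push({type: 'text', value: child.value.slice(last)})
    }
  }

  return {...node, children}
}
```

Because the pass only sees decoded text values, a trailing character reference has already been turned into its character by the time it runs, which may be part of why the “final character reference” case is hard to reproduce this way.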
Some more behavior that I can’t wrap my head around: https://gist.github.com/wooorm/fcacba9afeffcdcd0b1c6bf0e74e598f

So, the opening bracket does have some effect, even though it won’t end up forming an actual link.
I think I finally found something: GH uses two algorithms. One at parsing, and one that matches on the AST. Take this example: https://gist.github.com/wooorm/580c02085cd9c96d3c3c2a31f1cd1e5c

Note that when the bracket count is balanced (all the open ones are closed), the character reference does not work (is not decoded): the tokenizer just treats it as more raw characters. My assumption is that this tokenizer doesn’t do anything if there are open brackets, and that there’s some more final processing instead, because the character reference works (is decoded) in the more-open-brackets case.
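
The hypothesis above can be sketched as a small decision function. This is only an illustration of the guess, not observed GitHub code: `countOpenBrackets` and `passFor` are hypothetical names, and backslash escapes and code spans are ignored for brevity.

```javascript
// Count `[` characters that have not been closed by a later `]`.
// Assumption: escapes (`\[`) and code spans are not considered.
function countOpenBrackets(text) {
  let open = 0

  for (const char of text) {
    if (char === '[') open++
    else if (char === ']' && open > 0) open--
  }

  return open
}

// Hypothetical decision: with balanced brackets the tokenizer handles the
// literal (character references stay raw); with open brackets the literal is
// left for a later pass over the AST (character references already decoded).
function passFor(textBeforeLiteral) {
  return countOpenBrackets(textBeforeLiteral) === 0 ? 'tokenizer' : 'ast'
}
```

For instance, `passFor('[a] ')` yields `'tokenizer'`, while `passFor('[[a] ')` yields `'ast'`, matching the balanced and more-open-brackets cases described above.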
Related to micromark/micromark-extension-gfm-autolink-literal#5. Related to micromark/micromark-extension-gfm-autolink-literal#6. Closes GH-4. Closes remarkjs/remark-gfm#16.
For markdown such as:
The following HTML is generated:
With the fix, it instead behaves as github.com does.