Differences in URL Encoding for links, text and Ids #380
I believe the HTML you are seeing is correct and works in regular browsers. If you look at the HTML source GitHub is using, you can see that it also encodes the href while leaving the characters in the id as-is.
The escaping in the href is done according to the CommonMark spec and I believe these characters should be escaped.
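To make the difference concrete, here is a minimal JavaScript sketch (JavaScript only because the fix later in this thread is written in it; this is not Markdig code). It uses the standard `encodeURI` and `decodeURIComponent` functions to show the umlaut being percent-encoded as UTF-8 bytes in the href while the id keeps the raw character. The variable names are illustrative.

```javascript
// The id as produced by a GitHub-style generator: non-ASCII characters are kept.
const headingId = "namenskonventionen-für-forms";

// Percent-encode the fragment for use in an href. encodeURI leaves "#" and "-"
// alone and encodes "ü" as its UTF-8 bytes.
const href = encodeURI("#" + headingId);

// Decoding the fragment recovers the original id.
const decodedId = decodeURIComponent(href).slice(1);

console.log(href);                    // #namenskonventionen-f%C3%BCr-forms
console.log(decodedId === headingId); // true
```

A browser resolving the fragment is expected to perform the equivalent of the decoding step before looking up the id.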
I would recommend using the AutoLink functionality of When using the default Since you are using the GitHub way of generating the id, non-ascii characters are preserved in the id and then escaped in the href.
Does the preview renderer you are using correctly establish the link if the HTML looks like the following - with the value of the heading id also URL-encoded, thus matching the href?
<li><a href="#namenskonventionen-f%C3%BCr-forms">Namenskonventionen für Forms</a></li>
<h3 id="namenskonventionen-f%C3%BCr-forms">Namenskonventionen für Forms</h3>
If that does work in your use-case, an extra setting controlling whether heading IDs are URL-encoded could be exposed, off by default (I tested it locally and the change needed is rather trivial).
|
I am already using the AutoLinks pipeline extension, and that's how the ID gets generated, but as mentioned, the IDs are not getting URL-encoded the same way as the links. Note that there is some URL encoding happening.
I checked out your example, and sure enough, even if the URL encoding matches, it doesn't work, so encoding isn't a solution either.
This doesn't work:
<li><a href="#namenskonventionen-f%C3%BCr-forms">Namenskonventionen für Forms</a></li>
<h3 id="namenskonventionen-f%C3%BCr-forms">Namenskonventionen für Forms</h3>
This works:
<li id="pragma-line-0"><a href="#namenskonventionen-fuer-forms">Namenskonventionen für Forms</a></li>
<h3 id="namenskonventionen-fuer-forms">Namenskonventionen für Forms</h3>
This also works:
<li id="pragma-line-0"><a href="#namenskonventionen-für-forms">Namenskonventionen für Forms</a></li>
<h3 id="namenskonventionen-für-forms">Namenskonventionen für Forms</h3>
So it looks like if the umlaut in the link is URL-encoded, the navigation just doesn't work.
Note that although I'm using a tool for previewing (Markdown Monster), which uses the IE WebBrowser control in WPF, the same behavior happens in Chrome, both with local file URLs and when running against local Web URLs.
|
Sigh... more info. It looks like the In the application previewer the base tag is required in order to properly find all the related resources relative to the document. However, with the base tag the navigation fails as soon as the hash is URL-encoded. No encoded characters - it works fine.
I already intercept navigation of the tag and manually try to locate elements, so I guess it's possible to do a bit more work to normalize the IDs and URLs by explicitly URL-decoding them, but that will then fail if somebody just dumps out the preview locally. Exports try to avoid the base tag, so that's all good, and on a typical Web page there likely won't be a base tag.
While I still think that it would be better to not URL-encode upper Unicode characters (just for the sheer overhead of it), I think that Markdig is actually doing the right thing, and I'm dealing with an HTML DOM quirk related to the After some more thought I think we can probably close this, but I'll leave it open a little longer in case somebody has any other ideas on a good way to deal with this. At the end of the day this may bite others as well - anytime there are
|
Since your preview differs from the actual export, there is a way (a bit of a hack).
1. Don't manually add a link destination when referring to a header.
- [Namenskonventionen für Forms](#namenskonventionen-für-forms)
+ [Namenskonventionen für Forms]
# Namenskonventionen für Forms
2. In the preview pipeline, use
.UseAutoIdentifiers()
// which is the same as
.UseAutoIdentifiers(AutoIdentifierOptions.AllowOnlyAscii | AutoIdentifierOptions.AutoLink)
and in the release/export pipeline, use
.UseAutoIdentifiers(AutoIdentifierOptions.GitHub | AutoIdentifierOptions.AutoLink)
The HTML will obviously differ between the pipelines in such a case, but characters like umlauts will be normalized during preview.
Preview HTML looks like
<p><a href="#namenskonventionen-fur-forms">Namenskonventionen für Forms</a></p>
<h1 id="namenskonventionen-fur-forms">Namenskonventionen für Forms</h1>
And the release HTML stays the same
<p><a href="#namenskonventionen-f%C3%BCr-forms">Namenskonventionen für Forms</a></p>
<h1 id="namenskonventionen-für-forms">Namenskonventionen für Forms</h1>
3. While this does mean markdown like this can't work in the preview, as there will be normalization happening, it doesn't work right now either, so I don't see this as a real regression.
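As a rough illustration of what "normalized" means for the preview ids, the following JavaScript sketch folds accented characters to their ASCII base letters using Unicode NFD decomposition. It is an approximation for illustration only, not Markdig's actual `AllowOnlyAscii` implementation, and `foldToAscii` is a hypothetical helper name.

```javascript
// Hypothetical helper: decompose accented letters (NFD), then drop the
// combining marks so that only the base letters remain.
function foldToAscii(text) {
  return text.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

console.log(foldToAscii("namenskonventionen-für-forms"));
// namenskonventionen-fur-forms
```

This simple folding does not cover letters without a decomposition (for example "ß"), so a real implementation needs additional rules.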
|
@MihaZupan Thank you - yes that would work. However, the easier solution was to modify the render script that drives the preview and already intercepts hash navigation, which is inconsistent anyway due to the file based nature ( The solution was actually quite simple: URL-decode the hash. Since auto-linking tends to strip spaces, quotes and other symbols, the only encoded content should be Unicode characters, so decoding should work fine.
if (hash) {
    // decode percent-encoded characters (e.g. %C3%BC -> ü) so the hash matches the element id
    hash = decodeURIComponent(hash);
    // match either a named anchor or the element whose id equals the hash
    var sel = "a[name='" + hash.substr(1) + "']," + hash;
    var $el = $(sel);
    // scroll to the target, leaving a 100px margin above it
    $("html,body").scrollTop($el.offset().top - 100);
    return false;
}
|
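The decoding step above can also be factored into a small helper that compares a link fragment with an element id, independent of jQuery and the DOM. This is only a sketch: `fragmentMatchesId` is a hypothetical name, and it assumes the fragment is either raw text or percent-encoded UTF-8.

```javascript
// Hypothetical helper: true if a link fragment (raw or percent-encoded)
// refers to the given element id.
function fragmentMatchesId(fragment, id) {
  const raw = fragment.startsWith("#") ? fragment.slice(1) : fragment;
  let decoded;
  try {
    decoded = decodeURIComponent(raw);
  } catch (e) {
    // Malformed percent sequences: fall back to comparing the raw fragment.
    decoded = raw;
  }
  return decoded === id;
}

console.log(fragmentMatchesId("#namenskonventionen-f%C3%BCr-forms", "namenskonventionen-für-forms")); // true
console.log(fragmentMatchesId("#namenskonventionen-für-forms", "namenskonventionen-für-forms"));      // true
console.log(fragmentMatchesId("#namenskonventionen-fuer-forms", "namenskonventionen-für-forms"));     // false
```

Comparing decoded strings rather than building a CSS selector also sidesteps ids that contain characters with special meaning in selectors.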
Glad to hear you've found a solution |
I'm running into issues trying to consolidate links that require encoding and jumping between them in a page. The problem is that the encoding for links ([]()), for generated text and, more importantly, for element IDs is not handled in the same way.
There is no good way to show the ID handling in Babelmark, but the problem shows itself in the actual text rendering. Notice the difference in the text/URL encoding for the hash above vs. the code:
https://babelmark.github.io/?text=*+%5BNamenskonventionen+f%C3%BCr+Forms%5D(%23namenskonventionen-f%C3%BCr-f%22%2C%26orms)%0A%0A%23%23%23+Namenskonventionen+f%C3%BCr+F%22%2C%26orms
If you render with Auto-Ids the IDs use the same encoding as the text on the bottom which doesn't match the link encoding used above.
As a use case: if I generate a link in a page and want it to link automatically to a header below, there's no single way that I can encode that link. The very specific scenario is a TOC generator where I pick out all the topic headers and then generate a TOC of links that point to those same headers. But because the encoding is different, the links don't work.
turns into:
The differences in encoding cause the link to not navigate.
There are a number of differences in how things are encoded, but in the above the umlaut probably shouldn't be encoded.
So the question is - should there be a consistent way to encode links that matches what the id generators are using?
Edge case for sure, but this has bitten me for a number of things related to creating reliable intra-document cross links. As it is I have to take over link navigation manually in my document solutions, but I'm not sure how to deal with the above.
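For the TOC scenario described above, one way to keep links and ids in sync is to derive both from a single slug function. The JavaScript below is a sketch under stated assumptions: `slugify` and `buildToc` are hypothetical helpers, and the slug rules only approximate GitHub-style ids; they are not Markdig's algorithm.

```javascript
// Hypothetical slug function: lowercase, drop punctuation, and turn runs of
// whitespace into hyphens. Unicode letters such as "ü" are kept.
function slugify(text) {
  return text
    .toLowerCase()
    .replace(/[^\p{L}\p{N}\s-]/gu, "")
    .trim()
    .replace(/\s+/g, "-");
}

// Build TOC entries whose href is derived from the same slug as the heading id.
function buildToc(headings) {
  return headings.map((text) => {
    const id = slugify(text);
    return { text, id, href: "#" + id };
  });
}

const toc = buildToc(['Namenskonventionen für F",&orms']);
console.log(toc[0].id);   // namenskonventionen-für-forms
console.log(toc[0].href); // #namenskonventionen-für-forms
```

A renderer may still percent-encode the href on output; the point of the sketch is only that both values start from the same string.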