turndown not escaping links properly #459
Comments
Good catch. Btw, according to the spec it seems that parentheses do not have to be escaped as long as they are balanced. So in your example you should be fine without any preprocessing. I don't understand your last comment about "double-escaping URLs". Can you elaborate a bit more?
Yeah, sure. Say the original HTML is: <a href="https://google.com/file(1).jpg">link</a> But I protected it before calling turndown, because of this issue, so it became: <a href="https://google.com/file\(1\).jpg">link</a> Turndown should see that the parentheses were already escaped and leave them be. FYI: I tried 2 markdown editors, and they both automatically escape links that have (1).
I see. But that would be an invalid URL in the HTML input. I don't think this library should take something like that into account. I think we should add "aggressive" parenthesis escaping (i.e. don't bother checking whether the parentheses are balanced, simply escape them every time). Such behavior is in line with https://github.com/mixmark-io/turndown?tab=readme-ov-file#escaping-markdown-characters. The change might be quite simple: add a test case to [test/index.html](https://github.com/mixmark-io/turndown/blob/master/test/index.html) and add a replace to commonmark-rules.js.
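To make the "aggressive" escaping idea concrete, here is a minimal sketch. The helper name `escapeParens` is hypothetical and this is not the actual change made in turndown; it only illustrates escaping every parenthesis in a link destination without checking whether they are balanced.

```javascript
// Hypothetical helper, not part of turndown's API.
// Escapes every "(" and ")" in a URL with a backslash, balanced or not.
function escapeParens(url) {
  // "$&" inserts the matched character, so "(" becomes "\(".
  return url.replace(/[()]/g, '\\$&');
}

// Builds a Markdown inline link from already-converted link text and a raw href.
function toMarkdownLink(text, href) {
  return '[' + text + '](' + escapeParens(href) + ')';
}

console.log(toMarkdownLink('link', 'https://google.com/file(1).jpg'));
```

Escaping unconditionally keeps the rule trivial, at the cost of adding backslashes that CommonMark would not strictly require when the parentheses are balanced.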
Escape parenthesis in link URLs. Fix #459.
Yeah, totally agree!
I finally got around to verifying this issue. It was fixed for links but not for images. This HTML: <img src="https://google.com/file 1).jpg" /> produces this markdown: ![](https://google.com/file 1).jpg)
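Until images are handled, one possible workaround is a custom rule that applies the same escaping to `src`. This is only a sketch: the rule name `escapedImage` and the helper `escapeParens` are made up for illustration, the rule ignores the `title` attribute, and it assumes the rule object is registered with turndown's `addRule` method.

```javascript
// Hypothetical helper, not part of turndown: escape every parenthesis.
function escapeParens(url) {
  return url.replace(/[()]/g, '\\$&');
}

// A rule object in the { filter, replacement } shape that turndown rules use.
const escapedImageRule = {
  filter: 'img',
  replacement: function (content, node) {
    const alt = node.getAttribute('alt') || '';
    const src = node.getAttribute('src') || '';
    // Images without a src produce no output.
    return src ? '![' + alt + '](' + escapeParens(src) + ')' : '';
  }
};

// Assumed registration, given an existing TurndownService instance:
// turndownService.addRule('escapedImage', escapedImageRule);
```

The example URL used when trying this out would be something like `https://google.com/file(1).jpg`; whitespace inside a URL is a separate problem that this sketch does not address.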
I noticed this scenario and think it would be useful for others if turndown would handle this better.
Background:
( and ) are valid characters in a URL, and won't get escaped in normal HTML.
In markdown, link URLs are wrapped in ( and ), so if the URL itself contains ( or ) you will need to escape them with \
For this input html:
A valid markdown should be:
However turndown returns:
I'm now pre-fixing images and links in my html prior to calling the turndown service, but I really think this should be handled in the parser.
When you do, be sure not to double-escape URLs if they were already escaped (meaning, if the URL has "\(1\).jpg" there is no need to double-escape it).
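The "don't double-escape" request could be sketched as below. The helper name `escapeParensOnce` is hypothetical, and this is not how turndown behaves; it only shows one way to skip parentheses that already have a backslash in front of them.

```javascript
// Hypothetical helper, not part of turndown.
// Escapes "(" and ")" only when they are not already preceded by a backslash,
// so running it twice gives the same result as running it once.
function escapeParensOnce(url) {
  // (?<!\\) is a negative lookbehind: the match must not follow a backslash.
  return url.replace(/(?<!\\)[()]/g, '\\$&');
}

console.log(escapeParensOnce('https://google.com/file(1).jpg'));
console.log(escapeParensOnce('https://google.com/file\\(1\\).jpg'));
```

One limitation: a literal backslash that happens to sit before a parenthesis is treated as an escape, so that parenthesis is left alone.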