-
-
Notifications
You must be signed in to change notification settings - Fork 561
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unescaped left angle brackets in link destination #193
Comments
Thanks, bug confirmed.
|
The fix was wrong. Spec says there are two types of link destination:
The point of this bug report was that |
I reverted the completely mistaken fix. Now we have a failing test. What we need is a fix for |
Now I think I understand why I originally disallowed unescaped nested parens! This made scanning much easier. |
We could always have two scanners: first try with a type 1 (if |
That seems quite acceptable! Here's a pretty gross diff that does just that: diff --git a/src/inlines.c b/src/inlines.c
index e30c2af..0500177 100644
--- a/src/inlines.c
+++ b/src/inlines.c
@@ -859,10 +859,14 @@ static bufsize_t manual_scan_link_url(cmark_chunk *input, bufsize_t offset) {
i += 2;
else if (cmark_isspace(input->data[i]))
return -1;
- else
+ else if (input->data[i] == '<') {
+ i = offset;
+ goto type2;
+ } else
++i;
}
} else {
+type2:
while (i < input->len) {
if (input->data[i] == '\\' &&
i + 1 < input-> len && Is this the kind of thing you mean? |
+++ Yuki Izumi [Jul 14 17 04:53 ]:
Is this the kind of thing you mean?
Yes, but I wouldn't want to use goto in this way.
I think it's cleaner to have two scanners, one for
<..> form, the other for others. The main scanner
could apply the first scanner, and if it didn't match
anything, then apply the second scanner.
|
#219 is a PR for a better fix. |
Reopening because we don't correctly handle this case:
which should be a link with URL "%3caaa", but is not parsed as a link at all. |
Note the PR #219 also does not fix the following case:
The problem is that if the parsing of the whole link (or ref. def.) with link destination type 1 fails, parsing of whole link (or ref. def.) with the type 2 should be retried. Doing the retry only on the destination itself is not enough.
Although it is fixable, IMHO the question is whether it is worth it. Take this as my vote to ban the retrying completely in the specs. To be more precise, I vote for something as follows (bold text is new):
This would allow the parser to decide how to parse by checking the first character of the maybe-link-destination string. Rationale:
Note the ambiguity implied in the point 2 is unfixable without change in specification because, with its current wording, |
+1 to banning retrying. I can't think of many useful URIs that begin with |
This issue should be likely closed, as a result of commonmark/commonmark-spec@c1e0183. That actually bans the retrying as asked for above. |
Yes, we can close this now. |
The CommonMark spec has this to say about link destinations:
Given the following input:
this implementation outputs:
However, the Javascript reference implementation outputs:
Note the
%3C
characters surroundingte%3Cst
are missing in the output from this implementation. They should be present, as this should be treated as the non-<...>
type of link definition.This also seems to affect inline link destinations.
The text was updated successfully, but these errors were encountered: