On Reducing Inline Quotes False-Positives #36
Unanswered
tajmone asked this question in Syntax Roadmap
Just dumping here some considerations that crossed my mind...
I've noticed that even though I've put quite some energy into improving inline quote elements with paired delimiters (e.g. bold, italic, and similar), I still get the occasional document break-up due to literal occurrences of these symbols being falsely matched as the opening delimiter of a quote formatting element.
I remember that when the original syntax mismatched a (well-formed) literal `*` as an opening bold delimiter, the whole document would break up due to out-of-sync delimiters and, often, the parsing stack being trapped in the bold context.
Adding a rule that force-pops a quote element on reaching the EOL was a huge improvement for such elements, since those false positives would at least only disrupt a single line.
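To make the effect of the EOL force-pop concrete, here is a minimal toy sketch in Python. It is not the syntax definition itself: the function name `bold_state_per_line` and the "every `*` toggles bold" rule are assumptions made purely for illustration, and a real syntax has far stricter delimiter rules.

```python
def bold_state_per_line(lines, pop_at_eol):
    """Report, for each line, whether the toy scanner is already
    inside the 'bold' context when that line starts.

    Hypothetical model: every '*' simply toggles the bold context.
    """
    in_bold = False
    states = []
    for line in lines:
        states.append(in_bold)
        for ch in line:
            if ch == "*":
                in_bold = not in_bold
        if pop_at_eol:
            # Force-pop: never carry an open bold context past the EOL.
            in_bold = False
    return states


doc = [
    "The value *ptr is dereferenced here.",  # literal '*', not bold
    "Plain text.",
    "More plain text.",
]

leaky = bold_state_per_line(doc, pop_at_eol=False)
contained = bold_state_per_line(doc, pop_at_eol=True)
```

Without the force-pop, the stray `*` on the first line leaves every following line stuck in the bold context; with it, the damage stays confined to the line where the false positive occurred.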
There was also the assumption that idiomatic AsciiDoc demands splitting text one-line-per-sentence (and not, as we often see in Markdown, wrapping to 80 columns, semantic wrapping, etc.). I believe this has proven to be a good choice: lacking a full parser, how could we safely handle an opening inline delimiter that starts mid-sentence if its closing counterpart is found on another line? It would reopen the door to the original problem, and be a non-idiomatic case too.
So, thinking along the same lines, I was wondering whether we could further improve those inline elements ("quotes", as they were once called in the docs) by using a lookahead to ensure there's a closing counterpart before actually entering the matching context.
E.g., the bold element is currently defined as:
The above could be roughly changed to something like:
i.e. only start capturing the opening delimiter (and, in this case, the attributes list) if a valid opening delimiter and its closing counterpart are both present in the current line being parsed. We'd basically be renaming the original `strong` context to `strong_begin` and replacing `strong` with a lookahead trigger.
Of course, the above RegEx could be smoothed out, without all those pesky checks (attributes lists should be defined elsewhere, and just `include`d where needed), but the whole point here is to demonstrate that a non-consuming lookahead could be used as a trigger for the actual context, thus avoiding capturing anything unless a pair of delimiters is known to be present.
Surely, the presumed closing delimiter might actually be a literal character instead (escaped), or part of some other construct (depending on the quote at hand), but this should still be a safeguard able to reduce false positives and document breakage.
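The lookahead-trigger idea can be tried out in isolation with plain regular expressions. The following is a rough Python sketch under stated assumptions: `NAIVE_OPEN`, `GUARDED_OPEN` and `opens_bold` are hypothetical names, and the patterns are a deliberately simplified stand-in for constrained bold, not the RegExs used in the actual syntax (attributes lists and escapes are ignored).

```python
import re

# Hypothetical, simplified opener: a '*' not preceded by a word
# character and followed by something other than whitespace or '*'.
NAIVE_OPEN = re.compile(r"(?<!\w)\*(?=[^\s*])")

# Same opener, plus a second lookahead that only succeeds if a plausible
# closing '*' (preceded by a non-space, not followed by a word character)
# exists later on the same line. Nothing beyond the opener is consumed.
GUARDED_OPEN = re.compile(
    r"(?<!\w)\*(?=[^\s*])(?=[^*\n]*[^\s*]\*(?!\w))"
)


def opens_bold(line, guarded):
    """Return True if the chosen pattern would enter the bold context."""
    pattern = GUARDED_OPEN if guarded else NAIVE_OPEN
    return pattern.search(line) is not None


literal = "The value *ptr is dereferenced here."
real_bold = "This is *bold* text."

naive_on_literal = opens_bold(literal, guarded=False)
guarded_on_literal = opens_bold(literal, guarded=True)
guarded_on_bold = opens_bold(real_bold, guarded=True)
```

Here the naive opener is fooled by the lone literal `*`, while the guarded one refuses to trigger on that line and still matches the genuine bold span. As noted above, a line containing two unrelated literal `*` characters could still fool the guarded version, so this reduces false positives rather than eliminating them.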
It's worth trying this out and experimenting with it via thorough tests on known contexts that are currently problematic.