Fix source positions for inlines. #298
base: master
…urce position test case.
The performance penalty encountered when running the benchmark 100 times is ~14 ms to the mean execution time (and ~10 ms to the median). Given that benchinput.md is 110.6 MB containing 1,484,151 lines (including blank lines), this is a minuscule penalty. That being said, I still advocate only using this functionality when the

As an additional thought, aside from complexity, the overhead of introducing some other means of tracking the starting columns for each input line (#296 (comment)) may have a worse penalty than just stripping whitespace. Thoughts?

System:
This PR
cmark master (a61c4902f07789d40a717daef710da29e7899485)
It's a small enough performance difference that I don't think we need to worry about it. There's still the issue of whether to make the parser, as well as the renderer, sensitive to CMARK_OPT_SOURCEPOS. Is there any reason, besides performance, to do this?
Not ostensibly. The only other reason supporting sensitivity is the isolation it affords from any potentially introduced side effects of the functionality. That being said, if there is confidence in the comprehensiveness of the unit tests, then this is no concern.
Applied patch from commonmark/cmark#298
@jgm @chriszielinski There is high confidence that the patch breaks nothing while improving a lot. Some minor sourcepos issues likely remain, but the current state is simply broken. I will explain below, together with some more rationale for why this patch is critical for some users and good for cmark's reputation.

Why the confidence:

I think the histogram speaks for itself, and I was actually impressed by the resulting accuracy, especially because most of the texts contain UTF-8 outside US-ASCII.

Why to merge this patch: It enables valuable use cases such as our GfmEditor. This tool is actually used for migrating from Redcarpet :). It is currently part of a Redmine plugin, but we can make it a separate tool for users, and the use cases can go beyond Redcarpet migration. You can check its description here.

Why not to wait (my 2 cents):

Hope it won't be turned down because our work was made on the

Thank you for your work and let me know if I can help with something. :)
This PR fixes the source positions for:
Fixing the source positions for inlines inside inconsistently indented blocks is accomplished by maintaining the leading trivia when constructing the block (see `add_line` in blocks.c), and removing it during the inline parsing stage (see changes in inlines.c). The three source position tests added demonstrate the current implementation's shortfalls.

These changes are minimally invasive, and thus slightly degrade performance, due to stripping the leading whitespace during the inline parsing phase. This can be avoided for normal parse/render operations by wrapping the functionality introduced in this PR with the conditional `if (CMARK_OPT_SOURCEPOS & options)`. However, this will lead to the output of invalid source positions when the `CMARK_OPT_SOURCEPOS` option is specified for a rendering operation (i.e. `cmark_render_xml`) but not the original parsing operation (i.e. `cmark_parse_document`).
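
For readers who want to see the trade-off in miniature, here is a rough, standalone sketch of the idea. It is not the code from this PR and does not use cmark's internals: `sketch_line`, `sketch_add_line`, `sketch_inline_start`, and `SKETCH_OPT_SOURCEPOS` are hypothetical names invented for illustration, and the sketch assumes one column per byte. The block stage keeps the leading whitespace only when the option bit is set, and the inline stage then strips it and advances the column by the number of bytes skipped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the CMARK_OPT_SOURCEPOS bit from cmark.h. */
#define SKETCH_OPT_SOURCEPOS (1 << 1)

/* Hypothetical view of one stored line of block content. */
typedef struct {
  const char *text; /* first byte handed to the inline parser */
  size_t len;       /* number of bytes in text */
  int column;       /* 1-based source column of text[0]; 0 if not tracked */
} sketch_line;

/* Count leading spaces and tabs (the "leading trivia").
 * Assumption: every byte of trivia advances the column by one. */
static size_t sketch_trivia_len(const char *s, size_t len) {
  size_t n = 0;
  assert(s != NULL);
  while (n < len && (s[n] == ' ' || s[n] == '\t')) {
    n++;
  }
  return n;
}

/* Block stage: keep the trivia only when source positions were requested. */
static sketch_line sketch_add_line(const char *s, size_t len, int column,
                                   int options) {
  sketch_line out = {s, len, 0};
  if (options & SKETCH_OPT_SOURCEPOS) {
    out.column = column; /* trivia kept; column still refers to s[0] */
  } else {
    size_t skip = sketch_trivia_len(s, len); /* strip eagerly; offset lost */
    out.text = s + skip;
    out.len = len - skip;
  }
  return out;
}

/* Inline stage: strip any retained trivia and advance the column past it. */
static sketch_line sketch_inline_start(sketch_line in) {
  size_t skip = sketch_trivia_len(in.text, in.len);
  sketch_line out = {in.text + skip, in.len - skip, 0};
  if (in.column > 0) {
    out.column = in.column + (int)skip;
  }
  return out;
}
```

In this sketch, a line starting at column 3 with three leading spaces yields an inline start column of 6 when the option is set at the block stage. Without the option the column stays 0 (untracked), which mirrors the mismatch described above: a later render that asks for source positions has nothing valid to report.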