Memory usage on large inputs #56
Comments
Yeah, memory use can be a problem for packrat parsers, since they memoize a lot in return for linear runtime.
Best fix would probably be to replace it with a CommonMark parser, since that is specified so it can be parsed by lines. As mentioned in #53 though, that would require rewriting things and probably would only be partially compatible with existing 3bmd code at best.

You can see what it is doing with `(esrap:trace-rule '3bmd-grammar::doc :recursive t)` before running a parse (try it with small inputs; one line of the list from your test produces about 1500 lines of output; a usage sketch follows this comment). If I remember how packrat works, each (rule, position) pair gets its result memoized, so the cache grows with the length of the input times the number of rules tried at each position.

And now that I look at the trace, I remember that list items (and probably a few other things) do recursive parses, so it is repeating some work, but a lot of that consing is actually getting thrown away at the end of the recursive parse.

There might be some other similar places where we can reduce the size of the cache by throwing away things that are only there for the grammar and not needed in the output; will have to look into that at some point. Might also be some places where we can reorder or add rules to avoid testing things that could be proven impossible. For example, it seems to be doing extra work to determine there isn't another block at the end of the input. Possibly adding something like

```lisp
(defrule %block (and (! eof) (* blank-line)
                     #.(cons 'or %block-rules%))
  (:destructure (eof blank block)
    (declare (ignore eof blank))
    block))
;; define-extension-block needs to be updated to match as well, I think?
```

would avoid that if you want to try it, but I need to think/test a bit more to be sure it doesn't have any side effects. Maybe similar for inlines?
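A minimal sketch of running that trace (assuming the exported `3bmd:parse-string-and-print-to-stream` entry point and an arbitrary two-item list as input; `esrap:untrace-rule` turns the tracing back off):

```lisp
;; Recursively trace the top-level doc rule, run a tiny parse while
;; discarding the rendered output, then disable the trace again.
;; Expect very verbose output even for this two-line input.
(esrap:trace-rule '3bmd-grammar::doc :recursive t)
(3bmd:parse-string-and-print-to-stream
 (format nil "- a~%- b~%") (make-broadcast-stream))
(esrap:untrace-rule '3bmd-grammar::doc :recursive t)
```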
I'm using the per-block implementation in `parse-doc`, but it's still fairly easy to run out of memory with large `%block`s with something like the sketch below. This example uses a bulleted list because it is probably the worst offender, but a large paragraph behaves similarly.
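A minimal sketch of that kind of stress test (assuming the list is generated with `loop repeat` and fed to `3bmd:parse-string-and-print-to-stream`; the repeat count and item text are arbitrary):

```lisp
;; Hypothetical stress test: build a very large bulleted list, then
;; parse it while discarding the rendered output.  The peak working
;; set grows with the size of the single %block being parsed.
(let ((input (with-output-to-string (s)
               (loop repeat 100000
                     do (format s "- some list item text~%")))))
  (time (3bmd:parse-string-and-print-to-stream
         input (make-broadcast-stream))))
```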
According to `time`, consing scales linearly with the number of `repeat`s, which is good. Perhaps 5267 bytes per character is too high, but I suspect that the main problem is that the maximum size of the working set also scales linearly.
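On SBCL, a per-character consing figure like that can be estimated along these lines (a sketch assuming the SBCL-specific `sb-ext:get-bytes-consed`; GC activity in between makes it approximate):

```lisp
;; Estimate bytes consed per character of input while parsing.
(defun consing-per-character (input)
  (let ((before (sb-ext:get-bytes-consed)))
    (3bmd:parse-string-and-print-to-stream input (make-broadcast-stream))
    (float (/ (- (sb-ext:get-bytes-consed) before)
              (length input)))))
```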