[WIP] Tokenizer Array using Stringified functions #1914
Conversation
Much faster
Some variables aren't used until they go into the params object, so there's no need to declare them separately beforehand.
Also, remove i, l, and token from params. Generating and accessing these through the params object is slower than just declaring them inside the tokenizer.
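Roughly the difference being described, with a hypothetical `nextToken()` helper and variable names standing in for the real Lexer internals:

```js
// Hypothetical sketch only; nextToken and the shapes here are illustrative,
// not marked's actual internals.
function nextToken(src, i) {
  return { type: 'char', raw: src[i] };
}

// Before: i, l, and token live on the shared params object, so every
// iteration pays for property reads and writes.
function tokenizeViaParams(params) {
  for (params.i = 0, params.l = params.src.length; params.i < params.l; params.i++) {
    params.token = nextToken(params.src, params.i);
    params.tokens.push(params.token);
  }
  return params.tokens;
}

// After: i, l, and token are plain locals, which the engine optimizes better.
function tokenizeWithLocals(params) {
  let token;
  for (let i = 0, l = params.src.length; i < l; i++) {
    token = nextToken(params.src, i);
    params.tokens.push(token);
  }
  return params.tokens;
}

console.log(tokenizeWithLocals({ src: 'abc', tokens: [] }));
```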
An unfortunate issue with this sort of handler is browser CSP.
FYI: Moo (a regex-compiling lexer) runs into an almost identical issue; see no-context/moo#141.
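For context, assembling stringified functions at runtime means calling `new Function` (or `eval`), which a Content-Security-Policy that omits `'unsafe-eval'` refuses to run. An illustration (assuming a page served with something like `Content-Security-Policy: script-src 'self'`):

```js
// Runs fine without a restrictive CSP, but under a policy lacking
// 'unsafe-eval' the browser throws an EvalError before the function
// is ever created.
try {
  const compiled = new Function('src', 'return src.length;');
  console.log(compiled('# heading'));
} catch (err) {
  // e.g. "Refused to evaluate a string as JavaScript because 'unsafe-eval'
  // is not an allowed source of script in the following Content Security
  // Policy directive: ..."
  console.error(err.message);
}
```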
Ya, I'm going to say I would rather sacrifice speed than security. I'm thinking there are better ways to speed things up, like #1872 (comment).
Ok. Good feedback, guys. I'll close this out since it doesn't look like we can do this securely.
Description
At risk of flooding the repo with garbage PRs:
Experimental modification of #1872. This one uses a static Lexer (via #1909), which enables some hacky shenanigans thanks to the reduced setup overhead.
The array of Tokenizer functions is serialized via `function.stringify()` and then combined into a single large function that runs just as fast as the original inlined code. Testing on my own machine, the speed issues seem to be totally solved. BUT, I understand this is a weird way to do things and potentially vulnerable to who knows what. The format for custom functions would have to be carefully designed to prevent conflicts in the stringifying/inlining step: e.g. if a custom function declares a variable that already exists in one of the other functions, combining them will lead to "variable already exists" errors.
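A minimal sketch of the serialize-and-combine step, assuming the serialization is `Function.prototype.toString()` and the final assembly uses `new Function` (the tokenizers and names below are illustrative, not the PR's actual code):

```js
// Toy tokenizers: each one reads src, pushes a token, and returns how many
// characters it consumed (0 if it didn't match).
const tokenizers = [
  function space(src, tokens) {
    const m = /^\n+/.exec(src);
    if (m) { tokens.push({ type: 'space', raw: m[0] }); return m[0].length; }
    return 0;
  },
  function paragraph(src, tokens) {
    const m = /^[^\n]+/.exec(src);
    if (m) { tokens.push({ type: 'paragraph', raw: m[0] }); return m[0].length; }
    return 0;
  }
];

// Stringify each tokenizer, then build one big lexer function around them.
// The `new Function` call is the step that browser CSP (no 'unsafe-eval') blocks.
const source = `
  ${tokenizers.map((fn, i) => `const t${i} = ${fn.toString()};`).join('\n  ')}
  const tokens = [];
  let n;
  while (src.length) {
    ${tokenizers.map((_, i) => `if ((n = t${i}(src, tokens))) { src = src.slice(n); continue; }`).join('\n    ')}
    break;
  }
  return tokens;
`;
const lex = new Function('src', source);

console.log(lex('# heading\n\nsome text'));
```

This variant keeps each serialized function intact inside the combined source, which sidesteps the variable-collision problem; splicing the bodies together directly, as described above, is what makes the naming conflicts possible.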
On the other hand, if we could figure out a robust way of doing this, we could further inline other functions that we know execute in a predefined sequence (i.e. the functions in Tokenizer.js) for further speed gains. The entire `blockTokens()` step could be "recompiled" into a single large monolithic function rather than the hundreds of function calls it currently makes every second. I think this has potential, but it needs a good sanity check from someone else.
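For illustration, a hedged sketch of what a "recompiled" `blockTokens()` might look like if the bodies really are spliced into one function. The convention here (each tokenizer writes to a shared `consumed` variable instead of returning, and each spliced body is wrapped in its own block to avoid "variable already exists" collisions) is an assumption for the sketch, not marked's actual API:

```js
// Illustrative only: these tokenizers follow a special format so their bodies
// can be spliced into a shared scope. They set `consumed` rather than
// returning, so they are NOT meant to be called directly.
const tokenizers = [
  function space(src, tokens) {
    const m = /^\n+/.exec(src);
    consumed = m ? (tokens.push({ type: 'space', raw: m[0] }), m[0].length) : 0;
  },
  function paragraph(src, tokens) {
    const m = /^[^\n]+/.exec(src);
    consumed = m ? (tokens.push({ type: 'paragraph', raw: m[0] }), m[0].length) : 0;
  }
];

// Extract just the body text of a stringified function.
const bodyOf = (fn) => {
  const s = fn.toString();
  return s.slice(s.indexOf('{') + 1, s.lastIndexOf('}'));
};

// Splice every body into one monolithic function. Each body gets its own
// block, so a `const m` in one tokenizer cannot clash with another's.
const compiledBlockTokens = new Function('src', `
  const tokens = [];
  let consumed;
  while (src.length) {
    ${tokenizers.map((fn) =>
      `{ ${bodyOf(fn)} }\n    if (consumed) { src = src.slice(consumed); continue; }`
    ).join('\n    ')}
    break;
  }
  return tokens;
`);

console.log(compiledBlockTokens('# heading\n\nsome text'));
```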