Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Tokenizer Array using Stringified functions #1914

Closed

Conversation

calculuschild
Copy link
Contributor

Description

At risk of flooding the repo with garbage PRs:

Experimental modification of #1872. This one uses a static Lexer (via #1909) which enables some hacky shenanigans due to the reduced setup overhead when.

The array of Tokenizer functions is serialized via function.stringify() and then combined into a single large function that runs just as fast as the original inlined code. Speed issues seem to be totally solved, testing on my own machine.

BUT, I understand this is a weird way to do things and potentially vulnerable to who knows what. The format for custom functions would have to be carefully designed to prevent conflicts in the stringifying/inlining step, e.g. if they define a variable that already exists in one of the other functions, combining them will lead to "variable already exists" errors.

On the other hand, if we could figure out a robust way of doing this, we could further inline other functions that we know execute in a predefined sequence (i.e. the functions in Tokenizer.js) for further speed gains. The entire blockTokens() step could be "recompiled" into a single large monolithic function rather than the hundreds of function calls every second.

I think this has potential, but needs a good sanity check from someone else.

@vercel
Copy link

vercel bot commented Jan 27, 2021

This pull request is being automatically deployed with Vercel (learn more).
To see the status of your deployment, click below or on the icon next to each commit.

🔍 Inspect: https://vercel.com/markedjs/markedjs/ed4mgrjzx
✅ Preview: https://markedjs-git-fork-calculuschild-stringifiedfunctions.markedjs.vercel.app

@Monkatraz
Copy link

An unfortunate issue with this sort of handler is with browser CSP script-src. Using Marked in the browser with a restrictive CSP (which is good for security when you have a lot of UGC, which Marked would be very relevant in) means it will fail to actually create the compiled function, as eval and Function are disabled unless you pass unsafe-eval or load Marked in a trusted manner with strict-dynamic.

@Monkatraz
Copy link

FYI: no-context/moo#141 Moo (regex-compiling lexer) runs into an almost identical issue.

@UziTech
Copy link
Member

UziTech commented Jan 28, 2021

and potentially vulnerable to who knows what.

Ya I'm going to say I would rather sacrifice speed than security. I'm thinking there are better ways to speed things up like #1872 (comment)

@calculuschild
Copy link
Contributor Author

Ok. Good feedback guys. I'll close this out since it doesn't look like we can do this securely.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants