[WIP] Tokenizer Array using Stringified functions #1914

calculuschild · 2021-01-27T04:07:06Z

Description

At risk of flooding the repo with garbage PRs:

Experimental modification of #1872. This one uses a static Lexer (via #1909), which enables some hacky shenanigans thanks to the reduced setup overhead.

The array of Tokenizer functions is serialized via Function.prototype.toString() and then combined into a single large function that runs just as fast as the original inlined code. Speed issues seem to be totally solved, at least in testing on my own machine.
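A minimal sketch of the stringify-and-combine idea, not the code from this PR. The tokenizers, `bodyOf`, and `compileTokenizers` are hypothetical names. It assumes each tokenizer is a plain `function (src) { ... }` whose body only returns on a match, so an inlined body falls through to the next one.

```javascript
// Hypothetical tokenizers: each returns a token on a match and falls
// through (returns nothing) otherwise.
const tokenizers = [
  function (src) {
    const match = /^\n+/.exec(src);
    if (match) return { type: 'space', raw: match[0] };
  },
  function (src) {
    const match = /^#+ (.*)/.exec(src);
    if (match) return { type: 'heading', raw: match[0], text: match[1] };
  }
];

// Crude body extraction: everything between the first "{" and the last "}".
// Assumes no braces in the parameter list (no destructuring or defaults).
function bodyOf(fn) {
  const source = fn.toString();
  return source.slice(source.indexOf('{') + 1, source.lastIndexOf('}'));
}

// Wrap each body in its own block so const/let names do not collide,
// then build one function from the concatenated source.
function compileTokenizers(fns) {
  const combined = fns.map((fn) => `{${bodyOf(fn)}}`).join('\n');
  return new Function('src', combined);
}

const lexStep = compileTokenizers(tokenizers);
console.log(lexStep('# Title'));
```

The combined function has no per-tokenizer call overhead, but it depends entirely on the source text of each function, which is why the format of custom tokenizers would need to be tightly constrained.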

BUT, I understand this is a weird way to do things and potentially vulnerable to who knows what. The format for custom functions would have to be carefully designed to prevent conflicts in the stringifying/inlining step, e.g. if they define a variable that already exists in one of the other functions, combining them will lead to "variable already exists" errors.
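The conflict can be reproduced in a few lines. This is a sketch with hypothetical functions (`first`, `second`, `bodyOf`, `tryCompile`), assuming the bodies are joined as raw source text: two bodies that each declare `const match` fail to compile when concatenated directly, while wrapping each body in its own block avoids that particular collision.

```javascript
// Two hypothetical tokenizers that both declare a variable named `match`.
const first = function (src) {
  const match = /^a/.exec(src);
  if (match) return 'a';
};
const second = function (src) {
  const match = /^b/.exec(src);
  if (match) return 'b';
};

// Crude body extraction; assumes no braces in the parameter list.
function bodyOf(fn) {
  const source = fn.toString();
  return source.slice(source.indexOf('{') + 1, source.lastIndexOf('}'));
}

// Try to build a function from source text and report any error.
function tryCompile(body) {
  try {
    return { fn: new Function('src', body), error: null };
  } catch (err) {
    return { fn: null, error: err };
  }
}

// Naive concatenation: `const match` is declared twice in one scope.
const naive = tryCompile(bodyOf(first) + '\n' + bodyOf(second));

// One block per body: each `match` lives in its own scope.
const scoped = tryCompile(`{${bodyOf(first)}}\n{${bodyOf(second)}}`);

console.log(naive.error instanceof SyntaxError);
console.log(scoped.fn('b'));
```

Block scoping only helps with `let` and `const`. Bodies that use `var`, or that rely on names from their original closure, would still interfere with each other or break once inlined.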

On the other hand, if we could figure out a robust way of doing this, we could further inline other functions that we know execute in a predefined sequence (i.e. the functions in Tokenizer.js) for further speed gains. The entire blockTokens() step could be "recompiled" into a single large monolithic function rather than the hundreds of function calls every second.

I think this has potential, but needs a good sanity check from someone else.

vercel · 2021-01-27T04:07:10Z

This pull request is being automatically deployed with Vercel (learn more).
To see the status of your deployment, click below or on the icon next to each commit.

🔍 Inspect: https://vercel.com/markedjs/markedjs/ed4mgrjzx
✅ Preview: https://markedjs-git-fork-calculuschild-stringifiedfunctions.markedjs.vercel.app

Monkatraz · 2021-01-27T04:36:23Z

An unfortunate issue with this sort of handler is the browser CSP script-src directive. Using Marked in the browser with a restrictive CSP (which is good for security when you have a lot of UGC, exactly where Marked is most relevant) means it will fail to create the compiled function, as eval and the Function constructor are disabled unless you pass unsafe-eval or load Marked in a trusted manner with strict-dynamic.
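One way a library could cope with this, sketched under assumptions: probe whether string compilation is permitted and fall back to a plain loop when it is not. `canCompileFromString`, `runInLoop`, `makeLexStep`, and the `compile` argument are hypothetical names, not part of Marked.

```javascript
// Returns false when the environment forbids building code from strings,
// e.g. a browser page whose CSP lacks 'unsafe-eval'.
function canCompileFromString() {
  try {
    new Function('return 1');
    return true;
  } catch (err) {
    return false;
  }
}

// Plain loop over the tokenizer array, used when compilation is blocked.
function runInLoop(tokenizers, src) {
  for (const tokenizer of tokenizers) {
    const token = tokenizer(src);
    if (token) return token;
  }
  return undefined;
}

// `compile` is a hypothetical function that turns the array into a single
// function; `allowed` can be overridden to exercise the fallback path.
function makeLexStep(tokenizers, compile, allowed = canCompileFromString()) {
  return allowed ? compile(tokenizers) : (src) => runInLoop(tokenizers, src);
}
```

A fallback like this keeps the library working under a strict policy, but it means maintaining two code paths, and a blocked compilation attempt may still be logged as a CSP violation.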

Monkatraz · 2021-01-27T04:51:42Z

FYI: no-context/moo#141 Moo (regex-compiling lexer) runs into an almost identical issue.

UziTech · 2021-01-28T17:11:07Z

> and potentially vulnerable to who knows what.

Ya I'm going to say I would rather sacrifice speed than security. I'm thinking there are better ways to speed things up like #1872 (comment)

calculuschild · 2021-01-28T19:13:16Z

Ok. Good feedback guys. I'll close this out since it doesn't look like we can do this securely.

calculuschild added 12 commits December 13, 2020 02:07

Move BlockTokenizers to array

4ce28a2

Array to Map, Fixed Links

63973e6

Converted inline Tokenizers over

2595fb8

Change Maps to Arrays of Objects

e79cbaf

Much faster

Lint

a6741f1

Remove redundant variable declarations

196686c

Some variables aren't used until the params object. No need to separately declare them before.

Array loop to Array.some

df2f852

Also remove i, l, token from params. Generating/accessing these from the params object is slower than just declaring them within the tokenizer.

Move src and tokens out of Params

583f18e

Arrow function to normal

91016c7

Clear up leftover comments

db494ce

Pass this as parameter instead of .call()

4707ad8

Inline sub functions via stringify

0c78056

vercel bot deployed to Preview January 27, 2021 04:07

calculuschild mentioned this pull request Jan 27, 2021

Rework Lexer to use extendable array of tokenizer functions #1872

Closed

5 tasks

calculuschild requested a review from UziTech January 27, 2021 04:16

calculuschild closed this Jan 28, 2021
