Performance benchmarks #29

lishid · 2020-10-25T14:18:34Z

lishid
Oct 25, 2020

Hey everyone, I'm super excited that micromark is now being used for remark-parse!

I noticed that the project doesn't advertise speed/performance as its focus, so I'm kinda curious whether anyone has had a chance to play around with the new version and compare it to the older remark-parse. I'd love to do this myself a bit later once I have the time.

Thanks in advance!

Answered by wooorm

Oct 25, 2020


wooorm · 2020-10-25T14:45:04Z

wooorm
Oct 25, 2020
Maintainer

Hi!

While performance is important, it is not the sole, or even the most important, focus of this project.

The most important focus is syntax trees (no other parser does that as well as remark), specifically concrete ones: every character in the source code is accounted for. No other language that I know of in this area has this; for JS, CSS, HTML, and MD, it doesn't exist. Just as remark has done a lot with its syntax trees in the last 6 years, I think the CST will allow for a similar advancement in formatting and checking markdown.
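To make the "concrete" part more tangible, here is a toy TypeScript sketch. The `Token` shape and the `tokenize` function are hypothetical, handle only a tiny subset that is not CommonMark-correct, and are not micromark's API. The point is the property described above: because every character of the input falls into exactly one token, joining the token slices reproduces the source exactly, including the double space and the blank line that an abstract tree would typically drop.

```typescript
// A toy "concrete" token: a kind plus [start, end) offsets into the source.
// Hypothetical shape for illustration; not micromark's actual token type.
interface Token {
  kind: "atxMarker" | "whitespace" | "text" | "lineEnding";
  start: number;
  end: number;
}

// Tokenizes a tiny markdown-like subset: "#" runs at the start of a line,
// runs of spaces/tabs, line endings, and everything else as text.
// Every character lands in exactly one token, so nothing is discarded.
function tokenize(src: string): Token[] {
  const tokens: Token[] = [];
  let i = 0;
  let atLineStart = true;
  while (i < src.length) {
    const start = i;
    const ch = src[i];
    if (ch === "\n") {
      i++;
      tokens.push({ kind: "lineEnding", start, end: i });
      atLineStart = true;
      continue;
    }
    if (ch === " " || ch === "\t") {
      while (i < src.length && (src[i] === " " || src[i] === "\t")) i++;
      tokens.push({ kind: "whitespace", start, end: i });
      continue;
    }
    if (atLineStart && ch === "#") {
      while (i < src.length && src[i] === "#") i++;
      tokens.push({ kind: "atxMarker", start, end: i });
      atLineStart = false;
      continue;
    }
    // Text runs until the next space, tab, or line ending.
    while (i < src.length && !" \t\n".includes(src[i])) i++;
    tokens.push({ kind: "text", start, end: i });
    atLineStart = false;
  }
  return tokens;
}

// The "concrete" property: slicing the source by token offsets and joining
// the slices gives back the exact input, whitespace and markers included.
function roundTrip(src: string, tokens: Token[]): string {
  return tokens.map((t) => src.slice(t.start, t.end)).join("");
}

const doc = "# Hello  world\n\nSome *text*\n";
const tokens = tokenize(doc);
console.log(roundTrip(doc, tokens) === doc); // true
```

Because the positions are exact, a formatter or linter working on such tokens can report or rewrite precise source ranges, which is harder once markers and whitespace have been thrown away.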

Second is CM and GFM compliance. While many parsers work towards CM, they often get edge cases wrong to varying degrees. For CM specifically, which has roughly 600 tests, I compared 1,000 more edge cases against how the reference CM parsers work, and micromark matches them. For GFM, other parsers often have some support for it, but fail a bunch of basic cases. I have spent a lot of time reverse engineering exactly how github.com works (which is in fact different from their docs or their reference parser), and we match it.
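Comparing a parser against reference implementations on many edge cases is essentially differential testing. A minimal TypeScript sketch of the idea, assuming each parser can be wrapped as a hypothetical `Render` function from markdown to HTML; `diffRenderers` is an illustrative helper, and the two toy renderers in the usage lines are stand-ins, not real parsers.

```typescript
// Hypothetical signature: any function that turns markdown into HTML.
type Render = (markdown: string) => string;

interface Mismatch {
  input: string;
  expected: string;
  actual: string;
}

// Runs every case through a reference renderer and a candidate renderer and
// collects the inputs where their HTML output differs.
function diffRenderers(cases: string[], reference: Render, candidate: Render): Mismatch[] {
  const mismatches: Mismatch[] = [];
  for (const input of cases) {
    const expected = reference(input);
    const actual = candidate(input);
    if (expected !== actual) mismatches.push({ input, expected, actual });
  }
  return mismatches;
}

// Toy stand-ins for illustration only: the "reference" trims the input,
// the "candidate" forgets to.
const reference: Render = (md) => `<p>${md.trim()}</p>`;
const candidate: Render = (md) => `<p>${md}</p>`;

const found = diffRenderers(["a", " b", "c"], reference, candidate);
console.log(found.length); // 1
```

In practice you would likely normalize the HTML (for example whitespace between tags) before comparing, so that only meaningful differences get reported.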

Third is bundle size. While there are smaller parsers, none of them is as compliant as micromark or has its concrete tokens feature; it's pretty radical that micromark packs all this into such a small size.

Fourth, I think, is performance: micromark is currently slower than what used to be in remark-parse. About 50% slower. That is significant, but a) micromark is also better than what used to be in remark-parse, b) it's still fast enough to parse a big book, c) the bottleneck with syntax trees is always the work done on the trees, not the parsing: if remark is slow, that's because a plugin is slow, not the parser, and d) there is some low-hanging fruit (see open issues).

I’ve been reaching out to a couple of folks about improving performance recently (privately, and see this tweet). I think it can be as fast as remark-parse, maybe even a bit faster, but in the end I don’t see it ever being as fast as, say, markdown-it, because the priorities are different.
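For anyone who wants to reproduce a comparison like the one asked about in this thread, a rough timing harness for Node could look like the following TypeScript sketch. `medianMs` and `percentSlower` are hypothetical helpers, and the stand-in parse functions should be swapped for the real old and new parsers. It also assumes "X% slower" means taking X% more time, so "about 50% slower" would correspond to a time ratio of roughly 1.5.

```typescript
import { performance } from "node:perf_hooks";

// Hypothetical: any function that parses a markdown string (result ignored).
type Parse = (markdown: string) => unknown;

// Times `parse` over `runs` iterations, after a few warm-up calls so the JIT
// has a chance to optimize, and returns the median duration in milliseconds.
function medianMs(parse: Parse, input: string, runs = 20, warmup = 5): number {
  for (let i = 0; i < warmup; i++) parse(input);
  const samples: number[] = [];
  for (let i = 0; i < runs; i++) {
    const t0 = performance.now();
    parse(input);
    samples.push(performance.now() - t0);
  }
  samples.sort((a, b) => a - b);
  const mid = Math.floor(samples.length / 2);
  return samples.length % 2 ? samples[mid] : (samples[mid - 1] + samples[mid]) / 2;
}

// Assumes "X% slower" means the candidate takes (1 + X/100) times as long as
// the baseline, so 50% slower corresponds to a ratio of 1.5.
function percentSlower(baselineMs: number, candidateMs: number): number {
  return (candidateMs / baselineMs - 1) * 100;
}

// Stand-ins for illustration; swap in the real parsers you want to compare.
const splitLines: Parse = (md) => md.split("\n");
const splitChars: Parse = (md) => Array.from(md);

const sample = "# Title\n\nSome *emphasis* and `code`.\n".repeat(2000);
const base = medianMs(splitLines, sample);
const cand = medianMs(splitChars, sample);
console.log(`${percentSlower(base, cand).toFixed(1)}% slower`);
```

Taking the median of several runs after a warm-up is a common way to reduce noise from JIT compilation and garbage collection; for numbers you want to publish, a dedicated benchmarking library is a better fit.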

2 replies

lishid Oct 25, 2020
Author

Great answer, thank you so much! I totally understand this isn't the focus, and syntax trees are exactly why I'm using remark over other implementations.

> (which is in fact different from their docs or their reference parser)

I hate it when that happens 😆

> micromark is currently slower than what used to be in remark-parse. About 50% slower.

This is the exact information I was looking for :)

> I’ve been reaching out to a couple of folks about improving performance recently

I have some experience with that and I'd love to help out later once I get some free time on my hands.
I'd also ping @fabiospampinato since he's quite experienced in the field, and it looks like you guys already know each other 😝

wooorm Oct 25, 2020
Maintainer

Yes, we talked about this a couple of weeks ago!
