Hey everyone, I'm super excited that micromark is now being used for remark-parse! I noticed that the project doesn't advertise speed/performance as its focus anywhere, so I'm kinda curious whether anyone has had a chance to play around with the new version and compare it to the older remark-parse. I'd love to do this myself a bit later, once I have the time. Thanks in advance!
Replies: 1 comment 2 replies
-
Hi!
While performance is important, it is not the sole, or the most important, focus of this project.
The most important is syntax trees (no other parser does that as well as remark), specifically concrete ones: every character in the source code is accounted for. No other language that I know of in this area has this; JS, CSS, HTML, MD, it doesn’t exist. Where remark has done a lot with its syntax trees in the last 6 years, I think the CST will allow for a similar advancement in formatting and checking markdown.
Second is CM and GFM compliance. While many parsers work towards CM, they often mess up to varying degrees on edge cases. For CM specifically, which has about 600 tests, I compared 1000 more edge cases against how the reference CM parsers work, and micromark matches them. For GFM, other parsers often have some support for it, but fail a bunch of basic cases. I have spent a lot of time reverse engineering exactly how github.com works (which is in fact different from their docs and their reference parser), and we match it.
Third is bundle size. There are smaller parsers, but no parser is as compliant as micromark or has its concrete tokens feature, so it’s pretty radical that micromark packs all this into such a small size.
Fourth, I think, is performance. micromark is currently slower than what used to be in remark-parse: about 50% slower. That is significant, but a) micromark is also better than what used to be in remark-parse, b) it’s still fast to parse a big book, c) the problem with syntax trees is always the syntax trees, not the parsing (if remark is slow, that’s because a plugin is slow, not the parsing), and d) there is some low-hanging fruit (see open issues). I’ve recently been reaching out to a couple of folks about improving performance (privately, and see this tweet).
I think it can be as fast as remark-parse and maybe a bit better, but in the end I don’t see it ever being as fast as, say, markdown-it, because the priorities are different.
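To make the "every character in the source code is accounted for" property concrete, here is a minimal sketch of what it means for a token stream. This is not micromark's API or its real tokenizer: the `Token` shape, `tokenize`, `coversSource`, and `serialize` are all hypothetical names made up for illustration, and the character classes are a toy stand-in for real markdown constructs.

```typescript
// Hypothetical token shape for illustration; not micromark's real token type.
interface Token {
  type: "word" | "whitespace" | "marker";
  start: number; // inclusive offset into the source
  end: number; // exclusive offset into the source
}

// Toy classifier: whitespace, a few markdown punctuation characters, or "word".
function classify(ch: string): Token["type"] {
  if (/\s/.test(ch)) return "whitespace";
  if (/[*_#`>-]/.test(ch)) return "marker";
  return "word";
}

// Toy lossless tokenizer: consecutive characters of the same class form one
// token, so every character of the source lands in exactly one token.
function tokenize(source: string): Token[] {
  const tokens: Token[] = [];
  let index = 0;
  while (index < source.length) {
    const type = classify(source[index]);
    const start = index;
    while (index < source.length && classify(source[index]) === type) index++;
    tokens.push({ type, start, end: index });
  }
  return tokens;
}

// "Concrete" check: tokens are contiguous and span the whole source,
// with no gaps and no overlaps.
function coversSource(source: string, tokens: Token[]): boolean {
  let expectedStart = 0;
  for (const token of tokens) {
    if (token.start !== expectedStart || token.end <= token.start) return false;
    expectedStart = token.end;
  }
  return expectedStart === source.length;
}

// Rebuild the original text from token offsets alone.
function serialize(source: string, tokens: Token[]): string {
  return tokens.map((t) => source.slice(t.start, t.end)).join("");
}

const source = "# Hello,  *world*!\n";
const tokens = tokenize(source);
console.log(coversSource(source, tokens), serialize(source, tokens) === source);
```

A tree built from tokens like these can be printed back byte for byte, which is what makes formatters and linters on top of a CST possible without losing whitespace or marker choices.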