-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MDX performance #1152
Comments
Even if we compile <Heading icon={<Icon />}>
## Heading {something ? <Y /> : <Z />} |
Yeah, I think what we can do is transpile JSX to function calls inside expressions potentially. I should still be a lot faster than handling the whole document (of course this is something we will benchmark to be sure). |
Probably also faster if we only process jsx there, and nothing else, leaving that up to folks. But indeed, wondering on the benchmarks of 100 expressions vs 1 file |
Yeah so if we keep certain artificials limitations (which already apply today) in place then we can distill the imports/exports from the mdast without the need of Babel. That's been the source of some significant perf improvements at startup time (like gatsbyjs/gatsby#25757). The reasoning here is that the For imports the only limitation might be not to allow comments inside an import and only at the end of a line. These are the forms of import:
The Leaning on the fact that imports are constants (and valid input), no further need to dedupe them is required. So to make life easy, the only syntactical restriction, beyond non-standard syntax of course, is to disallow comments inside the import declaration. And maybe disallow the variant where For exports it's a little trickier, mainly because you can export arbitrary expressions and because of defaults in destructuring. However, it turns out that exports are currently limited to a single line. That's great because that makes them easy to slice out. Further more, if you apply the same comment restriction to exports and disallow destructuring defaults, you can "cheat" your way out of not requiring any JS parser and still distill all the exported symbols, as well as finding the default export. You can even support the newer
In all the above cases, except last, you can parse up to the first For JSX serialization you can use a faster parser/printer than Babel. I know Acorn can do it. There's also Sucrase, and a few others. My suggestion to John was to default to anything fast and to expose an option for the user to do it for you instead, since mdx doesn't reaaally care how the jsx gets compiled to JS. Or wouldn't need to, as far as I understand. So a user could give mdx a callback like If I'm not mistaken, this way MDX wouldn't need to run a JS parser at all. One other potential trick is to concat the expressions with a searchable separator (an identifier of sorts or the Oh and a third option is to allow the user to pass through a Babel config / options for the whole build step. That way if Babel is ran inside MDX anyways, it can just as well also do all the other transformations, like polyfill transforms etc, so that the main pipeline doesn't need to process it again. Potentially. But that might be a pretty big pandora's box of complexity to open up. |
FYI: gatsbyjs/gatsby#26265 adds a baseline mdx benchmark to |
Can't the parsed and transformed AST be passed directly to babel, which then can skip the parse step? That way mdx doesn't need to generate the js and the next step doesn't need to parse, just traverse and generate the final output? |
There are methods to parse the AST upfront, and transform the AST itself. But it looks like the PS. I'm happy to help anyway I can, at Expo we would love improved performance ❤️ |
This PR changes the internals of the core `@mdx-js/mdx` package to generate a JavaScript syntax tree instead of a string. This fixes escaping issues such as #1219. It makes `mdx-hast-to-jsx` much more palatable. It also prevents several Babel parses. It paves the way for passing in Babel plugins, which is useful for users, but also for us to compile to `React.createElement`, `_jsx`, or Vue’s `h` calls directly and make MDX’s output directly usable. * `babel-plugin-apply-mdx-type-props`: add `parentType` * `mdx`: use `rehype-minify-whitespace` to remove superfluous whitespace * `mdx`: use `hast-util-to-estree` to transform hast to estree * `mdx`: use `estree-to-babel` to transform estree to Babel * `mdx`: generate estree/Babel instead of strings * `mdx`: use `@babel/generator` to serialize Babel AST * `vue`: stop supporting the react transform: (it doesn’t make sense) * `vue`: fix support for props to components Related to GH-741. Related to GH-1152. Closes GH-606. Closes GH-1028. Closes GH-1219.
This PR changes the internals of the core `@mdx-js/mdx` package to generate a JavaScript syntax tree instead of a string. This fixes escaping issues such as #1219. It makes `mdx-hast-to-jsx` much more palatable. It also prevents several Babel parses. It paves the way for passing in Babel plugins, which is useful for users, but also for us to compile to `React.createElement`, `_jsx`, or Vue’s `h` calls directly and make MDX’s output directly usable. * `babel-plugin-apply-mdx-type-props`: add `parentType` * `mdx`: use `rehype-minify-whitespace` to remove superfluous whitespace * `mdx`: use `hast-util-to-estree` to transform hast to estree * `mdx`: use `estree-to-babel` to transform estree to Babel * `mdx`: generate estree/Babel instead of strings * `mdx`: use `@babel/generator` to serialize Babel AST * `vue`: stop supporting the react transform: (it doesn’t make sense) * `vue`: fix support for props to components Related to GH-741. Related to GH-1152. Closes GH-606. Closes GH-1028. Closes GH-1219.
This PR changes the internals of the core `@mdx-js/mdx` package to generate a JavaScript syntax tree instead of a string. This fixes escaping issues such as #1219. It makes `mdx-hast-to-jsx` much more palatable. It also prevents several Babel parses. It paves the way for passing in Babel plugins, which is useful for users, but also for us to compile to `React.createElement`, `_jsx`, or Vue’s `h` calls directly and make MDX’s output directly usable. * `babel-plugin-apply-mdx-type-props`: add `parentType` * `mdx`: use `rehype-minify-whitespace` to remove superfluous whitespace * `mdx`: use `hast-util-to-estree` to transform hast to estree * `mdx`: use `estree-to-babel` to transform estree to Babel * `mdx`: generate estree/Babel instead of strings * `mdx`: use `@babel/generator` to serialize Babel AST * `vue`: stop supporting the react transform: (it doesn’t make sense) * `vue`: fix support for props to components Related to GH-741. Related to GH-1152. Closes GH-606. Closes GH-1028. Closes GH-1219. Closes GH-1382. Reviewed-by: Christian Murphy <christian.murphy.42@gmail.com>
This PR changes the internals of the core `@mdx-js/mdx` package to generate a JavaScript syntax tree instead of a string. This fixes escaping issues such as #1219. It makes `mdx-hast-to-jsx` much more palatable. It also prevents several Babel parses. It paves the way for passing in Babel plugins, which is useful for users, but also for us to compile to `React.createElement`, `_jsx`, or Vue’s `h` calls directly and make MDX’s output directly usable. * `babel-plugin-apply-mdx-type-props`: add `parentType` * `mdx`: use `rehype-minify-whitespace` to remove superfluous whitespace * `mdx`: use `hast-util-to-estree` to transform hast to estree * `mdx`: use `estree-to-babel` to transform estree to Babel * `mdx`: generate estree/Babel instead of strings * `mdx`: use `@babel/generator` to serialize Babel AST * `vue`: stop supporting the react transform: (it doesn’t make sense) * `vue`: fix support for props to components Related to GH-741. Related to GH-1152. Closes GH-606. Closes GH-1028. Closes GH-1219. Closes GH-1382. Reviewed-by: Christian Murphy <christian.murphy.42@gmail.com>
This removes the last three custom Babel plugins we had and replaces them with estree versions. Furthermore, it removes `@babel/generator`. For the plugins, we were only looking at ESM import/exports, but right now we’re delegating work to `periscopic` to look at which things are defined in the top-level scope. It’s a bit more complex, but this matches better with intentions, fixes some bugs, and prepares for a potential future where other ES constructs are allowed, so all in all should be a nice improvement. For serializing, we’re switching to `astring`, and handling JSX for now internally (could be externalized later). `astring` seems fast and is incredibly small, but is not very popular. We might perhaps see bugs is serialization in the future because of that, but all our tests seem fine, so I’m not too worried about that. Estree remains a somewhat fragmented ecosystem, such as that the tree walkers in `periscopic` and `astring` are different, so we might also consider writing our own serializer in the future. Or, when we implement Babel’s React JSX transform ourselves, could switch to another generator, or at least drop the JSX serialization code here. Because of these changes, we can drop `@babel/core` and `@babel/generator` from `@mdx-js/mdx`, which drops the bundle size of from 349kb to 111kb. That’s 68%. Pretty nice. This should improve downloading and parsing time of bundles significantly. Of course, we currently still have JSX in the output, so folks will have to resort to Babel (or `buble-jsx-only`) in another step. For performance, v2 (micromark) was already an improvement over v1. On 1000 simple files totalling about 1mb of MDX: * v1: 3739ms * v2: 2734ms (26% faster) * v2 (w/o babel): 1392ms (63% faster). Of course, this all really depends on what type of stuff is in your MDX. But it looks pretty sweet! ✨ Related to GH-1046. Related to GH-1152. Related to GH-1338. Closes GH-704. Closes GH-1384.
This removes the last three custom Babel plugins we had and replaces them with estree versions. Furthermore, it removes `@babel/generator`. For the plugins, we were only looking at ESM import/exports, but right now we’re delegating work to `periscopic` to look at which things are defined in the top-level scope. It’s a bit more complex, but this matches better with intentions, fixes some bugs, and prepares for a potential future where other ES constructs are allowed, so all in all should be a nice improvement. For serializing, we’re switching to `astring`, and handling JSX for now internally (could be externalized later). `astring` seems fast and is incredibly small, but is not very popular. We might perhaps see bugs is serialization in the future because of that, but all our tests seem fine, so I’m not too worried about that. Estree remains a somewhat fragmented ecosystem, such as that the tree walkers in `periscopic` and `astring` are different, so we might also consider writing our own serializer in the future. Or, when we implement Babel’s React JSX transform ourselves, could switch to another generator, or at least drop the JSX serialization code here. Because of these changes, we can drop `@babel/core` and `@babel/generator` from `@mdx-js/mdx`, which drops the bundle size of from 349kb to 111kb. That’s 68%. Pretty nice. This should improve downloading and parsing time of bundles significantly. Of course, we currently still have JSX in the output, so folks will have to resort to Babel (or `buble-jsx-only`) in another step. For performance, v2 (micromark) was already an improvement over v1. On 1000 simple files totalling about 1mb of MDX: * v1: 3739ms * v2: 2734ms (26% faster) * v2 (w/o babel): 1392ms (63% faster). Of course, this all really depends on what type of stuff is in your MDX. But it looks pretty sweet! ✨ Related to GH-1046. Related to GH-1152. Related to GH-1338. Closes GH-704. Closes GH-1384.
Previously, we required an extra build step to produce runnable code. This change makes the output of MDX immediately runnable. This drops the final requirement on Babel (Or Bublé). Dropping Babel leads to a size and performance win for the runtime (and for any use case that doesn’t otherwise require Babel, such as running in Node). Of course, if people want to use the latest JavaScript features, they can still use Babel, but it’s not *required*. Finally, if JSX is preferred (for example, Vue treats JSX radically different from other hyperscript interfaces and has its own JSX builders), `keepJsx` can be set to `true`. In short, the size breakdown for the runtime is: * `@mdx-js/runtime@1.6.22` (last stable tag): 356.4kb * `@mdx-js/runtime@2.0.0-next.8` (last next tag): 362.9kb * Previous commit (on an unmaintained Bublé fork): 165kb * This commit: 120kb (26% / 66% / 69% smaller) Core only adds ±1kb to its bundle size, because `estree-util-build-jsx` reuses dependencies that we already use, too. Related to GH-1041. Related to GH-1044. Related to GH-1152.
Previously, we required an extra build step to produce runnable code. This change makes the output of MDX immediately runnable. This drops the final requirement on Babel (Or Bublé). Dropping Babel leads to a size and performance win for the runtime (and for any use case that doesn’t otherwise require Babel, such as running in Node). Of course, if people want to use the latest JavaScript features, they can still use Babel, but it’s not *required*. Finally, if JSX is preferred (for example, Vue treats JSX radically different from other hyperscript interfaces and has its own JSX builders), `keepJsx` can be set to `true`. In short, the size breakdown for the runtime is: * `@mdx-js/runtime@1.6.22` (last stable tag): 356.4kb * `@mdx-js/runtime@2.0.0-next.8` (last next tag): 362.9kb * Previous commit (on an unmaintained Bublé fork): 165kb * This commit: 120kb (26% / 66% / 69% smaller) Core only adds ±1kb to its bundle size, because `estree-util-build-jsx` reuses dependencies that we already use, too. Related to GH-1041. Related to GH-1044. Related to GH-1152.
This PR moves most of the runtime to the compile time. This issue has nothing to do with `@mdx-js/runtime`. It’s about `@mdx-js/mdx` being compile time, and moving most work there, from the “runtimes” `@mdx-js/react`, `@mdx-js/preact`, `@mdx-js/vue`. Most of the runtime is undocumented features that allow amazing things, but those are in my opinion *too magical*, more powerful than needed, complex to reason about, and again: undocumented. These features are added by overwriting an actual renderer (such as react, preact, or vue). Doing so makes it hard to combine MDX with for example Emotion or theme-ui, to opt into a new JSX transform when React introduces one, to support other hyperscripts, or to add features such as members (`<Foo.Bar />`). Removing these runtime features does what MDX says in the readme: “**🔥 Blazingly blazing fast: MDX has no runtime […]**” This does remove the ability to overwrite *anything* at runtime. This brings back the project to what is documented: users can still overwrite markdown things (e.g., blockquotes) to become components and pass components in at runtime without importing them. And it does still allow undocumented parent-child combos (`blockquote.p`). * Remove runtime renderers (`createElement`s hijacking) from `@mdx-js/react`, `@mdx-js/preact`, `@mdx-js/vue` * Add `jsxRuntime` option to switch to the modern automatic JSX runtime * Add `jsxImportSource` option to switch to a modern non-React JSX runtime * Add `pragma` option to define a classic JSX pragma * Add `pragmaFrag` option to define a classic JSX fragment * Add `mdxProviderImportSource` option to load an optional runtime provider * Add tests for automatic React JSX runtime * Add tests for `@mdx-js/mdx` combined with `emotion` * Add support and test members as “tag names” of elements * Add support and test qualified names (namespaces) as “tag names” of elements * Add tests for parent-child combos * Add tests to assert explicit (inline) components precede over provided/given components * Add tests for `mdxFragment: false` (runtime renderers w/o fragment support) * Fix and test double quotes in attribute values This PR removes the runtime renderers and related things such as the `mdxType` and `parentName` props while keeping the `MDXProvider` in tact. This improves runtime performance, because all that runs at runtime is plain vanilla React/preact/vue code. This reduces the surface of the MDX API while being identical to what is documented and hence to user expectations (except perhaps to some power users). This also makes it easier to support other renderers without having to maintain projects like `@mdx-js/react`, `@mdx-js/preact`, `@mdx-js/vue`: anything that can be used as a JSX pragma (including the [automatic runtime](https://reactjs.org/blog/2020/09/22/introducing-the-new-jsx-transform.html)) is now supported. A related benefit is that it’s easier to integrate with [emotion](https://github.com/emotion-js/emotion/blob/master/packages/react/src/jsx.js#L7) (including through `theme-ui`) and similar projects which also overwrite the renderer: as it’s not possible to have two runtimes, they were hard to combine; because with this PR MDX is no longer a renderer, there’s no conflict anymore. This is done by the compile time (`@mdx-js/mdx`) knowing about an (**optional**) runtime for an `MDXProvider` (such as `@mdx-js/react`, `@mdx-js/preact`). Importantly, it’s not required for other hyperscript interfaces to have a provider: `MDXContent` exported from a compiled MDX file *also* accepts components (it already did), and Vue comes with component passing out of the box. In short, the runtime looked like this: ```js function mdx(thing, props, ...children) { const overwrites = getOverwritesSomeWay() return React.createElement(overwrites[props.mdxType] || thing, props, ...children) } ``` And we had a compile time, which added that `mdxType` prop. So: ```mdx <Youtube /> ``` Became: ```js const Youtube = () => throw new Error('Youtube is not loaded!') <Youtube mdxType="Youtube" /> ``` Which in plain JS looks like: ```js const Youtube = () => throw new Error('Youtube is not loaded!') React.createElement(Youtube, {mdxType: 'Youtube'}) ``` Instead, this now compiles to: ```js const {Youtube} = Object.assign({Youtube: () => throw new Error('Youtube is not loaded!')}, getOverwritesSomeWay()) React.createElement(Youtube) ``` The previous example shows what is sometimes called a “shortcode”: a way to inject components as identifiers into the MDX file, which was introduced in [MDX 1](https://mdxjs.com/blog/shortcodes) A different use case for the runtime was overwriting “defaults”. This is documented on the website as the “[Table of components](https://mdxjs.com/table-of-components)”. This MDX: ```mdx Hello, *world*! ``` Became: ```js <p mdxType="p">Hello, <em mdxType="em">world</em>!</p> ``` This now compiles to: ```js const overwrites = Object.assign({p: 'p', em: 'em'}, getOverwritesSomeWay()) <overwrites.p>Hello, <overwrites.em>world</overwrites.em>!</overwrites.p> ``` This MDX: ```mdx export const Video = () => <Vimeo /> <Video /> ``` Used like so: ```jsx <MDXProvider components={{Video: () => <Youtube />}}> <Content /> </MDXProvider> ``` Would result in a `Youtube` component being rendered. It no longer does. I see the previous behavior as a bug and hence this as a fix. A subset of the above point is that: ```mdx export default props => <main {...props} /> x ``` Used like so: ```jsx <MDXProvider components={{wrapper: props => <article {...props} />}}> <Content /> </MDXProvider> ``` Would result in an `article` instead of the explicit `main`. It no longer does. I see the previous behavior as a bug and hence this as a fix. (#821) ```mdx <h2>World</h2> ``` Used like so: ```jsx <MDXProvider components={{h2: () => <SomethingElse />}}> <Content /> </MDXProvider> ``` Would result in a `SomethingElse` for both. This PR **does not** change that. But it could more easily be changed if we want to, because at compile time we know whether something was a tag or not. An undocumented feature of the current MDX runtime renderer is that it’s possible to overwrite anything: ```mdx <span /> ``` Used like so: ```jsx <MDXProvider components={{span: props => <b>{props.children}</b>}}> <Content /> </MDXProvider> ``` Would overwrite to become bold, even though it’s not documented anywhere. This PR changes that: only allowed markdown “tag names” can be changed (`p`, `li`, ...). **This list could be expanded.** Another undocumented feature is that parent–child combos can be overwritten. A `li` in an `ol` can be treated differently from one in an `ul` by passing `'ol.li': () => <SomethingElse />`. This PR no longer lets users “nest” arbitrary parent–child combos except for `ol.li`, `ul.li`, and `blockquote.p`. **This list could be expanded.** It was not possible to use members (`<foo.bar />`, `<Foo.bar.baz />`, <#953>) and supporting it previously would be complex. This PR adds support for them. Previously, `mdxType` and `parentName` attributes were added to all elements. And a `components` prop was accepted on **all** elements to change the provider. These are no longer passed and no longer accepted. Lastly, `components`, `props` were in scope for all JSX tags defined in the “markdown” section (not the import/exports) of each document. This adds identifiers to the scope prefixed with double underscores: `__provideComponents`, `__components`, and `__props`. A single 1mb MDX file, about 20k lines and 135k words (basically 3 books). Heavy on the “markdown”, few tags, no import/exports. 322kb gzipped. * v1: 2895.122856 * 2.0.0-next.8: 3187.4684129999996 * main: 4058.917152000001 * this pr: 4066.642403 * v1: raw: 1.5mb, gzip: 348kb * 2.0.0-next.8: raw: 1.4mb, gzip: 347kb * main: raw: 1.3mb, gzip: 342kb * this pr: raw: 1.8mb, gzip: 353kb * this pr, automatic runtime: raw: 1.7mb, gzip: 355kb * v1: 321.761208 * 2.0.0-next.8: 321.79749599999997 * main: 162.412757 * this pr: 107.28038599999996 * this pr, automatic runtime: 123.73588899999999 This PR is much faster on giant markdown-esque documents during runtime. The win over the current `main` branch is 34%, the win over the last beta and v1 is 66%. For output size, the raw value increases with this PR, which is because the output is now `/*#__PURE__*/React.createElement(__components.span…)` or `/*#__PURE__*/_jsx(__components.span…)`, instead of `mdx("span", {mdxType: "span"…})`. The change is more repetition, as can be seen by the roughly same gzip sizes. That the build time of `main` and this PR is slower than v1 and the last beta does surprise me a lot. I benchmarked earlier with 1000 small simple MDX files, totalling 1mb, [where the results were the inverse](#1399 (comment)). So it looks like we have a problem with giant files. Still, this PR has no effect on build time performance, because the results are the same as currently on `main`. This PR makes MDX faster, adds support for the modern automatic JSX runtime, and makes it easier to combine with Emotion and similar projects. --- Some of what this PR does has been discussed over the years: Related-to: GH-166. Related-to: GH-197. Related-to: GH-466 (very similar). Related-to: GH-714. Related-to: GH-938. Related-to: GH-1327. This PR solves some of the items outlined in these issues: Related-to: GH-1152. Related-to: #1014 (comment). This PR solves: Closes GH-591. Closes GH-638. Closes GH-785. Closes GH-953. Closes GH-1084. Closes GH-1385.
Lots has changed since this discussion. Babel is no longer needed. Much has been improved.
See https://v2.mdxjs.com/blog/v2/ for more info. |
☂️ This umbrella issue is for tracking work related to improving performance to MDX.
I've been working with @pvdz on MDX performance. We've noted a few aspects that add unnecessary work which we should be able to reduce, especially in v2.
Numerous babel parse and transformation steps
Firstly, we have multiple babel parse steps throughout the MDX transpilation pipeline.
Imports and exports
Peter has done some work here in
gatsby-plugin-mdx
that we can potentially adapt gatsbyjs/gatsby#25437 for usage in core.Shortcode generation
We use babel to figure out what imports and exports exist, and then use that to instantiate variables coming from
MDXProvider
withmakeShortcode
. Also related to gatsbyjs/gatsby#25437mdxType
This is used by the runtime (react/preact/vue) to determine which component to render. This is something we can do from the MDXAST in v2 since the JSX structure is represented.
Returning a compiled string that inevitably needs to be transpiled
Secondly, to these parse steps we also return a JSX string. In nearly all cases this JSX string is then transpiled to JS and
mdx
pragma function calls. This was originally an intentional output because we wanted to make MDX more palatable and familiar. However, it might make sense to serialize directly to function calls and JS.This would remove a babel step users need (unless they're using optional syntax or need browser polyfills which is still achievable in user land).
You all are welcome to bring up other areas of the codebase we can make more performant or other ideas as well! In fact, we'd love your thoughts.
The text was updated successfully, but these errors were encountered: