[lexical-markdown][breaking change] Feature: multiline markdown transformers / mdx support #6530

AlessioGr · 2024-08-19T23:45:12Z

Description

Motivation for this change

We would like to add mdx support to Payload, allowing mdx files to be easily import to and exported from lexical editors. Multiline markdown transformer support will allow JSX within mdx to be handled properly.

Multiline markdown transformers

Currently, lexical only has one multiline markdown transformer for code nodes that is hardcoded. This comes with 2 issues:

users cannot extend lexical with their own multiline markdown transformers
The code node markdown import transformer is executed (if there is code markdown (```) in the markdown string) even if code nodes are not enabled in the project, leading to errors

This PR adds a new MultilineElementTransformer type and changes the hardcoded code markdown transformer to be a MultilineElementTransformer.

I intentionally went for a separate type, contrary to what @fantactuka suggested here, but I'm happy to explore merging the types if it makes sense. Here are my reasons for making it a separate type:

Within createMarkdownImport it's easier to determine if it's a multiline transformer - the algorithm for handling multiline transformers is very different
Less bloated types. The replace function of the MultilineElementTransformer type drastically differs from the replace function of the ElementTransformer type:
- A multiline transformer does not need isImport as an argument in the replace function. It will only be run during imports, and not in the MarkdownShortcut plugin
- There are 2 kinds of regex matches now (openMatch and closeMatch) and not just match - both should be available as arguments in replace
- I added a new linesInBetween args to the replace arguments, instead of the children argument. Additionally, only the rootNode is passed as an argument. This is easier to understand / simpler and gives the developer full control over which nodes to add to the root node and how, instead of lexical always creating a text node and a paragraph node. The new multiline code node is a good example for this - it does not need a ParagraphNode to be created
- linesInBetween is the text matched between the open and close match, separated by lines. Basically, anything in between the matches. This is easy to understand, the developer now has easy access to anything in between the regex match, as well as the regex matches themselves

Regarding mdx (JSX) support: Now, with multiline markdown transformer support, we can easily utilize a general regex for matching any JSX tags, and then further parse that matched JSX node into the respective lexical node. An example for this can be found in the new tests here:

lexical/packages/lexical-markdown/src/__tests__/unit/LexicalMarkdown.test.ts

Line 271 in 7d4a904

customTransformers: [MDX_HTML_TRANSFORMER],

This also brings us to the second, smaller feature:

Support terminating Element and MultilineElement transformers early, if the regex matched

If, during an import, the regex for one of your transformers matched, this will be the only transformer that will run. You cannot just decide that "hey, the regex matched, but I don't actually want to handle transforming this" inside the replace function.

This is possible in the export function of a transformer by returning null - that way, the next transformer will be called. This PR makes this possible for imports as well. Now, if you return false in the replace function, it will keep attempting to find a match, going to the next markdown transformer.

Use-case: We'd like to employ one, generic multiline markdown transformer for any JSX elements (mdx support). Then, I want to decide within the replace function if I'd like to handle this match or not, by reading the node type (the text between < >) - this is easier than writing unique regex for every single kind of html node I'd like to handle. That way, we'd just have one clean switch statement, and a great DX for allowing our users to register their own markdown handling for certain JSX elements

If there is no handler registered for this node type, I can now simply return false, thus signaling lexical that I don't want to handle this node and that lexical can keep looking for a more suitable transformer.

JS Docs

This adds some JSDocs to the transformer types.

Closes #2564

Test plan

See diffs of the LexicalMarkdown.test.ts file: https://github.com/facebook/lexical/pull/6530/files#diff-9a0c91bc82f2693a031ebe1165316e9726980b7f1d54de8681de1256e1d18303L233

Additionally, the tests for code nodes are still passing - this means the move from hardcoded code multiline transformer to actual code multiline transformer was successful

While working on this, I have also encountered one test case that failed - this fails in the current lexical version as well. As this is not within the scope of this PR, I did not fix it here - however I did include a TODO item, as well as a commented out test case

…se in replace function, allow multi-line transformer to match single lines

…arsing mdx files)

vercel · 2024-08-19T23:45:16Z

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name	Status	Preview	Comments	Updated (UTC)
lexical	✅ Ready (Inspect)	Visit Preview	💬 Add feedback	Aug 26, 2024 3:21am
lexical-playground	✅ Ready (Inspect)	Visit Preview	💬 Add feedback	Aug 26, 2024 3:21am

github-actions · 2024-08-19T23:46:52Z

size-limit report 📦

Path	Size
lexical - cjs	29.38 KB (0%)
lexical - esm	29.24 KB (0%)
@lexical/rich-text - cjs	37.89 KB (0%)
@lexical/rich-text - esm	31.07 KB (0%)
@lexical/plain-text - cjs	36.49 KB (0%)
@lexical/plain-text - esm	28.41 KB (0%)
@lexical/react - cjs	39.61 KB (0%)
@lexical/react - esm	32.52 KB (0%)

…same line as the openMatch

etrepum · 2024-08-21T02:30:22Z

I haven't had a chance to look at any of this PR yet, but have you considered starting from a better parsing foundation and building a new lexical markdown API without any legacy constraints (e.g. leveraging remark)? The current one is a bit of a dead end with a handful of design flaws (although a lot less so with all of your work, under the assumption that the code looks good)

AlessioGr · 2024-08-21T04:20:08Z

Hey @etrepum, I did consider something like remark or micromark, but those solutions would increase the bundle size by around 100kB. I'm trying to keep it as small possible. Additionally, something like micromark uses packages like Acorn for handling jsx. We've seen that package unleash hell when it's bundled into a Next.js app in the past, so I'd rather avoid that. This regex-based solution is very lean!

I also wouldn't know how well they would play together using the MarkdownShortcut plugin, which works quite well with regex. I've seen this project use micromark for imports / exports, but it's still using the current solution regex-based solution for the MarkdownShortcut plugin.

I must say though, the regex-based multiline markdown handling from this PR works really well for us so far. I was half expecting us to run into some weird edge-cases that we wouldn't have if we used remark / parsed around some ast, but that's not the case. We're currently using a packaged version of @lexical/markdown with this PR included, and got a bi-directional mdx import / export working - with complex JSX and all that. It's pretty nice - no more editing our docs in plain mdx :p

If the JSX gets too complex to handle using regex (e.g. if you wanted to define sub-lexical nodes using JSX nodes dynamically in their children), there's always the option to just parse the children of the JSX element using micromark & work with the AST tree of that complex JSX element instead. The way we use multiline markdown transformers for JSX, JSX children are exposed as plain strings, so it would be easy to slap it on top of that. But that's optional

^ and this could be done for any complex node / markdown element. This solution allows you to write a very broad regex to match the entire thing, and then opt in something like remark to process it further

…rtcut transforms

…ctly wrapped in paragraphs

AlessioGr · 2024-08-26T03:38:59Z

All tests are now passing! 🎉

The discrepancies I had between local and CI e2e runs was due to CI running the test in plain-text mode. I was running them in rich-text mode locally - didn't notice the environment variable that was being set in CI.

The tests were basically failing because the new multiline code transformer, which is now used instead of the normal code transformer, does not handle "incomplete" markdown (e.g. "```markdown hello" - no closing ```. In order to support incomplete markdown, I have added an optional property to regExpEnd, which can now be an object. If set to true, transformer will run even if no end regex has been matched. I have added new unit tests to test this.

This change now alleviates the need to have both a multiline and a normal ElementTransformer - the MultilineTransformer can handle it all. Thus, I have merged the CODE and CODE_MULTILINE transformers into just one CODE MultilineTransformer. MultilineTransformers can now also run during markdown shortcut transforms. I have made this change in 03ffd3d

Please review the updated API. In the future, we could consider getting rid of the normal ElementTransformer, as the MultilineElementTransformer can handle it all, or merge them together in some other form - though I do not want to make any breaking changes in this PR.

Additionally, I'm considering making the arguments of the MultilineElementTransformer a single object argument, with the current arguments being object props. So,

replace: ({ rootNode, children, startMatch, endMatch, linesInBetween })

Instead of

replace: (rootNode, children, startMatch, endMatch, linesInBetween)

In my opinion, this would be a cleaner API for the user. Better auto-suggestions, and we don't have to force them to type every single argument to get to something like linesInBetween. If we want to add arguments in the future, this will be especially bad. Additionally, we can improve the types by using union types to type children as null if linesInBetween is set, and linesInBetween as null if children is set - or even better, type it based on the isImport property.
However, the disadvantage is that this would be inconsistent with the replace function of the ElementTransformer. Let me know if we should make this change

potatowagon · 2024-08-29T09:50:42Z

do u foresee any breaking change? code looks ok, i havnt spoted any major api changes, more like adding a multiline transformer as an option.

labelling it as breaking change for an internal memo to test further for any markdown regressions

AlessioGr · 2024-08-30T17:58:54Z

do u foresee any breaking change?

Not in this PR; though I foresee us merging MultilineElementTransformers and ElementTransformers together in the future, and that would be a breaking change

potatowagon · 2024-09-10T12:09:14Z

packages/lexical-markdown/src/MarkdownTransformers.ts

+      })(rootNode, children, startMatch, isImport);
+    }
+  },
+  type: 'multilineElement',


should this be multiline-element to follow the conventions of eg. text-format? any side effects if i raise a PR to rename this? @AlessioGr

good catch, yep let's rename it for consistency

AlessioGr added 6 commits August 19, 2024 15:51

feat: multiline markdown transformers

ec0595c

chore: fix @lexical/markdown package referencing itself

b277b15

feat: add ability to allow future transformer to run by returning fal…

8e66055

…se in replace function, allow multi-line transformer to match single lines

chore: add test for multiline markdown html transformer (useful for p…

24cc82a

…arsing mdx files)

chore: fix tests, add new tests

7d4a904

chore: export CODE_MULTILINE and MultilineElementTransformer

3cace56

AlessioGr requested review from zurfyx, fantactuka, acywatson, Fetz, ivailop7, Sahejkm and potatowagon as code owners August 19, 2024 23:45

facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Aug 19, 2024

vercel bot deployed to Preview – lexical August 19, 2024 23:46 View deployment

vercel bot deployed to Preview – lexical-playground August 19, 2024 23:46 View deployment

fix babel warnings

ebcc28c

vercel bot deployed to Preview – lexical August 20, 2024 15:07 View deployment

vercel bot deployed to Preview – lexical-playground August 20, 2024 15:07 View deployment

fix: incorrect handling of multiline imports if closeMatch is on the …

0fef215

…same line as the openMatch

vercel bot deployed to Preview – lexical August 21, 2024 02:26 View deployment

vercel bot deployed to Preview – lexical-playground August 21, 2024 02:26 View deployment

Merge branch 'main' into feat/multiline-markdown

bb7c9c4

vercel bot deployed to Preview – lexical August 21, 2024 13:45 View deployment

vercel bot deployed to Preview – lexical-playground August 21, 2024 13:46 View deployment

etrepum added the extended-tests Run extended e2e tests on a PR label Aug 21, 2024

small optimization

28c9def

vercel bot deployed to Preview – lexical August 26, 2024 00:52 View deployment

vercel bot deployed to Preview – lexical-playground August 26, 2024 00:53 View deployment

feat: allow multiline markdown transformers to be run as markdown sho…

03ffd3d

…rtcut transforms

vercel bot had a problem deploying to Preview – lexical August 26, 2024 02:49 Failure

vercel bot deployed to Preview – lexical-playground August 26, 2024 02:49 View deployment

make typescript happy

cadf44a

vercel bot had a problem deploying to Preview – lexical August 26, 2024 02:52 Failure

vercel bot deployed to Preview – lexical-playground August 26, 2024 02:53 View deployment

fix more typescript errors, add more jsdocs

15fe9ec

vercel bot deployed to Preview – lexical August 26, 2024 02:58 View deployment

vercel bot deployed to Preview – lexical-playground August 26, 2024 02:58 View deployment

fix code nodes created through markdown shortcut plugin being incorre…

7db3f31

…ctly wrapped in paragraphs

vercel bot had a problem deploying to Preview – lexical August 26, 2024 03:18 Failure

vercel bot deployed to Preview – lexical-playground August 26, 2024 03:19 View deployment

fix typescript errors

1cf1361

vercel bot deployed to Preview – lexical August 26, 2024 03:21 View deployment

vercel bot deployed to Preview – lexical-playground August 26, 2024 03:21 View deployment

AlessioGr requested review from ivailop7 and etrepum August 26, 2024 03:41

potatowagon approved these changes Aug 29, 2024

View reviewed changes

potatowagon changed the title ~~[lexical-markdown] Feature: multiline markdown transformers / mdx support~~ [lexical-markdown][breaking change] Feature: multiline markdown transformers / mdx support Aug 29, 2024

potatowagon added this pull request to the merge queue Sep 6, 2024

Merged via the queue into facebook:main with commit e6efbcc Sep 6, 2024
41 checks passed

potatowagon reviewed Sep 10, 2024

View reviewed changes

potatowagon mentioned this pull request Sep 10, 2024

[lexical-markdown] update markdown flow api #6615

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[lexical-markdown][breaking change] Feature: multiline markdown transformers / mdx support #6530

[lexical-markdown][breaking change] Feature: multiline markdown transformers / mdx support #6530

AlessioGr commented Aug 19, 2024 •

edited

Loading

vercel bot commented Aug 19, 2024 •

edited

Loading

github-actions bot commented Aug 19, 2024 •

edited

Loading

etrepum commented Aug 21, 2024

AlessioGr commented Aug 21, 2024 •

edited

Loading

AlessioGr commented Aug 26, 2024 •

edited

Loading

potatowagon commented Aug 29, 2024

AlessioGr commented Aug 30, 2024

potatowagon Sep 10, 2024

AlessioGr Sep 11, 2024

[lexical-markdown][breaking change] Feature: multiline markdown transformers / mdx support #6530

[lexical-markdown][breaking change] Feature: multiline markdown transformers / mdx support #6530

Conversation

AlessioGr commented Aug 19, 2024 • edited Loading

Description

Motivation for this change

Multiline markdown transformers

Support terminating Element and MultilineElement transformers early, if the regex matched

JS Docs

Test plan

vercel bot commented Aug 19, 2024 • edited Loading

github-actions bot commented Aug 19, 2024 • edited Loading

size-limit report 📦

etrepum commented Aug 21, 2024

AlessioGr commented Aug 21, 2024 • edited Loading

AlessioGr commented Aug 26, 2024 • edited Loading

potatowagon commented Aug 29, 2024

AlessioGr commented Aug 30, 2024

potatowagon Sep 10, 2024

Choose a reason for hiding this comment

AlessioGr Sep 11, 2024

Choose a reason for hiding this comment

AlessioGr commented Aug 19, 2024 •

edited

Loading

vercel bot commented Aug 19, 2024 •

edited

Loading

github-actions bot commented Aug 19, 2024 •

edited

Loading

AlessioGr commented Aug 21, 2024 •

edited

Loading

AlessioGr commented Aug 26, 2024 •

edited

Loading