Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[lexical-markdown][breaking change] Feature: multiline markdown transformers / mdx support #6530

Merged
merged 38 commits into from
Sep 6, 2024

Conversation

AlessioGr
Copy link
Contributor

@AlessioGr AlessioGr commented Aug 19, 2024

Description

Motivation for this change

We would like to add mdx support to Payload, allowing mdx files to be easily import to and exported from lexical editors. Multiline markdown transformer support will allow JSX within mdx to be handled properly.

Multiline markdown transformers

Currently, lexical only has one multiline markdown transformer for code nodes that is hardcoded. This comes with 2 issues:

  • users cannot extend lexical with their own multiline markdown transformers
  • The code node markdown import transformer is executed (if there is code markdown (```) in the markdown string) even if code nodes are not enabled in the project, leading to errors

This PR adds a new MultilineElementTransformer type and changes the hardcoded code markdown transformer to be a MultilineElementTransformer.

I intentionally went for a separate type, contrary to what @fantactuka suggested here, but I'm happy to explore merging the types if it makes sense. Here are my reasons for making it a separate type:

  • Within createMarkdownImport it's easier to determine if it's a multiline transformer - the algorithm for handling multiline transformers is very different
  • Less bloated types. The replace function of the MultilineElementTransformer type drastically differs from the replace function of the ElementTransformer type:
    • A multiline transformer does not need isImport as an argument in the replace function. It will only be run during imports, and not in the MarkdownShortcut plugin
    • There are 2 kinds of regex matches now (openMatch and closeMatch) and not just match - both should be available as arguments in replace
    • I added a new linesInBetween args to the replace arguments, instead of the children argument. Additionally, only the rootNode is passed as an argument. This is easier to understand / simpler and gives the developer full control over which nodes to add to the root node and how, instead of lexical always creating a text node and a paragraph node. The new multiline code node is a good example for this - it does not need a ParagraphNode to be created
    • linesInBetween is the text matched between the open and close match, separated by lines. Basically, anything in between the matches. This is easy to understand, the developer now has easy access to anything in between the regex match, as well as the regex matches themselves

Regarding mdx (JSX) support: Now, with multiline markdown transformer support, we can easily utilize a general regex for matching any JSX tags, and then further parse that matched JSX node into the respective lexical node. An example for this can be found in the new tests here:

customTransformers: [MDX_HTML_TRANSFORMER],

This also brings us to the second, smaller feature:

Support terminating Element and MultilineElement transformers early, if the regex matched

If, during an import, the regex for one of your transformers matched, this will be the only transformer that will run. You cannot just decide that "hey, the regex matched, but I don't actually want to handle transforming this" inside the replace function.

This is possible in the export function of a transformer by returning null - that way, the next transformer will be called. This PR makes this possible for imports as well. Now, if you return false in the replace function, it will keep attempting to find a match, going to the next markdown transformer.

Use-case: We'd like to employ one, generic multiline markdown transformer for any JSX elements (mdx support). Then, I want to decide within the replace function if I'd like to handle this match or not, by reading the node type (the text between < >) - this is easier than writing unique regex for every single kind of html node I'd like to handle. That way, we'd just have one clean switch statement, and a great DX for allowing our users to register their own markdown handling for certain JSX elements

If there is no handler registered for this node type, I can now simply return false, thus signaling lexical that I don't want to handle this node and that lexical can keep looking for a more suitable transformer.

JS Docs

This adds some JSDocs to the transformer types.

Closes #2564

Test plan

See diffs of the LexicalMarkdown.test.ts file: https://github.com/facebook/lexical/pull/6530/files#diff-9a0c91bc82f2693a031ebe1165316e9726980b7f1d54de8681de1256e1d18303L233

Additionally, the tests for code nodes are still passing - this means the move from hardcoded code multiline transformer to actual code multiline transformer was successful

While working on this, I have also encountered one test case that failed - this fails in the current lexical version as well. As this is not within the scope of this PR, I did not fix it here - however I did include a TODO item, as well as a commented out test case

Copy link

vercel bot commented Aug 19, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
lexical ✅ Ready (Inspect) Visit Preview 💬 Add feedback Aug 26, 2024 3:21am
lexical-playground ✅ Ready (Inspect) Visit Preview 💬 Add feedback Aug 26, 2024 3:21am

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Aug 19, 2024
Copy link

github-actions bot commented Aug 19, 2024

size-limit report 📦

Path Size
lexical - cjs 29.38 KB (0%)
lexical - esm 29.24 KB (0%)
@lexical/rich-text - cjs 37.89 KB (0%)
@lexical/rich-text - esm 31.07 KB (0%)
@lexical/plain-text - cjs 36.49 KB (0%)
@lexical/plain-text - esm 28.41 KB (0%)
@lexical/react - cjs 39.61 KB (0%)
@lexical/react - esm 32.52 KB (0%)

@etrepum
Copy link
Collaborator

etrepum commented Aug 21, 2024

I haven't had a chance to look at any of this PR yet, but have you considered starting from a better parsing foundation and building a new lexical markdown API without any legacy constraints (e.g. leveraging remark)? The current one is a bit of a dead end with a handful of design flaws (although a lot less so with all of your work, under the assumption that the code looks good)

@AlessioGr
Copy link
Contributor Author

AlessioGr commented Aug 21, 2024

Hey @etrepum, I did consider something like remark or micromark, but those solutions would increase the bundle size by around 100kB. I'm trying to keep it as small possible. Additionally, something like micromark uses packages like Acorn for handling jsx. We've seen that package unleash hell when it's bundled into a Next.js app in the past, so I'd rather avoid that. This regex-based solution is very lean!

I also wouldn't know how well they would play together using the MarkdownShortcut plugin, which works quite well with regex. I've seen this project use micromark for imports / exports, but it's still using the current solution regex-based solution for the MarkdownShortcut plugin.

I must say though, the regex-based multiline markdown handling from this PR works really well for us so far. I was half expecting us to run into some weird edge-cases that we wouldn't have if we used remark / parsed around some ast, but that's not the case. We're currently using a packaged version of @lexical/markdown with this PR included, and got a bi-directional mdx import / export working - with complex JSX and all that. It's pretty nice - no more editing our docs in plain mdx :p

If the JSX gets too complex to handle using regex (e.g. if you wanted to define sub-lexical nodes using JSX nodes dynamically in their children), there's always the option to just parse the children of the JSX element using micromark & work with the AST tree of that complex JSX element instead. The way we use multiline markdown transformers for JSX, JSX children are exposed as plain strings, so it would be easy to slap it on top of that. But that's optional

^ and this could be done for any complex node / markdown element. This solution allows you to write a very broad regex to match the entire thing, and then opt in something like remark to process it further

@AlessioGr
Copy link
Contributor Author

AlessioGr commented Aug 26, 2024

All tests are now passing! 🎉

The discrepancies I had between local and CI e2e runs was due to CI running the test in plain-text mode. I was running them in rich-text mode locally - didn't notice the environment variable that was being set in CI.

The tests were basically failing because the new multiline code transformer, which is now used instead of the normal code transformer, does not handle "incomplete" markdown (e.g. "```markdown hello" - no closing ```. In order to support incomplete markdown, I have added an optional property to regExpEnd, which can now be an object. If set to true, transformer will run even if no end regex has been matched. I have added new unit tests to test this.

This change now alleviates the need to have both a multiline and a normal ElementTransformer - the MultilineTransformer can handle it all. Thus, I have merged the CODE and CODE_MULTILINE transformers into just one CODE MultilineTransformer. MultilineTransformers can now also run during markdown shortcut transforms. I have made this change in 03ffd3d

Please review the updated API. In the future, we could consider getting rid of the normal ElementTransformer, as the MultilineElementTransformer can handle it all, or merge them together in some other form - though I do not want to make any breaking changes in this PR.

Additionally, I'm considering making the arguments of the MultilineElementTransformer a single object argument, with the current arguments being object props. So,

replace: ({ rootNode, children, startMatch, endMatch, linesInBetween })

Instead of

replace: (rootNode, children, startMatch, endMatch, linesInBetween)

In my opinion, this would be a cleaner API for the user. Better auto-suggestions, and we don't have to force them to type every single argument to get to something like linesInBetween. If we want to add arguments in the future, this will be especially bad. Additionally, we can improve the types by using union types to type children as null if linesInBetween is set, and linesInBetween as null if children is set - or even better, type it based on the isImport property.
However, the disadvantage is that this would be inconsistent with the replace function of the ElementTransformer. Let me know if we should make this change

@AlessioGr AlessioGr requested review from ivailop7 and etrepum August 26, 2024 03:41
@potatowagon potatowagon changed the title [lexical-markdown] Feature: multiline markdown transformers / mdx support [lexical-markdown][breaking change] Feature: multiline markdown transformers / mdx support Aug 29, 2024
@potatowagon
Copy link
Contributor

do u foresee any breaking change? code looks ok, i havnt spoted any major api changes, more like adding a multiline transformer as an option.

labelling it as breaking change for an internal memo to test further for any markdown regressions

@AlessioGr
Copy link
Contributor Author

do u foresee any breaking change?

Not in this PR; though I foresee us merging MultilineElementTransformers and ElementTransformers together in the future, and that would be a breaking change

@potatowagon potatowagon added this pull request to the merge queue Sep 6, 2024
Merged via the queue into facebook:main with commit e6efbcc Sep 6, 2024
41 checks passed
})(rootNode, children, startMatch, isImport);
}
},
type: 'multilineElement',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be multiline-element to follow the conventions of eg. text-format? any side effects if i raise a PR to rename this? @AlessioGr

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch, yep let's rename it for consistency

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. extended-tests Run extended e2e tests on a PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bug: Markdown code node importing is hardcoded
5 participants