-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[lexical-markdown][breaking change] Feature: multiline markdown transformers / mdx support #6530
Conversation
…se in replace function, allow multi-line transformer to match single lines
…arsing mdx files)
The latest updates on your projects. Learn more about Vercel for Git ↗︎
|
size-limit report 📦
|
…same line as the openMatch
I haven't had a chance to look at any of this PR yet, but have you considered starting from a better parsing foundation and building a new lexical markdown API without any legacy constraints (e.g. leveraging remark)? The current one is a bit of a dead end with a handful of design flaws (although a lot less so with all of your work, under the assumption that the code looks good) |
Hey @etrepum, I did consider something like remark or micromark, but those solutions would increase the bundle size by around 100kB. I'm trying to keep it as small possible. Additionally, something like micromark uses packages like I also wouldn't know how well they would play together using the MarkdownShortcut plugin, which works quite well with regex. I've seen this project use micromark for imports / exports, but it's still using the current solution regex-based solution for the MarkdownShortcut plugin. I must say though, the regex-based multiline markdown handling from this PR works really well for us so far. I was half expecting us to run into some weird edge-cases that we wouldn't have if we used remark / parsed around some ast, but that's not the case. We're currently using a packaged version of @lexical/markdown with this PR included, and got a bi-directional mdx import / export working - with complex JSX and all that. It's pretty nice - no more editing our docs in plain mdx :p If the JSX gets too complex to handle using regex (e.g. if you wanted to define sub-lexical nodes using JSX nodes dynamically in their children), there's always the option to just parse the children of the JSX element using micromark & work with the AST tree of that complex JSX element instead. The way we use multiline markdown transformers for JSX, JSX children are exposed as plain strings, so it would be easy to slap it on top of that. But that's optional ^ and this could be done for any complex node / markdown element. This solution allows you to write a very broad regex to match the entire thing, and then opt in something like remark to process it further |
…ctly wrapped in paragraphs
All tests are now passing! 🎉 The discrepancies I had between local and CI e2e runs was due to CI running the test in plain-text mode. I was running them in rich-text mode locally - didn't notice the environment variable that was being set in CI. The tests were basically failing because the new multiline code transformer, which is now used instead of the normal code transformer, does not handle "incomplete" markdown (e.g. "```markdown hello" - no closing ```. In order to support incomplete markdown, I have added an This change now alleviates the need to have both a multiline and a normal ElementTransformer - the MultilineTransformer can handle it all. Thus, I have merged the Please review the updated API. In the future, we could consider getting rid of the normal Additionally, I'm considering making the arguments of the MultilineElementTransformer a single object argument, with the current arguments being object props. So,
Instead of
In my opinion, this would be a cleaner API for the user. Better auto-suggestions, and we don't have to force them to type every single argument to get to something like |
do u foresee any breaking change? code looks ok, i havnt spoted any major api changes, more like adding a multiline transformer as an option. labelling it as breaking change for an internal memo to test further for any markdown regressions |
Not in this PR; though I foresee us merging MultilineElementTransformers and ElementTransformers together in the future, and that would be a breaking change |
})(rootNode, children, startMatch, isImport); | ||
} | ||
}, | ||
type: 'multilineElement', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be multiline-element to follow the conventions of eg. text-format
? any side effects if i raise a PR to rename this? @AlessioGr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch, yep let's rename it for consistency
Description
Motivation for this change
We would like to add mdx support to Payload, allowing mdx files to be easily import to and exported from lexical editors. Multiline markdown transformer support will allow JSX within mdx to be handled properly.
Multiline markdown transformers
Currently, lexical only has one multiline markdown transformer for code nodes that is hardcoded. This comes with 2 issues:
This PR adds a new
MultilineElementTransformer
type and changes the hardcoded code markdown transformer to be aMultilineElementTransformer
.I intentionally went for a separate type, contrary to what @fantactuka suggested here, but I'm happy to explore merging the types if it makes sense. Here are my reasons for making it a separate type:
createMarkdownImport
it's easier to determine if it's a multiline transformer - the algorithm for handling multiline transformers is very differentreplace
function of theMultilineElementTransformer
type drastically differs from thereplace
function of theElementTransformer
type:isImport
as an argument in thereplace
function. It will only be run during imports, and not in the MarkdownShortcut pluginopenMatch
andcloseMatch
) and not justmatch
- both should be available as arguments inreplace
linesInBetween
args to thereplace
arguments, instead of thechildren
argument. Additionally, only therootNode
is passed as an argument. This is easier to understand / simpler and gives the developer full control over which nodes to add to the root node and how, instead of lexical always creating a text node and a paragraph node. The new multiline code node is a good example for this - it does not need aParagraphNode
to be createdlinesInBetween
is the text matched between the open and close match, separated by lines. Basically, anything in between the matches. This is easy to understand, the developer now has easy access to anything in between the regex match, as well as the regex matches themselvesRegarding mdx (JSX) support: Now, with multiline markdown transformer support, we can easily utilize a general regex for matching any JSX tags, and then further parse that matched JSX node into the respective lexical node. An example for this can be found in the new tests here:
lexical/packages/lexical-markdown/src/__tests__/unit/LexicalMarkdown.test.ts
Line 271 in 7d4a904
This also brings us to the second, smaller feature:
Support terminating Element and MultilineElement transformers early, if the regex matched
If, during an import, the regex for one of your transformers matched, this will be the only transformer that will run. You cannot just decide that "hey, the regex matched, but I don't actually want to handle transforming this" inside the
replace
function.This is possible in the
export
function of a transformer by returning null - that way, the next transformer will be called. This PR makes this possible for imports as well. Now, if you returnfalse
in thereplace
function, it will keep attempting to find a match, going to the next markdown transformer.Use-case: We'd like to employ one, generic multiline markdown transformer for any JSX elements (mdx support). Then, I want to decide within the replace function if I'd like to handle this match or not, by reading the node type (the text between
< >
) - this is easier than writing unique regex for every single kind of html node I'd like to handle. That way, we'd just have one clean switch statement, and a great DX for allowing our users to register their own markdown handling for certain JSX elementsIf there is no handler registered for this node type, I can now simply return false, thus signaling lexical that I don't want to handle this node and that lexical can keep looking for a more suitable transformer.
JS Docs
This adds some JSDocs to the transformer types.
Closes #2564
Test plan
See diffs of the LexicalMarkdown.test.ts file: https://github.com/facebook/lexical/pull/6530/files#diff-9a0c91bc82f2693a031ebe1165316e9726980b7f1d54de8681de1256e1d18303L233
Additionally, the tests for code nodes are still passing - this means the move from hardcoded code multiline transformer to actual code multiline transformer was successful
While working on this, I have also encountered one test case that failed - this fails in the current lexical version as well. As this is not within the scope of this PR, I did not fix it here - however I did include a TODO item, as well as a commented out test case