-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[lexical-markdown] Fix: normalize markdown in $convertFromMarkdownString to comply with CommonMark spec #6608
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎
|
size-limit report 📦
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving because it does seem like an incremental improvement but I think there's still other classes of edge cases in here
Thanks for your comments @etrepum, I already solved them |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The lack of spec compliance makes it hard to review thoroughly since it's easy to find things that don't work correctly, but it does pass our tests and I think this particular change is unlikely to cause new problems.
thankyou for your review |
**NOTE that this PR is to the `lexical-mdx-shouldMerge` branch.** ___ I had to make some fixes to `normalizeMarkdown`. I also fixed some tests that were not considering the correct functioning of breaking lines in markdown (see facebook/lexical#6608). All tests now pass.
Problem
In markdown, the concept of "empty paragraphs" does not exist.
Blocks must be separated by an empty line, and non-empty adjacent lines must be merged.
Let's take this markdown as an example:
Currently, it is converted to the following (incorrect):
When in fact, the correct output should be this (proof):
For context, we are trying to import real
mdx
files into Lexical, and because of this issue, the current output contains some errors, such as this one here.Solution
At first I thought it would be enough to remove all single line breaks (
\n
), which were not accompanied by further consecutive line breaks (\n\n
).However, the solution was not that easy, as there were some tricky edgecases. To mention just a couple of examples:
That's why I wrote a function called
sanitizeMarkdown
to cover these and all the other cases I found, and which is now run inside$convertFromMarkdownString
.A few tests required changes. Since the correct result was not obvious, I left some comments linking to permalinks in the CommonMark playground.
Future work
If one would like to add a hard line break, there are 3 ways to do so in markdown (source):
<br>
tag,\
.Right now, the only option that works with Lexical is the first one. I've left a TO-DO comment indicating that it would be nice to support the other 2 in the future.
Test plan
Before
Screen.Recording.2024-09-08.at.12.45.19.AM.mov
After
Screen.Recording.2024-09-08.at.12.43.14.AM.mov