[lexical-markdown] Fix: normalize markdown in $convertFromMarkdownString to comply with CommonMark spec #6608

GermanJablo · 2024-09-06T21:27:40Z

Problem

In markdown, the concept of "empty paragraphs" does not exist.
Blocks must be separated by an empty line, and non-empty adjacent lines must be merged.

Let's take this markdown as an example:

one
two

three

Currently, it is converted to the following (incorrect):

<p>one<br>two</p>
<p>three</p>

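When in fact, the correct output is a single paragraph: the line break between one and two is only a soft break, which renders as whitespace rather than as <br> (proof):

<p>one
two</p>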
<p>three</p>

For context, we are trying to import real mdx files into Lexical, and because of this issue, the current output contains some errors, such as this one here.

Solution

At first I thought it would be enough to remove every single line break (\n) that is not part of a run of consecutive line breaks (\n\n).
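As a rough illustration of that first idea, here is a minimal TypeScript sketch. The helper name naiveMergeLines is hypothetical (it is not part of Lexical), and joining the lines with a space instead of removing the break outright is an assumption of the sketch.

```typescript
// Hypothetical helper, not Lexical's code: collapse every lone "\n" and
// leave runs of two or more newlines (block separators) untouched.
function naiveMergeLines(markdown: string): string {
  // Assumption: a lone line break becomes a single space.
  return markdown.replace(/\n+/g, (run) => (run.length === 1 ? ' ' : run));
}

// The example from the Problem section: "one" and "two" end up in one block.
const merged = naiveMergeLines('one\ntwo\n\nthree');

// The same call also flattens fenced code, which should be left alone.
const flattened = naiveMergeLines('```\na\nb\n```');
```

Here merged is 'one two\n\nthree', while flattened collapses the whole fenced block onto a single line, which is one reason the blunt approach does not hold up.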

However, the solution was not that easy, as there were some tricky edge cases. To mention just a couple of examples:

- Within a code block, no line breaks should be removed (ref).
- A consecutive heading + paragraph should not be combined, but a list item + "paragraph" should (ref).

That's why I wrote a function called sanitizeMarkdown (later renamed to normalizeMarkdown in this PR) to cover these and all the other cases I found. It now runs inside $convertFromMarkdownString.
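To make the cases above concrete, here is a minimal line-by-line sketch in TypeScript. It is not the code from this PR: the function name normalizeMarkdownSketch, the three regular expressions, and the choice to join merged lines with a space are all assumptions made for illustration, and it only covers fenced code blocks, headings, and list items.

```typescript
// Hypothetical sketch; names and regexes are assumptions, not Lexical's code.
const FENCE = /^\s*```/; // opening or closing fence of a code block
const HEADING = /^#{1,6}\s/; // ATX heading such as "# Title"
const LIST_ITEM = /^\s*([-*+]|\d+\.)\s/; // bullet or ordered list item

function normalizeMarkdownSketch(markdown: string): string {
  const out: string[] = [];
  let inCodeBlock = false;

  for (const line of markdown.split('\n')) {
    // Fence lines toggle code-block mode and are always kept as-is.
    if (FENCE.test(line)) {
      inCodeBlock = !inCodeBlock;
      out.push(line);
      continue;
    }
    // Inside a code block no line breaks are removed.
    if (inCodeBlock) {
      out.push(line);
      continue;
    }

    const last = out.length - 1;
    const prev = last >= 0 ? out[last] : '';
    const startsNewBlock =
      prev === '' ||
      line === '' ||
      FENCE.test(prev) || // never merge into a closing fence
      HEADING.test(prev) || // heading + paragraph stay separate
      HEADING.test(line) ||
      LIST_ITEM.test(line); // a new list item starts its own line

    if (startsNewBlock) {
      out.push(line);
    } else {
      // Adjacent non-empty lines are merged (assumption: with one space).
      out[last] = prev + ' ' + line.trim();
    }
  }
  return out.join('\n');
}
```

Under these assumptions, 'one\ntwo\n\nthree' becomes 'one two\n\nthree', a fenced block and a heading followed by text pass through unchanged, and '- item\nmore' becomes '- item more'.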

A few tests required changes. Since the correct result was not obvious, I left some comments linking to permalinks in the CommonMark playground.

Future work

If one would like to add a hard line break, there are 3 ways to do so in markdown (source):

- an html <br> tag,
- ending the line with two spaces,
- or ending the line with \.

Right now, the only option that works with Lexical is the first one. I've left a TODO comment noting that it would be nice to support the other two in the future.
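For reference, the three hard-break markers listed above could be recognized per line roughly like this. This is only a sketch of the idea, not something the PR implements: hardBreakKind and the HardBreak type are hypothetical names, and the regular expressions ignore details such as a marker on the last line of a block not counting as a break.

```typescript
// Hypothetical helper, not part of Lexical: classify how a line ends.
type HardBreak = 'html-br' | 'two-spaces' | 'backslash' | null;

function hardBreakKind(line: string): HardBreak {
  // An explicit <br>, <br/> or <br /> tag at the end of the line.
  if (/<br\s*\/?>\s*$/i.test(line)) return 'html-br';
  // Two or more trailing spaces.
  if (/ {2,}$/.test(line)) return 'two-spaces';
  // A trailing backslash.
  if (/\\$/.test(line)) return 'backslash';
  return null;
}
```

A normalization step could use such a check to keep the line break after a marked line instead of merging it with the next one.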

Test plan

Before

Screen.Recording.2024-09-08.at.12.45.19.AM.mov

After

Screen.Recording.2024-09-08.at.12.43.14.AM.mov

vercel · 2024-09-06T21:27:44Z

Name	Status	Updated (UTC)
lexical	✅ Ready	Sep 9, 2024 1:31pm
lexical-playground	✅ Ready	Sep 9, 2024 1:31pm

github-actions · 2024-09-06T21:29:22Z

size-limit report 📦

Path	Size
lexical - cjs	29.61 KB (0%)
lexical - esm	29.49 KB (0%)
@lexical/rich-text - cjs	38.07 KB (0%)
@lexical/rich-text - esm	31.3 KB (0%)
@lexical/plain-text - cjs	36.72 KB (0%)
@lexical/plain-text - esm	28.65 KB (0%)
@lexical/react - cjs	39.89 KB (0%)
@lexical/react - esm	32.79 KB (0%)

etrepum

Approving because it does seem like an incremental improvement, but I think there are still other classes of edge cases in here.

packages/lexical-markdown/src/MarkdownTransformers.ts

GermanJablo · 2024-09-09T12:37:14Z

Thanks for your comments @etrepum, I've already addressed them.

packages/lexical-markdown/src/MarkdownTransformers.ts

etrepum

The lack of spec compliance makes it hard to review thoroughly since it's easy to find things that don't work correctly, but it does pass our tests and I think this particular change is unlikely to cause new problems.

potatowagon · 2024-09-10T14:20:08Z

> The lack of spec compliance makes it hard to review thoroughly since it's easy to find things that don't work correctly, but it does pass our tests and I think this particular change is unlikely to cause new problems.

Thank you for your review.

**NOTE that this PR is to the `lexical-mdx-shouldMerge` branch.**

I had to make some fixes to `normalizeMarkdown`. I also fixed some tests that did not account for how line breaks are supposed to work in markdown (see facebook/lexical#6608). All tests now pass.

code-md

f464bae

GermanJablo requested review from zurfyx, fantactuka, acywatson, Fetz, ivailop7, Sahejkm and potatowagon as code owners September 6, 2024 21:27

facebook-github-bot added the CLA Signed label Sep 6, 2024

vercel bot deployed to Preview – lexical September 6, 2024 21:29 View deployment

vercel bot deployed to Preview – lexical-playground September 6, 2024 21:30 View deployment

GermanJablo marked this pull request as draft September 6, 2024 21:36

GermanJablo added 2 commits September 7, 2024 10:30

sanitize markdown

f6830a3

Merge branch 'main' into code-md

3d20030

GermanJablo force-pushed the code-md branch from 8c125b8 to 3d20030 Compare September 7, 2024 13:32

vercel bot deployed to Preview – lexical September 7, 2024 13:33 View deployment

vercel bot deployed to Preview – lexical-playground September 7, 2024 13:33 View deployment

save

cb29c2d

vercel bot deployed to Preview – lexical September 7, 2024 13:44 View deployment

vercel bot deployed to Preview – lexical-playground September 7, 2024 13:44 View deployment

GermanJablo added 2 commits September 7, 2024 23:57

save

9b0d561

Merge branch 'main' into code-md

3753e3e

vercel bot deployed to Preview – lexical September 8, 2024 02:59 View deployment

vercel bot deployed to Preview – lexical-playground September 8, 2024 02:59 View deployment

nit comment

eb16b01

vercel bot deployed to Preview – lexical September 8, 2024 03:01 View deployment

vercel bot deployed to Preview – lexical-playground September 8, 2024 03:01 View deployment

fix import

e9a741b

vercel bot deployed to Preview – lexical September 8, 2024 03:05 View deployment

vercel bot deployed to Preview – lexical-playground September 8, 2024 03:05 View deployment

GermanJablo changed the title ~~[lexical-markdown] Fix: markdown transformer for links separated by newline~~ [lexical-markdown] Fix: sanitize markdown in $convertFromMarkdownString to comply with CommonMark spec Sep 8, 2024

GermanJablo marked this pull request as ready for review September 8, 2024 03:54

GermanJablo mentioned this pull request Sep 8, 2024

richtext-lexical: markdown transformer for links separated by newline not working as expected payloadcms/payload#8049

Closed

etrepum previously approved these changes Sep 8, 2024

packages/lexical-markdown/src/MarkdownTransformers.ts (outdated, resolved)

packages/lexical-markdown/src/MarkdownTransformers.ts (outdated, resolved)

GermanJablo added 2 commits September 9, 2024 09:29

rename sanitizeMarkdown to normalizeMarkdown

e15a4d8

fix regex for code block

b9b6ee3

GermanJablo dismissed etrepum’s stale review via b9b6ee3 September 9, 2024 12:36

vercel bot deployed to Preview – lexical September 9, 2024 12:37 View deployment

vercel bot deployed to Preview – lexical-playground September 9, 2024 12:37 View deployment

fix tests

c936c11

vercel bot deployed to Preview – lexical September 9, 2024 13:31 View deployment

vercel bot deployed to Preview – lexical-playground September 9, 2024 13:31 View deployment

etrepum reviewed Sep 9, 2024

packages/lexical-markdown/src/MarkdownTransformers.ts (resolved)

GermanJablo changed the title ~~[lexical-markdown] Fix: sanitize markdown in $convertFromMarkdownString to comply with CommonMark spec~~ [lexical-markdown] Fix: normalize markdown in $convertFromMarkdownString to comply with CommonMark spec Sep 9, 2024

etrepum approved these changes Sep 9, 2024

etrepum added the extended-tests label Sep 9, 2024

potatowagon added this pull request to the merge queue Sep 10, 2024

Merged via the queue into facebook:main with commit 8123ca7 Sep 10, 2024
79 checks passed

GermanJablo deleted the code-md branch September 10, 2024 14:48

potatowagon added a commit that referenced this pull request Sep 12, 2024

Revert "[lexical-markdown] Fix: normalize markdown in $convertFromMarkdownString to comply with CommonMark spec (#6608)"

This reverts commit 8123ca7.

a6295d5

potatowagon mentioned this pull request Sep 12, 2024

Revert "[lexical-markdown] Fix: normalize markdown in $convertFromMarkdownString to comply with CommonMark spec (#6608)" #6627

Merged

github-merge-queue bot pushed a commit that referenced this pull request Sep 12, 2024

Revert "[lexical-markdown] Fix: normalize markdown in $convertFromMarkdownString to comply with CommonMark spec (#6608)" (#6627)

b0c9809

GermanJablo mentioned this pull request Oct 8, 2024

fix(richtext-lexical): various fixes when importing mdx payloadcms/payload#8608

Merged
