-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Blocks] Preserve source markup in invalid blocks #38460
Conversation
Size Change: -4 B (0%) Total Size: 1.14 MB
ℹ️ View Unchanged
|
259f024
to
4e18a5c
Compare
originalUndelimitedContent, | ||
}, | ||
innerHTML: rawBlock.blockName ? originalContent : rawBlock.innerHTML, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the reasoning for the removal of the condition here? Is it something that never happens?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
previously we have been re-generating block output from the parsed inputs. this is the reason that the editor can silently corrupt or lose content. for HTML soup outside of the blocks though there's no way to re-generate the block since there is no block. because of this we special-case the HTML soup and use innerHTML
directly, which ironically is closer to the correct behavior for regular blocks too.
after preserving the source markup every parsed block contains that original content which we can thread through and it doesn't matter if it's a block proper or not, so no more need for the conditional and special-casing.
@@ -228,6 +230,7 @@ export function parseRawBlock( rawBlock ) { | |||
parsedInnerBlocks | |||
); | |||
parsedBlock.originalContent = normalizedBlock.innerHTML; | |||
parsedBlock.sourceMarkup = normalizedBlock.sourceMarkup; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not clear to me what's the difference between sourceMarkup and originalContent. As far as I remember, originalContent was introduced for the same reasons.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The introduction of originalContent
in #14443 was ironically made to support inner blocks even though we've got a few years to reflect on it and confirm it breaks them. 🙃
I find originalContent
as a name confusing particularly because it stands in sharp contrast to how we define "content" in innerContent
. That is, innerContent
is an array that specifically represents the view of the document including the inner blocks while originalContent
is a string that specifically represents the view where the inner blocks are missing.
I considered removing originalContent
altogether but I'm worried about what might be using it. As noted in the original PR we should be moving towards removing all use of innerHTML
everywhere as it's an historical artifact going back to an original design question revolving around how to represent nesting.
sourceMarkup
though has a clear referent that's not muddied by the confusing terminology (confusing not only because it means the opposite of what you should infer based on the block API but also because it implies it came from the original document when in fact it was re-generated on the fly from possibly faulty inputs).
With this change in we should be able to start removing references to originalContent
and if possible, I'd rather do that in other work since this PR is fixing a longstanding issue and I'm nervous that if we start updating and changing other parts we up the risk of introducing regressions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, so the intent is indeed to change the way we compute originalContent
and the name change is there to bring clarity.
I think bringing clarity is a great argument but I wonder if we should just continue using originalContent
that way we're not stuck supporting this old property forever and add some useless memory consumption to our stores. I'm hesitant personally here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we really want to stick with the renaming, we could potentially make originalContent
a getter/setter property in order to add a deprecated
call maybe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you're uncomfortable keeping both I'd be in favor of removing originalContent
entirely even at risk of breaking things. I'll check the usage in the missing block and in other places. innerHTML
should go away too but I'm not going to change that here.
@@ -53,6 +53,9 @@ function normalizeParsedBlocks( blocks ) { | |||
// Recurse to normalize inner blocks | |||
block.innerBlocks = normalizeParsedBlocks( block.innerBlocks ); | |||
|
|||
// Ignore since this isn't a part of the parser output | |||
delete block.sourceMarkup; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's fine since we already have a dedicated snapshot for the source file, maybe we could also consider the same thing for originalContent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this sounds great. in fact I think we can go one step further and more positively talk about the attributes we want to check instead of removing the ones we don't. I'll post an update with what I mean.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In 3a9ffa6 I've adjusted this function but left originalContent
for the same reason I didn't want to introduce the sourceMarkup
- it turns this 30-line PR into a 700+ line PR. I wouldn't mind doing a follow-up PR though to update all the fixtures to remove the things we don't want and where that testing-methodology change could be reviewed by itself.
a3b1b6c
to
3e38d43
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking good overall for me despite my hesitation about the renaming.
Would love some 👀 from @mcsf here.
Previously we've been asserting in our full-content snapshot tests that the `originalContent` property doesn't change, but it's faulty and going away (see #38460). By removing it from the test fixtures we can relax our constraint that it exists and doesn't change, leaving future work easier to review and focus. This is a generated property while loading the blocks and use of it leads to content corruption and loss on certain invalid blocks and blocks with inner blocks. After this change we're still veryifying the full load and save flow for the block implementations.
Previously we've been asserting in our full-content snapshot tests that the `originalContent` property doesn't change, but it's faulty and going away (see #38460). By removing it from the test fixtures we can relax our constraint that it exists and doesn't change, leaving future work easier to review and focus. This is a generated property while loading the blocks and use of it leads to content corruption and loss on certain invalid blocks and blocks with inner blocks. After this change we're still veryifying the full load and save flow for the block implementations.
Previously we've been asserting in our full-content snapshot tests that the `originalContent` property doesn't change, but it's faulty and going away (see #38460). By removing it from the test fixtures we can relax our constraint that it exists and doesn't change, leaving future work easier to review and focus. This is a generated property while loading the blocks and use of it leads to content corruption and loss on certain invalid blocks and blocks with inner blocks. After this change we're still veryifying the full load and save flow for the block implementations.
Previously we have been re-generating a block's save content when we fail to properly validate that block. While this can work in many cases it breaks down quickly because of the invalidly-parsed attributes. Consider a list block whose `<ul>` tag has been removed. The block will fail to validate and then also fail to read the content of the list items because the CSS selector used to source the attribute won't work. The attribute for the list's content will be empty and `getSaveContent` will return an empty list, this despite the fact that the contents were largely fine in the post HTML when loaded. In this patch we're threading a new property into the block node called `sourceMarkup` whose purpose is to preserve the original HTML in a post so that we can choose to preserve that on save instead of corrupting or losing content through the HTML re-generation. With `sourceMarkup` we get a less-processed version of the source to relay into save which should eliminate a common point of frustration: saving a post erases content.
3a9ffa6
to
62e8bdc
Compare
Rebuilt this after the merge of #38638, but I'm still trying to totally eliminate |
Unfortunately as I continue to try and eliminate Tossing it out there that it might be practical to fix the existing problem (the editor wipes out or corrupts content from unrelated changes) with the small fix and then start chiseling away at |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM but I'd appreciate a second opinion here.
I'm going to wait for another opinion as well. Thank you @youknowriad The more I think about this though the more I think a different interface is going to be better. Possibly that's just two properties vs. one: This being said, I'm still comfortable pushing this out as-is since it's so small and because it does solve one major problem at hand and if we improve it afterwards and can remove |
@youknowriad I'm closing this in favor of #38923 because I believe it provides a better longer-term solution with its |
Status
Currently this is ON HOLD as I prepare a broader change that addresses the shortcomings noted in the discussion, namely the
originalContent
field and all the little places we need to compare the preserved markup with the re-generated markup, such as in the block comparer during a "Resolve" for an invalid block.sourceMarkup
attribute but wanted to hear thoughts first about it. I wondered if we rely on these fixtures and if they are providing much value to us. If they are, I wonder why we have a custom runner instead of usingjest
, if that's a carry-over from earlier times.sourceMarkup
property when comparing. On one hand I like this but on the other hand maybe it would be best to leave it in and update the fixtures with the 600+ lines of code that adds to this PR because it would be less surprising to come across. Any thoughts are appreciated.originalContent
and I'd like to audit all those places and even get rid oforiginalContent
if we can.originalContent
but should we have it parsesourceMarkup
instead?innerHTML
between a source block and the re-generated markup. showing the difference withsourceMarkup
would show the delimiters and also the inner blocks. inner blocks could be noisy, but also essential. showing the delimiters could be confusing.innerHTML
if it fails to validate or re-generate block output. can this wipe out inner blocks?innerHTML
which means that if we get differentinnerBlocks
we'll accept it as valid.innerBlocks
with an empty array for validation, completely ignoring them.Description
Previously we have been re-generating a block's save content when
we fail to properly validate that block. While this can work in many
cases it breaks down quickly because of the invalidly-parsed attributes.
Consider a list block whose
<ul>
tag has been removed. The block willfail to validate and then also fail to read the content of the list items
because the CSS selector used to source the attribute won't work. The
attribute for the list's content will be empty and
getSaveContent
willreturn an empty list, this despite the fact that the contents were largely
fine in the post HTML when loaded.
In this patch we're threading a new property into the block node called
sourceMarkup
whose purpose is to preserve the original HTML in a postso that we can choose to preserve that on save instead of corrupting or
losing content through the HTML re-generation.
With
sourceMarkup
we get a less-processed version of the source torelay into save which should eliminate a common point of frustration:
saving a post erases content.
Testing Instructions
Screenshots
Invalid list before
Note that the "Resolve" dialog already showed the proper content, but in the editor it looks like an empty list.
Invalid list after
The editor shows the proper list items and will preserve them on save.
Types of changes
Checklist:
*.native.js
files for terms that need renaming or removal).