Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unrecognized Blocks: Preserve content of invalid and missing blocks #38922

Open
2 of 10 tasks
dmsnell opened this issue Feb 18, 2022 · 2 comments
Open
2 of 10 tasks

Unrecognized Blocks: Preserve content of invalid and missing blocks #38922

dmsnell opened this issue Feb 18, 2022 · 2 comments
Assignees
Labels
[Feature] Block Validation/Deprecation Handling block validation to determine accuracy and deprecation [Status] In Progress Tracking issues with work in progress [Type] Tracking Issue Tactical breakdown of efforts across the codebase and/or tied to Overview issues.

Comments

@dmsnell
Copy link
Member

dmsnell commented Feb 18, 2022

Description

There are multiple places where in the normal flow of editing that the editor might corrupt or erase content from unrecognized blocks (block implementation is missing or it's "invalid" content), even from unrelated edits in separate blocks.

When the editor finds content it can't recognize or interact with the default mode of operation should be to preserve that content so that it doesn't unintentionally corrupt documents.

Currently there are a number of contexts in which this data corruption or data loss occurs and they revolve around two related issues:

  • Instead of using the parsed block information from load-time we run the working attributes for a given block through its block implementation to re-generate the block output.
  • In cases where we're interacting with validation and transformation we overlook any possible inner blocks a block might have.

The implications of the first issue, re-generation of block content, is that there are times when we have a set of invalidly-parsed attribute that when in turn passed back into the block implementation produce corrupt output. Consider a list block whose <ul> tag is missing. Despite loading in the entire set of bullet points and <li> tags the editor will have sourced '' as the values attribute and so will erase the existing list and write an empty one in its place. If a block is invalid or unrecognized then we should not feed the bad data into the block and expect to get good output, GIGO.

The implications of the second are that we produce a lot of false block validations and treat certain broken blocks as valid while invalidating certain valid blocks. By ignoring a block's inner blocks during this process we get around problems introduced by updates to inner blocks, which while nested inside of the parent, should be independent and unrelated to the parent's correctness. However, some blocks do base their output HTML on the presence or content of its inner blocks, making these changes meaningful. Further, some blocks upgrade and change the way they interact with inner blocks and a failure to acknowledge this will lead to the aforementioned data loss.

Issues

  • Thread block source information into blocks for preservation.
    - [Blocks] Preserve source markup in invalid blocks #38460 sourceMarkup is insufficient, so this is being replaced by a better system
    - Blocks: Remember raw source block for invalid blocks. #38923 adds source attribute on invalid blocks only
  • Show source markup in editor for invalid blocks (block-list/block)
  • Update serializeBlock to output the source markup if block is invalid
  • Fix compared markup in block comparison to make a reasonable and meaningful comparison before and after block resolution.
  • Fix resolution conversion functions to make sense and incorporate innerBlocks
  • Deprecate isValidBlockContent since it can't account for inner blocks
    - Blocks: Deprecate isValidBlockContent #38794
  • Fix validateBlock so that it accounts for inner blocks
  • Make sure that missing and invalid blocks preserve original markup on save
  • Make sure that code editor uses and preserves original markup and doesn't break the post by flipping in and out of the code editor without making any edits.
  • Audit and fix uses of originalContent vs. inner-block-aware sources for downloading missing block implementations from the register.
@dmsnell dmsnell self-assigned this Feb 18, 2022
dmsnell added a commit that referenced this issue Feb 18, 2022
Part of #38922

When the editor is unable to validate a block it should preserve the
broken content in the post and show an accurate representation of that
underlying markup in the absence of being able to interact with it.

Currently when showing a preview of an invalid block in the editor we
attempt to re-generate the save output for a block given the attributes
we originally parsed. This is a flawed approach, however, because by
the nature of being invalid we know that there is a problem with those
attributes as they are.

In this patch we're introducing the `source` attribute on a block which
only exists for invalid blocks at the time of this patch. That `source`
carries the original un-processed data for a block node and can be used
to reconstruct the original markup without using garbage data and without
inadvertently changing it through the series of autofixes, deprecations,
and the like that happen during normal block loading.

The noticable change is in `block-list/block` where we will be showing
that reconstruction rather than the re-generated block content. Previously
it was the case that the preview might represent a corrupted version of the
block or show the block as if emptied of all its content. Now, however,
the preview sould accurately reflect the HTML in the source post even
when it's invalid or unrecognized according to the editor.

Further work should take advantage of the `source` property to provide
a more consistent and trusting experience for working with unrecognized
content.
dmsnell added a commit that referenced this issue Feb 22, 2022
Part of #38922

When the editor is unable to validate a block it should preserve the
broken content in the post and show an accurate representation of that
underlying markup in the absence of being able to interact with it.

Currently when showing a preview of an invalid block in the editor we
attempt to re-generate the save output for a block given the attributes
we originally parsed. This is a flawed approach, however, because by
the nature of being invalid we know that there is a problem with those
attributes as they are.

In this patch we're introducing the `source` attribute on a block which
only exists for invalid blocks at the time of this patch. That `source`
carries the original un-processed data for a block node and can be used
to reconstruct the original markup without using garbage data and without
inadvertently changing it through the series of autofixes, deprecations,
and the like that happen during normal block loading.

The noticable change is in `block-list/block` where we will be showing
that reconstruction rather than the re-generated block content. Previously
it was the case that the preview might represent a corrupted version of the
block or show the block as if emptied of all its content. Now, however,
the preview sould accurately reflect the HTML in the source post even
when it's invalid or unrecognized according to the editor.

Further work should take advantage of the `source` property to provide
a more consistent and trusting experience for working with unrecognized
content.
dmsnell added a commit that referenced this issue Feb 23, 2022
Part of #38922

When the editor is unable to validate a block it should preserve the
broken content in the post and show an accurate representation of that
underlying markup in the absence of being able to interact with it.

Currently when showing a preview of an invalid block in the editor we
attempt to re-generate the save output for a block given the attributes
we originally parsed. This is a flawed approach, however, because by
the nature of being invalid we know that there is a problem with those
attributes as they are.

In this patch we're introducing the `source` attribute on a block which
only exists for invalid blocks at the time of this patch. That `source`
carries the original un-processed data for a block node and can be used
to reconstruct the original markup without using garbage data and without
inadvertently changing it through the series of autofixes, deprecations,
and the like that happen during normal block loading.

The noticable change is in `block-list/block` where we will be showing
that reconstruction rather than the re-generated block content. Previously
it was the case that the preview might represent a corrupted version of the
block or show the block as if emptied of all its content. Now, however,
the preview sould accurately reflect the HTML in the source post even
when it's invalid or unrecognized according to the editor.

Further work should take advantage of the `source` property to provide
a more consistent and trusting experience for working with unrecognized
content.
@gziolo gziolo added [Feature] Block Validation/Deprecation Handling block validation to determine accuracy and deprecation [Type] Tracking Issue Tactical breakdown of efforts across the codebase and/or tied to Overview issues. [Status] In Progress Tracking issues with work in progress labels Feb 24, 2022
@mcsf
Copy link
Contributor

mcsf commented Feb 28, 2022

Thanks for laying out your plan like this, it makes it easier to discuss and later review. There's a lot to unpack and ponder, and I can't address everything in one sitting (and I won't really be available for the next fortnight).

But I can comment on the following (emphasis mine):

When the editor finds content it can't recognize or interact with the default mode of operation should be to preserve that content so that it doesn't unintentionally corrupt documents.

I agree with most of the issues and some of the solutions, but I also wanted to stress that the compromise in Gutenberg's block-parsing pipeline has been to isolate blocks, so that corrupt/invalid markup in a block won't corrupt the rest of the document. Now, this isolation can come at the cost of further corrupting the offending block, and this is where we can take a new look at validation and recovery.

Looking at the list of proposed issues, I'm on board with the user-facing features, like providing better visual comparison when recovering blocks, and knowing how to consider the entire block subtree, etc. But I have my doubts on where exactly the API should change and how aggressively. For example:

  • Can we find an optimum where a corrupted block remains isolated but we can restore it more intelligently, such as in your example of a List block with li tags but no container ul?
  • Can we provide nested block with better integrity without fundamentally changing some our isolation principles?

@dmsnell
Copy link
Member Author

dmsnell commented Feb 28, 2022

Thank you @mcsf for your thoughts. I too will try and share more later but I want to clear up one thing as I think we aren't talking about the same problems.

isolate blocks, so that corrupt/invalid markup in a block won't corrupt the rest of the document. Now, this isolation can come at the cost of further corrupting the offending block, and this is where we can take a new look at validation and recovery.

without fundamentally changing some our isolation principles?

It reads like you are examining this from the perspective of a broken block interfering with well-understood and validated blocks. That's not the case.

So what I'm talking about doesn't change the isolation principle at all; it upholds it where the current implementation breaks that model.

  1. Start with a document containing normal blocks
  2. Reach a state where a block is invalidated. Could be due to corruption in the raw HTML or this could come from a version-update on some block implementation.
  3. An edit in a normal, validated, recognized block can further corrupt the unrecognized block from step two, utterly and completely violating the isolation principle.

In other words I want to preserve the principle you are talking about. If I change content in a block at the end of the document it should not erase the content from an unrelated block near the beginning. It shouldn't even corrupt its neighbor blocks or its inner blocks. This is the state of things today and that's highlighted when blocks update themselves to use inner blocks where they previously didn't (and the editor only has support for the older one) or also when a block fails to validate in an easy-to-reach way (such as with the list).

The problem doesn't stem from a design principle but rather from the concept of using a block implementation to recreate new output for blocks we actively know we failed to understand. This isn't a necessary side-effect of a core system though and I believe we can resolve it to preserve that unrecognized content without giving up anything and even more, that it will resolve a number of headaches we constantly have to deal with that are due to a current implementation of the editor that violates its own principles.

dmsnell added a commit that referenced this issue Mar 4, 2022
Part of #38922

When the editor is unable to validate a block it should preserve the
broken content in the post and show an accurate representation of that
underlying markup in the absence of being able to interact with it.

Currently when showing a preview of an invalid block in the editor we
attempt to re-generate the save output for a block given the attributes
we originally parsed. This is a flawed approach, however, because by
the nature of being invalid we know that there is a problem with those
attributes as they are.

In this patch we're introducing the `source` attribute on a block which
only exists for invalid blocks at the time of this patch. That `source`
carries the original un-processed data for a block node and can be used
to reconstruct the original markup without using garbage data and without
inadvertently changing it through the series of autofixes, deprecations,
and the like that happen during normal block loading.

The noticable change is in `block-list/block` where we will be showing
that reconstruction rather than the re-generated block content. Previously
it was the case that the preview might represent a corrupted version of the
block or show the block as if emptied of all its content. Now, however,
the preview sould accurately reflect the HTML in the source post even
when it's invalid or unrecognized according to the editor.

Further work should take advantage of the `source` property to provide
a more consistent and trusting experience for working with unrecognized
content.
dmsnell added a commit that referenced this issue Mar 9, 2022
Part of #38922

When the editor is unable to validate a block it should preserve the
broken content in the post and show an accurate representation of that
underlying markup in the absence of being able to interact with it.

Currently when showing a preview of an invalid block in the editor we
attempt to re-generate the save output for a block given the attributes
we originally parsed. This is a flawed approach, however, because by
the nature of being invalid we know that there is a problem with those
attributes as they are.

In this patch we're introducing the `source` attribute on a block which
only exists for invalid blocks at the time of this patch. That `source`
carries the original un-processed data for a block node and can be used
to reconstruct the original markup without using garbage data and without
inadvertently changing it through the series of autofixes, deprecations,
and the like that happen during normal block loading.

The noticable change is in `block-list/block` where we will be showing
that reconstruction rather than the re-generated block content. Previously
it was the case that the preview might represent a corrupted version of the
block or show the block as if emptied of all its content. Now, however,
the preview sould accurately reflect the HTML in the source post even
when it's invalid or unrecognized according to the editor.

Further work should take advantage of the `source` property to provide
a more consistent and trusting experience for working with unrecognized
content.
dmsnell added a commit that referenced this issue Mar 11, 2022
Part of #38922

When the editor is unable to validate a block it should preserve the
broken content in the post and show an accurate representation of that
underlying markup in the absence of being able to interact with it.

Currently when showing a preview of an invalid block in the editor we
attempt to re-generate the save output for a block given the attributes
we originally parsed. This is a flawed approach, however, because by
the nature of being invalid we know that there is a problem with those
attributes as they are.

In this patch we're introducing the `source` attribute on a block which
only exists for invalid blocks at the time of this patch. That `source`
carries the original un-processed data for a block node and can be used
to reconstruct the original markup without using garbage data and without
inadvertently changing it through the series of autofixes, deprecations,
and the like that happen during normal block loading.

The noticable change is in `block-list/block` where we will be showing
that reconstruction rather than the re-generated block content. Previously
it was the case that the preview might represent a corrupted version of the
block or show the block as if emptied of all its content. Now, however,
the preview sould accurately reflect the HTML in the source post even
when it's invalid or unrecognized according to the editor.

Further work should take advantage of the `source` property to provide
a more consistent and trusting experience for working with unrecognized
content.
dmsnell added a commit that referenced this issue Mar 14, 2022
Part of #38922

When the editor is unable to validate a block it should preserve the
broken content in the post and show an accurate representation of that
underlying markup in the absence of being able to interact with it.

Currently when showing a preview of an invalid block in the editor we
attempt to re-generate the save output for a block given the attributes
we originally parsed. This is a flawed approach, however, because by
the nature of being invalid we know that there is a problem with those
attributes as they are.

In this patch we're introducing the `source` attribute on a block which
only exists for invalid blocks at the time of this patch. That `source`
carries the original un-processed data for a block node and can be used
to reconstruct the original markup without using garbage data and without
inadvertently changing it through the series of autofixes, deprecations,
and the like that happen during normal block loading.

The noticable change is in `block-list/block` where we will be showing
that reconstruction rather than the re-generated block content. Previously
it was the case that the preview might represent a corrupted version of the
block or show the block as if emptied of all its content. Now, however,
the preview sould accurately reflect the HTML in the source post even
when it's invalid or unrecognized according to the editor.

Further work should take advantage of the `source` property to provide
a more consistent and trusting experience for working with unrecognized
content.
dmsnell added a commit that referenced this issue Mar 14, 2022
Part of #38922

When the editor is unable to validate a block it should preserve the
broken content in the post and show an accurate representation of that
underlying markup in the absence of being able to interact with it.

Currently when showing a preview of an invalid block in the editor we
attempt to re-generate the save output for a block given the attributes
we originally parsed. This is a flawed approach, however, because by
the nature of being invalid we know that there is a problem with those
attributes as they are.

In this patch we're introducing the `__unstableBlockSource` attribute on 
a block which only exists for invalid blocks at the time of this patch. That 
`__unstableBlockSource` carries the original un-processed data for a block
node and can be used to reconstruct the original markup without using
garbage data and without inadvertently changing it through the series
of autofixes, deprecations, and the like that happen during normal block loading.

The noticable change is in `block-list/block` where we will be showing
that reconstruction rather than the re-generated block content. Previously
it was the case that the preview might represent a corrupted version of the
block or show the block as if emptied of all its content. Now, however,
the preview sould accurately reflect the HTML in the source post even
when it's invalid or unrecognized according to the editor.

Further work should take advantage of the `__unstableBlockSource`
property to provide a more consistent and trusting experience for
working with unrecognized content.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[Feature] Block Validation/Deprecation Handling block validation to determine accuracy and deprecation [Status] In Progress Tracking issues with work in progress [Type] Tracking Issue Tactical breakdown of efforts across the codebase and/or tied to Overview issues.
Projects
None yet
Development

No branches or pull requests

3 participants