Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Code Block displays HTML entities #5901

Closed
zadam opened this issue Dec 4, 2019 · 15 comments · Fixed by #7240
Closed

Code Block displays HTML entities #5901

zadam opened this issue Dec 4, 2019 · 15 comments · Fixed by #7240
Assignees
Labels
intro Good first ticket. package:code-block type:bug This issue reports a buggy (incorrect) behavior.

Comments

@zadam
Copy link

zadam commented Dec 4, 2019

Let me first thank you for the highly anticipated code block feature! Apart from this issue it seems to work great and covers all basics already.

📝 Provide detailed reproduction steps (if any)

  1. Create empty code block:
    image

  2. Call textEditor.getData() and get <pre><code class="language-text-plain">&nbsp;</code></pre><p>text</p>

  3. Create new instance of CKEditor and call textEditor.getData('<pre><code class="language-text-plain">&nbsp;</code></pre><p>text</p>') - i.e. with what .getData() previously returned.

✔️ Expected result

image

❌ Actual result

image

📃 Other details

Similar problem happens when code block contains "&", after which is returned from .getData() as &amp; but then displayed literally from .setData().

  • Browser: latest chrome/firefox
  • OS: Ubuntu
  • CKEditor version: 16.0.0
  • Installed CKEditor plugins: …

If you'd like to see this fixed sooner, add a 👍 reaction to this post.

@zadam zadam added the type:bug This issue reports a buggy (incorrect) behavior. label Dec 4, 2019
@Mgsy
Copy link
Member

Mgsy commented Dec 10, 2019

Hi! Great to hear that you like this feature :)

I'm not sure if we can consider it as a bug - it's a code block, so you want to display the code without executing it and HTML entities might be a part of the code.

However, the current inconsistent behaviour between getData and setData might be confusing. WDYT? @Reinmar @mlewand

@zadam
Copy link
Author

zadam commented Dec 10, 2019

Imagine I'm writing a blog post about programming so it has both text and code blocks. I save a draft to the database (so call .getData()) and another day I want to continue editing the draft (so it's loaded from DB and set with .setData()), but it's not what I saw/edited yesterday with those mangled entities.

Unless I'm missing something I can't fix this behavior in the host code in any reasonable way which makes me think this inconsistency is not just confusing, but a bug.

@Reinmar
Copy link
Member

Reinmar commented Dec 16, 2019

Doh, try this:

editor.setData('<pre><code class="language-text-plain"></code></pre><p>text</p>');

And then:

editor.setData(editor.getData())

And the second snippet will change the actual editor data (the thing that you can see in the editor).

So this is definitely a bug.

@zadam
Copy link
Author

zadam commented Mar 17, 2020

I would like to note that the impact of this bug is quite large for some use cases - some programming languages (e.g. Java, JavaScript, C++, PHP) use ampersand quite heavily and this bug makes it impossible to use code blocks for these ...

@myii
Copy link

myii commented Mar 17, 2020

Just to second @zadam, as I've mentioned in my report @ zadam/trilium#917, each save increases the text over time. So right now, in a particular bash snippet that started out containing &&, it now has morphed into &amp;amp;amp;amp;amp;amp;amp;&amp;amp;amp;amp;amp;amp;amp;!!

@KodaCHC
Copy link

KodaCHC commented Apr 14, 2020

Have you a change to gives us an approximate date when this issue will be fixed?
Or givs a change wor a hotfix? I have many entries wir code snippeds and all thise are broken now

@neongreen
Copy link

Here is a partial fix: neongreen/ckeditor5-code-block@53d0541

Can be applied to &nbsp; and other entities too. Of course, a proper fix should replace all entities, not just three–four specific ones.

@Reinmar
Copy link
Member

Reinmar commented May 5, 2020

Hm... extractDataFromCodeElement() is odd. I don't know why we don't work on the view representation of the <code> element there. Such issues would not happen if we did. 

cc @oleq, do you remember why you went this way? Was there a problem with the view structure or you just went for what was easier to code?

@oleq
Copy link
Member

oleq commented May 5, 2020

cc @oleq, do you remember why you went this way? Was there a problem with the view structure or you just went for what was easier to code?

It's been a while so it's hard to tell. It's probably related to the data processor we went with around here

const stringifiedElement = dataController.processor.toData( viewChild );
const textData = extractDataFromCodeElement( stringifiedElement );
const fragment = rawSnippetTextToModelDocumentFragment( writer, textData );
.

It looks like extractDataFromCodeElement does the clean-up after the DP. Maybe we should use a different DP then?

@Reinmar
Copy link
Member

Reinmar commented May 5, 2020

I think we could remove the entire stringification part and then we wouldn't need the DP at all.

@Reinmar
Copy link
Member

Reinmar commented May 6, 2020

Some notes from my talk with @oleq:

  • The <pre><code> element, when it's empty in the editor, must be outputted with either &nbsp; or \n to the data. The goal here is the same as for other blocks – to make sure that this element has height when rendered (so it matches what the user sees in the editor).
  • However, upon upcasting, we're doing a naive toHtml( viewCode ) transformation, then extract the content of the <code> element and take the HTML that was inside (because it's HTML at this stage, not a plain text) and load it into the model. HTML is not plain text – in HTML entities are encoded (<, >, &) and they need to be de-encoded before being treated as a plain text. In the model, the content of the <code> element is a combination of text nodes and <softBreak> elements.

So, the quick fix is as proposed before – to de-encode &amp to &.

The real solution is to not use the data processor here, but instead iterate over the view <code> element's content, look for text nodes and concatenate all them together.

@Reinmar Reinmar added the intro Good first ticket. label May 6, 2020
@oleq
Copy link
Member

oleq commented May 6, 2020

I checked and this should work:

diff --git a/packages/ckeditor5-code-block/src/codeblockediting.js b/packages/ckeditor5-code-block/src/codeblockediting.js
index e819e08933..7e769d8824 100644
--- a/packages/ckeditor5-code-block/src/codeblockediting.js
+++ b/packages/ckeditor5-code-block/src/codeblockediting.js
@@ -128,7 +128,7 @@ export default class CodeBlockEditing extends Plugin {
                   editor.editing.downcastDispatcher.on( 'insert:codeBlock', modelToViewCodeBlockInsertion( model, normalizedLanguagesDefs, true ) );
                   editor.data.downcastDispatcher.on( 'insert:codeBlock', modelToViewCodeBlockInsertion( model, normalizedLanguagesDefs ) );
                   editor.data.downcastDispatcher.on( 'insert:softBreak', modelToDataViewSoftBreakInsertion( model ), { priority: 'high' } );
-                  editor.data.upcastDispatcher.on( 'element:pre', dataViewToModelCodeBlockInsertion( editor.data, normalizedLanguagesDefs ) );
+                  editor.data.upcastDispatcher.on( 'element:pre', dataViewToModelCodeBlockInsertion( editor.data, editor.editing.view, normalizedLanguagesDefs ) );

                   // Intercept the clipboard input (paste) when the selection is anchored in the code block and force the clipboard
                   // data to be pasted as a single plain text. Otherwise, the code lines will split the code block and
diff --git a/packages/ckeditor5-code-block/src/converters.js b/packages/ckeditor5-code-block/src/converters.js
index 7ee4a6324f..2fdd5b7845 100644
--- a/packages/ckeditor5-code-block/src/converters.js
+++ b/packages/ckeditor5-code-block/src/converters.js
@@ -131,7 +131,7 @@ export function modelToDataViewSoftBreakInsertion( model ) {
  * configuration passed to the feature.
  * @returns {Function} Returns a conversion callback.
  */
-export function dataViewToModelCodeBlockInsertion( dataController, languageDefs ) {
+export function dataViewToModelCodeBlockInsertion( dataController, editingView, languageDefs ) {
          // Language names associated with CSS classes:
          //
          //                {
@@ -183,8 +183,11 @@ export function dataViewToModelCodeBlockInsertion( dataController, languageDefs
                            writer.setAttribute( 'language', defaultLanguageName, codeBlock );
                   }

-                  const stringifiedElement = dataController.processor.toData( viewChild );
-                  const textData = extractDataFromCodeElement( stringifiedElement );
+                  const textData = [ ...editingView.createRangeIn( viewChild ) ]
+                           .filter( current => current.type === 'text' )
+                           .map( ( { item } ) => item.data )
+                           .join( '' );
+
                   const fragment = rawSnippetTextToModelDocumentFragment( writer, textData );

                   writer.append( fragment, codeBlock );
@@ -234,14 +237,3 @@ export function dataViewToModelCodeBlockInsertion( dataController, languageDefs
                   }
          };
 }
-
-// Returns content of `<pre></pre>` with unescaped html inside.
-//
-// @param {String} stringifiedElement
-function extractDataFromCodeElement( stringifiedElement ) {
-         const data = new RegExp( /^<code[^>]*>([\S\s]*)<\/code>$/ ).exec( stringifiedElement )[ 1 ];
-
-         return data
-                  .replace( /&lt;/g, '<' )
-                  .replace( /&gt;/g, '>' );
-}

The change causes two failing tests, though.

But I think that they were wrong in the first place and that's the reason why they fail. If anyone loads editor.setData( '<pre><code><p>Foo</p></code></pre>' ); it's an invalid HTML.

CodeBlockEditing
    data pipeline v -> m conversion
      ✖ should convert pre > code with HTML inside
        Chrome 81.0.4044 (Mac OS X 10.15.4)
      AssertionError: expected '<codeBlock language="plaintext">[]Foo<softBreak></softBreak>Bar</codeBlock>' to equal '<codeBlock language="plaintext">[]<p>Foo</p><softBreak></softBreak><p>Bar</p></codeBlock>'

      + expected - actual

      -<codeBlock language="plaintext">[]Foo<softBreak></softBreak>Bar</codeBlock>
      +<codeBlock language="plaintext">[]<p>Foo</p><softBreak></softBreak><p>Bar</p></codeBlock>

    at Context.eval (webpack:///./packages/ckeditor5-code-block/tests/codeblockediting.js?:913:122)


      ✖ should convert pre > code tag with HTML and nested pre > code tag
        Chrome 81.0.4044 (Mac OS X 10.15.4)
      AssertionError: expected '<codeBlock language="plaintext">[]FooBarBiz</codeBlock>' to equal '<codeBlock language="plaintext">[]<p>Foo</p><pre>Bar</pre><p>Biz</p></codeBlock>'

      + expected - actual

      -<codeBlock language="plaintext">[]FooBarBiz</codeBlock>
      +<codeBlock language="plaintext">[]<p>Foo</p><pre>Bar</pre><p>Biz</p></codeBlock>

    at Context.eval (webpack:///./packages/ckeditor5-code-block/tests/codeblockediting.js?:925:122)

@Reinmar Reinmar modified the milestones: next, iteration 32 May 7, 2020
@tomalec tomalec self-assigned this May 15, 2020
@tomalec
Copy link
Contributor

tomalec commented May 18, 2020

FYI, I'm about to create a PR with changes proposed by @oleq with additional tests and docs.

@tomalec
Copy link
Contributor

tomalec commented May 19, 2020

PR that fixes OP is drafted at #7240.
It still has one issue with dom-to-view conversion (details in the PR)

Reinmar added a commit that referenced this issue May 21, 2020
Fix (code-block): Stop rendering `&nbsp;` when restoring the state of an empty code block. Closes #5901.
@Reinmar Reinmar reopened this May 21, 2020
@tomalec
Copy link
Contributor

tomalec commented May 21, 2020

Closing as done, with smaller followup issue #7259
Now, totally empty (with no & inside) code block gets rendered with one non-breaking space.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
intro Good first ticket. package:code-block type:bug This issue reports a buggy (incorrect) behavior.
Projects
None yet
9 participants