Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extra <body> node and ignored <script> with hook uponSanitizeElement #199

Closed
jfparadis opened this issue Feb 21, 2017 · 9 comments
Closed

Comments

@jfparadis
Copy link

Two issues with hook uponSanitizeElement:

  • Calling sanitize causes one extra callback, aways with extra node.
    Example: <div><script>alert(0)</script></div>
    Hook called 4 times: BODY, DIV, SCRIPT, text

  • When sanitizing string containing only <script> tag, there is no callback for script.
    Example: <script>alert(0)</script>
    Hook called 1 time: BODY

Please see this: JSFiddle.

@cure53
Copy link
Owner

cure53 commented Feb 21, 2017

Ah, that's the old problem with the browser moving SCRIPT, STYLE and others into the HEAD element when not prepended by any other element. This is causing trouble in many situations. Technically not our fault but I feel more and more we should fix this.

How do you feel about a FORCE_BODY configuration flag that simply makes sure that nothing goes into the HEAD? We could default it to false right now for 0.8.x and set it to true starting with 0.9.x.

@jfparadis
Copy link
Author

jfparadis commented Feb 21, 2017

Very interesting.

Question: if <script> gets moved to <head>, why uponSanitizeElement is not called with <head>, then with the <script>?

@cure53
Copy link
Owner

cure53 commented Feb 21, 2017

We use the doc.body as the platform to check and sanitize the markup :) That's why it simply disappears.

The idea is now to pre-fill the body with a nonsense element, then initialize the document, then remove the nonsense element again - all behind the suggested FORCE_BODY flag. Then your bug should disappear :)

@cure53
Copy link
Owner

cure53 commented Feb 23, 2017

Would the proposed fix work for you?

@jfparadis
Copy link
Author

jfparadis commented Feb 23, 2017

Yes, FORCE_BODY would work. There is another approach:

If I create a simple test.html with the following content:

<script src="hello.js"></script>

And if I load it in the browser and look at the DOM, I obtain:

<html>
    <head>
        <script src="hello.js"></script>
    </head>
    <body>
    </body>
</html>

Because the callback starts at doc.body, we get the current behavior (the head is ignored).

Now, if I create a simple test2.html with the following content:

<html>
    <head>
        <script src="hello.js"></script>
    </head>
    <body>
    </body>
</html>

I get the same behavior, the <head> is also ignored ( you can see the result on my updated JSFiddle, the head is also ignored).

An alternative approach would be to call doc.head followed by doc.body. The advantage would be to always process head markup and body markup, and to mimic more closely the browser behavior.

@cure53
Copy link
Owner

cure53 commented Feb 24, 2017

We thought about that initially but I think the changes would be too disruptive. I'll see that I get an implementation of FORCE_BODY to work asap.

cure53 added a commit that referenced this issue Feb 24, 2017
Added two tests for FORCE_BODY
Added description to README.md
@cure53
Copy link
Owner

cure53 commented Feb 24, 2017

@jfparadis Check out the latest commits in the master branch. This should eliminate the problem. tests are green as well.

cure53 added a commit that referenced this issue Feb 24, 2017
Changed the FORCE_BODY inject to <remove></remove>
Added additional test coverage
@jfparadis
Copy link
Author

A few notes:

  1. This solution makes perfect sense: as you say in the description, the current behavior was counter-intuitive. This changes solves the normal case.

  2. And if someone absolutely need to parse the whole document, this is covered by {WHOLE_DOCUMENT: true}.

  3. I've updated the JSFiddle with the latest code, and it works as specified.

  4. I the JSFiddle, when sending a complete document with the new option (third test), we don't get HTML, HEAD, or a duplicate BODY. That's what one should expect.

Well done!

@cure53
Copy link
Owner

cure53 commented Feb 26, 2017

Thanks, my pleasure :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants