allowing unsafe markup #185
Perhaps not surprisingly, I don't think it's a good idea to allow such a setting :) It would directly contradict the API's goals (a sanitizer with such a config would not be safe). The explainer mentions:
The proposed change in my opinion falls into non-goals of the API:
There are already several options available in the platform to parse HTML into a document, traverse a document, and mutate the nodes with arbitrary rules. There are existing libraries that do all of these, with user-provided config. The value add of the Sanitizer API is the security. Paraphrasing, I think the Sanitizer API should at least sanitize ;)
That suggests that there should be another API that can do this. And the sanitizer could use that API.
Ok, I see there's not a lot of agreement here, so I'm totally fine with punting this SanitizerConfig to a later version. However, there is something else that came up in a different discussion which I had with @annevk:
A prime candidate is declarative shadow roots (formerly known as declarative shadow DOM?). Anne pointed out that we can't make all people who want to parse fragments with declarative shadow roots go through the Sanitizer. The line we've so far been holding pretty well ("There can't be a Sanitizer instance that allows XSS") would still hold if we did what Anne suggests: opt out of the Sanitizer completely within I believe that might also work for TT, as they can restrict the HTML input to TrustedHTML only?
I don't think we should allow unsafe markup in the Sanitizer. I think a platform Sanitizer API has little value without an always-safe baseline. It seems this (plus issue #184) is aiming for a refactoring of DOM APIs. That's fine, but maybe it's a mistake to tie that refactoring to the Sanitizer API. Maybe it'd be better to refactor the DOM tree manipulation APIs according to the needs of the DOM, and to return to a separate API for the Sanitizer that can be built around the Sanitizer's security requirements. I think we should make a decision about whether we want Also, thank you for keeping Trusted Types in mind. But I don't think that's the key issue here.
@annevk In the issue mentioned by @mozfreddyb (whatwg/html#8627) you said the following:
Could you provide some example why and when would
By default it would behave the same. But if you pass
So the reason the DSD proposal includes an
@mfreed7 I was thinking we would want parity with Seems even better if none of our recommended HTML parser entry points need a switch at all.
+1 - if the new API ( I do agree that the sanitizer needs to "know" to traverse shadow trees. And I agree that no switch is better than a switch, if it's possible.
Well, it wouldn't be identical as it would do sanitizing and have the option to do declarative shadow roots, and we could start considering offering safety switches that would disable
I'll provide a spec proposal to handle shadow roots in the sanitizer. Basically: if an element has a shadow root, then the sanitizer needs to recurse into it, like into regular child nodes. If I read the comments here correctly, this would resolve some of this issue. Other than that, I'd like to reiterate my previously stated position - which was consensus in this group for about 2 years - that security guarantees can't be optional. I'm very much against a security-focused API with security guarantees that only apply when the developer doesn't make a mistake. It's the whole point of a guarantee that it holds unconditionally. If we want
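The recursion rule sketched here can be illustrated on a toy tree of plain objects (not the real DOM); `sanitizeNode`, the node shape, and the disallow list are all invented for illustration:

```javascript
// Toy model of "recurse into shadow roots like regular children".
// Node shape: { tag, children, shadowRoot? } - an assumption, not the DOM.
const DISALLOWED = new Set(["script", "iframe", "object"]);

function sanitizeNode(node) {
  if (DISALLOWED.has(node.tag)) return null; // drop disallowed elements entirely

  node.children = node.children.map(sanitizeNode).filter((c) => c !== null);

  if (node.shadowRoot) {
    // Key point: a shadow root is traversed exactly like child nodes, so
    // markup hidden inside it cannot escape sanitization.
    node.shadowRoot.children = node.shadowRoot.children
      .map(sanitizeNode)
      .filter((c) => c !== null);
  }
  return node;
}
```

With this rule, a `script` element placed inside a (declarative) shadow root is removed just like one among ordinary children, so the shadow tree offers no hiding place.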
I very much agree with the first sentence, and not at all with the second. Offering developers capabilities which they can combine in whichever way seems best to them is certainly the way to build a good platform, and in line with the extensible web manifesto. If we have seemingly conflicting requirements that seem to get in the way of this, then that's probably because we're trying to cram dissimilar things into a single API rather than providing combinable low-level primitives. As a strawman, we could have:
All those capabilities are already part of the current proposals and implementations. That would seem to meet all the requirements. The only controversial step appears to be whether we expose them separately or whether we try to force-combine them into a single entry point, which would then force us to trade some requirements against others.
@otherdaniel I think thus far you haven't really articulated why it needs to be a distinct entry point. As to why it should not be distinct: I think that would be needlessly confusing for web developers, especially given that a context element is vital as we established in the past.
It doesn't have to be distinct; it has to uphold the security requirements. Ensuring that an app is secure requires ensuring that a security invariant is upheld across all code paths. To do so, the reviewer or tool needs a way to limit the search space. A "sometimes secure" method, that switches behaviour based on its parameters, makes it hard to do
That especially applies if the underlying call is hidden in libraries, where the "options bag" is not a constant but might be assembled elsewhere in the code and passed in by callers. In all of these cases, an optionally insecure method would force the analysis - whether manual or automated - to pessimize and count those methods as insecure, or to try to establish the runtime value. In all but trivial applications, this increases the analysis complexity substantially, or even breaks it entirely.

For example, Google uses static analysis tools that run as part of presubmit checks. These should be implementable with some reasonable efficiency so that they can run on every commit. A tool that ensures that a certain set of methods is called in some critical code paths can do that; a tool that would have to establish that runtime values will always have a certain shape - like not including the opt-out parameter - probably cannot. I'm unaware of even a single real-world validation tool that can verify the runtime shape of values in moderately complex applications and is still fast enough to run as a presubmit check. It would, at the very least, require global static analysis instead of local, which makes it substantially more expensive.

Also, real-world applications tend to have library dependencies or multiple authors, which often makes it unrealistic to enforce a particular coding style everywhere, such as only using constants for the second `setHTML` parameter.

Whether upholding a security requirement means it has to be distinct depends on whether not-upholding-the-security-requirements is a requirement elsewhere. If we can get away with a single API that upholds the security guarantees the Sanitizer wants to give, then nothing has to be distinct.
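The difference this makes for a presubmit tool can be sketched with a toy classifier over extracted call sites; the record shape, the method names, and the verdicts are all hypothetical:

```javascript
// Toy presubmit check. A distinctly named unsafe method is trivially
// greppable; a "sometimes secure" method forces the tool to pessimize
// whenever the options bag is not a literal whose value it can see.
function classifyCallSite(site) {
  if (site.method === "setHTMLUnsafe") return "flag-for-review"; // distinct name: easy to find
  if (site.method !== "setHTML") return "not-a-sink";
  if (!site.optionsAreLiteral) return "flag-for-review"; // runtime value unknown: must pessimize
  return site.options && site.options.allowUnsafeMarkup
    ? "flag-for-review"
    : "safe";
}
```

Note that the only call sites a local analysis can mark "safe" are those with a visible, benign literal; everything else falls into the expensive "flag-for-review" bucket.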
I don't understand why two methods with clearly separable semantics would be more confusing than a single method that supports those same two semantics via an options bag. My intuition is that using an options bag (or the number of arguments) to change the fundamental behaviour of how a function works is more confusing than separately named functions. Why do you think it's the other way?
Can you point me to where this was established? I'd like to read up on it. Not sure this adds much to this particular discussion, but as I remember it the context element serves to allow context-dependent parsing. This was considered very important to some Sanitizer WICG participants, so we went along with it. But for all I know, always parsing into e.g. a fragment and then inserting should be just fine, both in security aspects and in API ergonomics.
I have two comments Re: Context element:
On (static) analysis and security boundaries: As the maintainer of an eslint plugin that disallows unsafe uses of HTML sinks for XSS protection, and co-editor of the Sanitizer API spec, I very much agree with the notion that we want something that allows for sound security guarantees and is amenable to static analysis. In my experience with our scanning at Mozilla, where static analysis is able to catch mistakes, we still have to distinguish between "this is clearly allowed" and everything else. In my experience the "everything else" is usually not badness, but rather something that needs a close look and might be safe. With this in mind, I think the same could be true for the proposal here. Here's my reasoning:
If you care about more than just XSS, then you need to enforce additional checks (manual review? blessed sanitizer instances?) regardless of the guarantees of the API. The proposal to offer a If all you care about is exactly the XSS protections from the Sanitizer API defaults, then I agree that an The only way out would be blessed sanitizers, or the static-string check, or finding a non-sanitizing entry point for those who want to use shadowRoots 😕
I am unaware of anything in IDL or JavaScript that could force something to be a literal. If you can pass in a string, you can also pass in a variable. Which allows dynamic construction of options. Which is the thing where the analysis becomes difficult.
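Concretely, nothing at the language level distinguishes a literal options bag from one assembled at runtime; `setHTMLLike` below is a stand-in for illustration, not a real API:

```javascript
// The callee only ever sees the value, never whether it was written as a
// literal at the call site - so "the option must be a literal" cannot be
// enforced by IDL or JavaScript.
function setHTMLLike(html, options = {}) {
  return options.allowUnsafeMarkup === true;
}

const fromLiteral = setHTMLLike("<p>hi</p>", { allowUnsafeMarkup: true });
const assembled = JSON.parse('{"allowUnsafeMarkup": true}'); // built at runtime
const fromRuntime = setHTMLLike("<p>hi</p>", assembled);
// fromLiteral and fromRuntime are indistinguishable to the callee.
```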
Well yes, you can check for Sanitizer instance creation, and you can check for setHTML calls. Having to also establish the dynamic value range of option bags would change an awful lot.
A way out would be a method that actually sanitizes. I'm not seeing anything in your argument why that wouldn't work.
It looks like you skipped the bit about the context element.
The security risk basically only exists in a scenario where you re-parse the HTML (so a case of
I'd like to challenge the argument that a contextless Sanitizer leads to XSS. The only examples of a potential security risk due to a contextless sanitizer API were given in #42 (comment), and I don't see how they apply to a Sanitizer that is only concerned with XSS and that doesn't return a string. To my understanding, the first example demonstrates that plaintext elements can contain angle brackets - but the sanitizer doesn't reparse. The second demonstrates that DOM XSS sinks exist in the current DOM (e.g. inserting under a script element is risky), but the sanitizer doesn't protect against script gadgets, and inserting anything under a script node is an obvious script gadget. Similarly one can take text nodes from a This argument demonstrates not that the sanitizer can be insecure, but rather that the sanitizer fundamentally cannot address DOM XSS in its entirety, because other script-executing sinks exist. I agree that context is important for parsing, but I can't see a vector inside the Sanitizer API that would lead to XSS.
I'm not concerned with XSS personally. I think it's problematic that the arguments to something like
I see, and agree that it would be surprising. If Starting from the XSS use case, I'm mostly concerned Is it because we need to parse DSD in something else than
I think at that point it's so similar to And yeah:
Wouldn't adding DSD to A new XSS sink would still need to be incorporated in tooling, documentation, would surface in script gadgets in existing code, and all that. To me it sounds less surprising to put it under the existing, known to be bad, sink. We can - and should - tackle We tackled this problem with Trusted Types, so that we can selectively enable values reaching the sink, without removing the sink itself, but perhaps there are other options that allow for solving the
I think the problem with adding DSD to [edit]: It's actually explicitly called out in the explainer: https://github.com/mfreed7/declarative-shadow-dom/tree/master#potential-html-sanitizer-bypass
Yeah, I'd really rather not revisit where declarative shadow roots can be used with respect to existing HTML parser entry points.
I remember that; I myself pushed for DSD not to use With XSS focus alone, it seems like we either break sanitizers that have a very specific config, with a workaround present, or introduce a new DOM XSS sink to the platform that - even if aptly named and never used - can be exploited via script gadgets. I would prefer we fix the affected sanitizers.
Also, there's another reason to get rid of
It’s worth pointing out that Why can’t that be considered?
@mfreed7 as per above we want contextual parsing.
This proposes to recurse into any shadow roots. This follows the discussion at WICG#185
We have resolved this issue by having different parser entry points (both of which will support declarative shadow roots).
We've been told that people do want to deviate from our security guarantees, with all the terrible, known consequences.
We already allow people to opt out of the "known elements" by switching `allowUnknownMarkup`. However, this still disallows e.g. `script` and friends.

Apparently, people may appreciate the sanitizer for its ability to modify and walk a DOM tree regardless of the security guarantees. This may necessitate the existence of a pref that allows arbitrary and insecure allow lists and just skips all list checks (baseline, known).
I'm suggesting we introduce `boolean allowUnsafeMarkup` in the SanitizerConfig dictionary.

Up until now this seemed very simple and straightforward, but until today we have always allowed our friends working on Trusted Types to enjoy the shiny assumption that the Sanitizer does not need to be subject to TT checking & enforcement because of its strong security guarantees.
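For shape, the suggested dictionary member might be used like this (a hypothetical sketch; `allowUnsafeMarkup` is the proposal in this issue, not a shipped option):

```javascript
// Hypothetical SanitizerConfig carrying the proposed opt-out. The
// commented-out constructor call shows intended usage only.
const config = {
  allowUnsafeMarkup: true,           // proposed: skip the baseline/known-element checks
  allowElements: ["div", "script"],  // an allow list the baseline would otherwise reject
};
// const sanitizer = new Sanitizer(config);
```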
With that ship sailing, we may have to revisit: Trusted Types would have to monkey-patch/check the SanitizerConfig.
We can still punt this to a v1.1.
CC @otherdaniel @annevk @koto