-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add rawHtml option for extensions to use on trusted input #569
Conversation
Basically this allows extensions to use something like There are a couple of rules though:
|
Parsedown.php
Outdated
// extension | ||
elseif (isset($Element['unsafeHtml'])) | ||
{ | ||
$text = $Element['unsafeHtml']; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calling this unsafeHtml
will result in the exact opposite of what your intentions were 😉
If one reads $Element['name']
, one will put the element's name in there. If one reads $Element['unsafeHtml']
, one will put unsafe HTML in there. 😆 This should rather be called safeHtml
, trustedHtml
or, preferrably, just html
. Parsedown is a library, we can assume that developers read the manual.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was going for more along the lines of https://developer.apple.com/documentation/swift/unsafepointer. I think it is clear that unsafeHtml
contains unsafe HTML, whether or not putting stuff in there is a good idea or not depends how much you trust it ;p
(I think the best name is probably going to depend on whether there is an assumption that we will automatically escape something called unsafeHtml
).
I think that whenever this feature is used, the HTML is unsafe (i.e. it depends on the trustworthiness of input). If you know enough about the input at runtime for it to be safe, you will be able to refactor to the preferred AST representation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was going for more along the lines of https://developer.apple.com/documentation/swift/unsafepointer. I think it is clear that
unsafeHtml
contains unsafe HTML
Having a UnsafePointer
is something completely different, it's a "unsafe pointer to something". A array key in contrary is no pointer, it's rather a variable containing a value. That's the fundamental difference between pointers and variables in programming. A variable's name, and a array key is nothing else than that, must describe the variable's value. Don't confuse variables with pointers!
Furthermore, unsafeHtml
contains no unsafe HTML - this statement is simply wrong. First of all it contains just HTML - whether it is safe or unsafe depends on what the extension's developer puts in there. However, we explicitly tell extension developers that they are supposed to put safe HTML in there, i.e. if the developer puts unsafe HTML in there, he is doing it wrong. Whichever way you look at it, it should never be "unsafe HTML".
If you know enough about the input at runtime for it to be safe, you will be able to refactor to the preferred AST representation.
#568 is a perfect example that this assumption isn't true 😉 (see below)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something that is important to me is that whatever name we pick hints at the following:
[feature] provides no automated escaping or sanitisation. You are responsible for handling the escaping and sanitisation of any HTML you work with through [feature] to avoid cross-site scripting.
If "unsafeHTML" doesn't do that, fine. Let's figure out something that does.
It's more important that you understand why I have chosen this name, it is not important to me that you agree with it because the name isn't fixed. If the name is doing a bad job at explaining the why then let's change it.
Having a
UnsafePointer
is something completely different, it's a "unsafe pointer to something". A array key in contrary is no pointer, it's rather a variable containing a value. That's the fundamental difference between pointers and variables in programming. A variable's name, and a array key is nothing else than that, must describe the variable's value. Don't confuse variables with pointers!
Thanks for the explainer 😒, but I'm not trying to compare an array key to a pointer. This was purely a linguistic comparison of the use of the word "unsafe".
This was a response to:
Calling this unsafeHtml will result in the exact opposite of what your intentions were
The link was a citation to justify the choice of name. In that, in the link unsafe[Thing] is used to mean that "you should be careful when using [Thing]", because there are known examples of unsafe behaviour. The same is true in our example. If you are not careful with the HTML being written into the variable (for example if you write untrusted user-input into it), you will yield unsafe results.
Again, the only purpose of the link was to cite an example of the word "unsafe" being used to mean "be careful", and not "don't be careful".
First of all it contains just HTML - whether it is safe or unsafe depends on what the extension's developer puts in there. However, we explicitly tell extension developers that they are supposed to put safe HTML in there
I agree with all of this, my intention is not to instruct developers to put unsafe HTML in this location, but rather to use the word as an explicit reminder that what they are doing can have unsafe results and they must take care (i.e. a "be careful" and not "don't be careful").
Here's what the documentation I linked to states about unsafe[Thing]:
UnsafePointer provides no automated memory management or alignment guarantees. You are responsible for handling the life cycle of any memory you work with through unsafe pointers to avoid leaks or undefined behavior.
Here's basically the same description, replaced with things that can go wrong in HTML land, rather than in memory:
Unsafe HTML provides no automated escaping or sanitisation. You are responsible for handling the escaping and sanitisation of any HTML you work with through unsafe HTML to avoid cross-site scripting.
I suppose this really uses "unsafe" to mean "there is no safety provided for you, please provide it yourself", rather than "this is where you should put things that are unsafe".
You actually get the reverse problem if we call give it a positive prefix, e.g. something like "safeHtml", does this indicate that we are applying sanitisation to make it safe? The arguments for misinterpretation go both ways round.
Parsedown.php
Outdated
$markup .= $this->{$Element['handler']}($Element['text'], $Element['nonNestables']); | ||
$markup .= $this->{$Element['handler']}($text, $Element['nonNestables']); | ||
} | ||
elseif ($unsafeHtml !== true or $this->safeMode) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about that, too. However, the problem is that the assumption behind this is simply wrong. Using a extension doesn't necessarily make the contents unsafe, especially because we break BC anyway and all extension developers will have to read the manual. Just take #568: By simply calling Parsedown::escape()
before passing the contents to the highlighter (or when the highlighter already takes care of escaping), one will get a absolutely safe result.
Your intentions are to make it more safe, but in reality you'll make it less safe! By making extensions incompatible with safe mode, extension developers will forcefully disable safe mode - what decreases security, not improves it.
Furthermore, just think about what you've added to Parsedown's README.md
:
Safe mode does not necessarily yield safe results when using extensions to Parsedown.
Exactly. Parsedown as a project can only state whether Parsedown is safe, by using a extension the extension's developer is in charge.
side note: use !$unsafeHtml
instead of $unsafeHtml !== true
😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about that, too. However, the problem is that the assumption behind this is simply wrong. Using a extension doesn't necessarily make the contents unsafe, especially because we break BC anyway and all extension developers will have to read the manual. Just take #568: By simply calling Parsedown::escape() before passing the contents to the highlighter (or when the highlighter already takes care of escaping), one will get a absolutely safe result.
This isn't correct in the general case. The only way for it to be safe is to be aware of the structure of what is being output. What stops the highlighter from having a similar probems to parsedown <=1.6.4
where vulnerabilities can actually be introduced by the parser (even after reading perfectly escaped input).
Your intentions are to make it more safe, but in reality you'll make it less safe! By making extensions incompatible with safe mode, extension developers will forcefully disable safe mode - what decreases security, not improves it.
This is definitely something to consider carefully – but I don't think that there is any way for an extension to be safe unless it is aware of the structure that it is outputting (sans just outputting completely escaped text, which it can do already).
Exactly. Parsedown as a project can only state whether Parsedown is safe, by using a extension the extension's developer is in charge.
This is true, is it not worth making it difficult to make things unsafe though?
side note: use !$unsafeHtml instead of $unsafeHtml !== true 😉
Latter has a preferable fall through if value is truthy but not true ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't correct in the general case. The only way for it to be safe is to be aware of the structure of what is being output. What stops the highlighter from having a similar probems to parsedown <=1.6.4 where vulnerabilities can actually be introduced by the parser (even after reading perfectly escaped input).
What stops Parsedown (or PHP's htmlspecialchars()
) from having a bug and doing escaping wrong?
This is definitely something to consider carefully – but I don't think that there is any way for an extension to be safe unless it is aware of the structure that it is outputting (sans just outputting completely escaped text, which it can do already).
The assumption is true in the general case, just the conclusion isn't right. #568 could undoubtfully parse the highlighter's return value into Parsedown's AST structure - however, this would simply make no sense.
This is true, is it not worth making it difficult to make things unsafe though?
You're not making it difficult to make things unsafe, you're doing the exact opposite, you're forcing extension developers to make it unsafe 😉
Latter has a preferable fall through if value is truthy but not true ;)
Sorry, but you'll have to explain under which exact circumstances this can happen here 😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What stops Parsedown (or PHP's htmlspecialchars()) from having a bug and doing escaping wrong?
Parsedown is aware of the structure being output, so is htmlspecialchars()
(it outputs zero elements). This doesn't make them perfect at doing this, but at least this method has a hope of being safe.
(For example, Parsedown itself would be implementable using only the AST representation to construct elements).
In contrast, if you are blind to the structure being output, then you are necessarily blind to whether it is safe (if you HTML escape the whole thing you now know its structure – zero elements).
The only thing you can do is to trust the thing that gave you this unknown structure, but this only has a hope of being safe if someone is aware of the structure (otherwise how does anyone make any assurances on what is being output?).
I think we might be trying to achieve different things here – this PR was really aimed at allowing HTML without going via the AST provided that both the extension author and the user are aware that the generated HTML requires trust of user input (this is what was actually asked for by #568).
However, I think you've made it apparent that we should actually allow extensions to also make a different claim: assume that input is trusted, but instead "vouch" for the HTML they have generated as being safe.
I think we should perhaps add separate options for this, what do you think?
The former would allow someone to effectively implement things that require trusted user input. For example: Parsedown's feature which allows raw HTML from user input (but escapes it if the user states that input is unsafe).
The latter would allow extension generated HTML to be output unescaped in both regular and safe mode (but would require that the extension author "promise" that it is safe to do this).
Rename "unsafeHtml" to "rawHtml"
I've renamed "unsafeHtml" to "rawHtml" and added an option such that on a per-element basis the safety of said HTML can be declared to have been checked, and thus allow it in safe mode. (WIP names for both options). What do you think? |
I totally understand what you were trying to achieve by choosing
Yeah, this explains it. I rather thought the reason behind the Just to explain my point: I strongly believe in self-responsibility. I don't want to end up in a situation in which things get complicated for me just because someone else though additional precaution is necessary because some bad third-party developers can't read the manual (yes, I understand that this wasn't your intention). Just calling it Thus I believe that Btw: I still think that Lines 1503 to 1517 can be refactored to just two lines: $allowRawHtmlInSafeMode = isset($Element['allowRawHtmlInSafeMode']) && $Element['allowRawHtmlInSafeMode'];
$permitRawHtml = !$this->safeMode || $allowRawHtmlInSafeMode; |
Tyty, I've refactored that now :)
This is me trying to protect against type juggling surprises, but I've foregone doing it for now since there are examples of me not doing this even within the same method ;p At some point I'll try and enumerate the instance variables that might be prone to (implicit) type casting errors and put stronger type checks there – you're quite right about it not really mattering for the locally scoped stuff that we're setting explicitly though ;p |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work! 👍
@aidantwoods would be great if the result of this PR could be tagged as a release, I'm currently suffering some double escaping :]. |
^ +1, I'm using Jonas' tutorial (which he described in #568), and I always end up with escaped tags. So based on that <div>Just an example</div> Results in: <pre>
<code class="hljs language-html">
<span class="hljs-tag">&lt;<span class="hljs-name">div</span>&gt;</span>Just an example<span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
</code>
</pre> |
@hvt I'll try to get 1.8.0 out this weekend :) |
Fixes #568