Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add rawHtml option for extensions to use on trusted input #569

Merged
merged 6 commits into from
Mar 18, 2018

Conversation

aidantwoods
Copy link
Collaborator

Fixes #568

@aidantwoods
Copy link
Collaborator Author

Basically this allows extensions to use something like $Element['unsafeHtml'] to output arbitrary data.

There are a couple of rules though:

  1. If the text key is set it is always preferred, so you MUST unset it if you wish to use unsafeHtml
  2. unsafeHtml will behave like text if safe mode is enabled (i.e. unsafe behaviour will be disabled if the user of the extension disagrees about the input being trusted).

Parsedown.php Outdated
// extension
elseif (isset($Element['unsafeHtml']))
{
$text = $Element['unsafeHtml'];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling this unsafeHtml will result in the exact opposite of what your intentions were 😉

If one reads $Element['name'], one will put the element's name in there. If one reads $Element['unsafeHtml'], one will put unsafe HTML in there. 😆 This should rather be called safeHtml, trustedHtml or, preferrably, just html. Parsedown is a library, we can assume that developers read the manual.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going for more along the lines of https://developer.apple.com/documentation/swift/unsafepointer. I think it is clear that unsafeHtml contains unsafe HTML, whether or not putting stuff in there is a good idea or not depends how much you trust it ;p

(I think the best name is probably going to depend on whether there is an assumption that we will automatically escape something called unsafeHtml).

I think that whenever this feature is used, the HTML is unsafe (i.e. it depends on the trustworthiness of input). If you know enough about the input at runtime for it to be safe, you will be able to refactor to the preferred AST representation.

Copy link
Contributor

@PhrozenByte PhrozenByte Mar 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going for more along the lines of https://developer.apple.com/documentation/swift/unsafepointer. I think it is clear that unsafeHtml contains unsafe HTML

Having a UnsafePointer is something completely different, it's a "unsafe pointer to something". A array key in contrary is no pointer, it's rather a variable containing a value. That's the fundamental difference between pointers and variables in programming. A variable's name, and a array key is nothing else than that, must describe the variable's value. Don't confuse variables with pointers!

Furthermore, unsafeHtml contains no unsafe HTML - this statement is simply wrong. First of all it contains just HTML - whether it is safe or unsafe depends on what the extension's developer puts in there. However, we explicitly tell extension developers that they are supposed to put safe HTML in there, i.e. if the developer puts unsafe HTML in there, he is doing it wrong. Whichever way you look at it, it should never be "unsafe HTML".

If you know enough about the input at runtime for it to be safe, you will be able to refactor to the preferred AST representation.

#568 is a perfect example that this assumption isn't true 😉 (see below)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something that is important to me is that whatever name we pick hints at the following:

[feature] provides no automated escaping or sanitisation. You are responsible for handling the escaping and sanitisation of any HTML you work with through [feature] to avoid cross-site scripting.

If "unsafeHTML" doesn't do that, fine. Let's figure out something that does.

It's more important that you understand why I have chosen this name, it is not important to me that you agree with it because the name isn't fixed. If the name is doing a bad job at explaining the why then let's change it.


Having a UnsafePointer is something completely different, it's a "unsafe pointer to something". A array key in contrary is no pointer, it's rather a variable containing a value. That's the fundamental difference between pointers and variables in programming. A variable's name, and a array key is nothing else than that, must describe the variable's value. Don't confuse variables with pointers!

Thanks for the explainer 😒, but I'm not trying to compare an array key to a pointer. This was purely a linguistic comparison of the use of the word "unsafe".

This was a response to:

Calling this unsafeHtml will result in the exact opposite of what your intentions were

The link was a citation to justify the choice of name. In that, in the link unsafe[Thing] is used to mean that "you should be careful when using [Thing]", because there are known examples of unsafe behaviour. The same is true in our example. If you are not careful with the HTML being written into the variable (for example if you write untrusted user-input into it), you will yield unsafe results.

Again, the only purpose of the link was to cite an example of the word "unsafe" being used to mean "be careful", and not "don't be careful".


First of all it contains just HTML - whether it is safe or unsafe depends on what the extension's developer puts in there. However, we explicitly tell extension developers that they are supposed to put safe HTML in there

I agree with all of this, my intention is not to instruct developers to put unsafe HTML in this location, but rather to use the word as an explicit reminder that what they are doing can have unsafe results and they must take care (i.e. a "be careful" and not "don't be careful").

Here's what the documentation I linked to states about unsafe[Thing]:

UnsafePointer provides no automated memory management or alignment guarantees. You are responsible for handling the life cycle of any memory you work with through unsafe pointers to avoid leaks or undefined behavior.

Here's basically the same description, replaced with things that can go wrong in HTML land, rather than in memory:

Unsafe HTML provides no automated escaping or sanitisation. You are responsible for handling the escaping and sanitisation of any HTML you work with through unsafe HTML to avoid cross-site scripting.

I suppose this really uses "unsafe" to mean "there is no safety provided for you, please provide it yourself", rather than "this is where you should put things that are unsafe".

You actually get the reverse problem if we call give it a positive prefix, e.g. something like "safeHtml", does this indicate that we are applying sanitisation to make it safe? The arguments for misinterpretation go both ways round.

Parsedown.php Outdated
$markup .= $this->{$Element['handler']}($Element['text'], $Element['nonNestables']);
$markup .= $this->{$Element['handler']}($text, $Element['nonNestables']);
}
elseif ($unsafeHtml !== true or $this->safeMode)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about that, too. However, the problem is that the assumption behind this is simply wrong. Using a extension doesn't necessarily make the contents unsafe, especially because we break BC anyway and all extension developers will have to read the manual. Just take #568: By simply calling Parsedown::escape() before passing the contents to the highlighter (or when the highlighter already takes care of escaping), one will get a absolutely safe result.

Your intentions are to make it more safe, but in reality you'll make it less safe! By making extensions incompatible with safe mode, extension developers will forcefully disable safe mode - what decreases security, not improves it.

Furthermore, just think about what you've added to Parsedown's README.md:

Safe mode does not necessarily yield safe results when using extensions to Parsedown.

Exactly. Parsedown as a project can only state whether Parsedown is safe, by using a extension the extension's developer is in charge.

side note: use !$unsafeHtml instead of $unsafeHtml !== true 😉

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about that, too. However, the problem is that the assumption behind this is simply wrong. Using a extension doesn't necessarily make the contents unsafe, especially because we break BC anyway and all extension developers will have to read the manual. Just take #568: By simply calling Parsedown::escape() before passing the contents to the highlighter (or when the highlighter already takes care of escaping), one will get a absolutely safe result.

This isn't correct in the general case. The only way for it to be safe is to be aware of the structure of what is being output. What stops the highlighter from having a similar probems to parsedown <=1.6.4 where vulnerabilities can actually be introduced by the parser (even after reading perfectly escaped input).

Your intentions are to make it more safe, but in reality you'll make it less safe! By making extensions incompatible with safe mode, extension developers will forcefully disable safe mode - what decreases security, not improves it.

This is definitely something to consider carefully – but I don't think that there is any way for an extension to be safe unless it is aware of the structure that it is outputting (sans just outputting completely escaped text, which it can do already).

Exactly. Parsedown as a project can only state whether Parsedown is safe, by using a extension the extension's developer is in charge.

This is true, is it not worth making it difficult to make things unsafe though?

side note: use !$unsafeHtml instead of $unsafeHtml !== true 😉

Latter has a preferable fall through if value is truthy but not true ;)

Copy link
Contributor

@PhrozenByte PhrozenByte Mar 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't correct in the general case. The only way for it to be safe is to be aware of the structure of what is being output. What stops the highlighter from having a similar probems to parsedown <=1.6.4 where vulnerabilities can actually be introduced by the parser (even after reading perfectly escaped input).

What stops Parsedown (or PHP's htmlspecialchars()) from having a bug and doing escaping wrong?

This is definitely something to consider carefully – but I don't think that there is any way for an extension to be safe unless it is aware of the structure that it is outputting (sans just outputting completely escaped text, which it can do already).

The assumption is true in the general case, just the conclusion isn't right. #568 could undoubtfully parse the highlighter's return value into Parsedown's AST structure - however, this would simply make no sense.

This is true, is it not worth making it difficult to make things unsafe though?

You're not making it difficult to make things unsafe, you're doing the exact opposite, you're forcing extension developers to make it unsafe 😉

Latter has a preferable fall through if value is truthy but not true ;)

Sorry, but you'll have to explain under which exact circumstances this can happen here 😉

Copy link
Collaborator Author

@aidantwoods aidantwoods Mar 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What stops Parsedown (or PHP's htmlspecialchars()) from having a bug and doing escaping wrong?

Parsedown is aware of the structure being output, so is htmlspecialchars() (it outputs zero elements). This doesn't make them perfect at doing this, but at least this method has a hope of being safe.

(For example, Parsedown itself would be implementable using only the AST representation to construct elements).

In contrast, if you are blind to the structure being output, then you are necessarily blind to whether it is safe (if you HTML escape the whole thing you now know its structure – zero elements).
The only thing you can do is to trust the thing that gave you this unknown structure, but this only has a hope of being safe if someone is aware of the structure (otherwise how does anyone make any assurances on what is being output?).


I think we might be trying to achieve different things here – this PR was really aimed at allowing HTML without going via the AST provided that both the extension author and the user are aware that the generated HTML requires trust of user input (this is what was actually asked for by #568).

However, I think you've made it apparent that we should actually allow extensions to also make a different claim: assume that input is trusted, but instead "vouch" for the HTML they have generated as being safe.

I think we should perhaps add separate options for this, what do you think?

The former would allow someone to effectively implement things that require trusted user input. For example: Parsedown's feature which allows raw HTML from user input (but escapes it if the user states that input is unsafe).

The latter would allow extension generated HTML to be output unescaped in both regular and safe mode (but would require that the extension author "promise" that it is safe to do this).

@aidantwoods
Copy link
Collaborator Author

I've renamed "unsafeHtml" to "rawHtml" and added an option such that on a per-element basis the safety of said HTML can be declared to have been checked, and thus allow it in safe mode.

(WIP names for both options).

What do you think?

@PhrozenByte
Copy link
Contributor

It's more important that you understand why I have chosen this name

I totally understand what you were trying to achieve by choosing unsafeHtml - and you're right, our diferent interpretations are proof that neither unsafeHtml nor safeHtml are good choices. Anyway, rawHtml is a good choice. 👍

I think we might be trying to achieve different things here

Yeah, this explains it. I rather thought the reason behind the $this->safeMode condition is that you want to take additional precaution for bad third-party developers.

Just to explain my point: I strongly believe in self-responsibility. I don't want to end up in a situation in which things get complicated for me just because someone else though additional precaution is necessary because some bad third-party developers can't read the manual (yes, I understand that this wasn't your intention). Just calling it rawHtml is IMO absolutely sufficient. We don't have legacy code out there using rawHtml (because it is a new feature), thus we can demand that every extension developer reads the manual and understands that he is in charge of enforcing safe mode before using rawHtml. Being in charge also means that the developer must know whether the output he generates might be unsafe. If it might be unsafe, he must evaluate $this->safeMode and act accordingly, either by adding additional escaping or disabling the feature. In brief, I truly believe that we can demand that extension developers know that $this->safeMode exists (not least because we're telling them that it exists...) and demand that they take its existance into consideration.

Thus I believe that allowRawHtmlInSafeMode isn't really necessary. However, by adding allowRawHtmlInSafeMode Parsedown basically predetermines a possible solution when safe mode is enabled - and this causes no harm, because I can still set it to true and take care of it myself. Thus I'm absolutely fine with it. 👍


Btw: I still think that !== true is unnecessary. It is literally impossible that any of these variables ever hold something else than true or false 😉

Lines 1503 to 1517 can be refactored to just two lines:

$allowRawHtmlInSafeMode = isset($Element['allowRawHtmlInSafeMode']) && $Element['allowRawHtmlInSafeMode'];
$permitRawHtml = !$this->safeMode || $allowRawHtmlInSafeMode;

@aidantwoods
Copy link
Collaborator Author

aidantwoods commented Mar 18, 2018

Lines 1503 to 1517 can be refactored to just two lines [...]

Tyty, I've refactored that now :)

Btw: I still think that !== true is unnecessary. It is literally impossible that any of these variables ever hold something else than true or false 😉

This is me trying to protect against type juggling surprises, but I've foregone doing it for now since there are examples of me not doing this even within the same method ;p

At some point I'll try and enumerate the instance variables that might be prone to (implicit) type casting errors and put stronger type checks there – you're quite right about it not really mattering for the locally scoped stuff that we're setting explicitly though ;p

Copy link
Contributor

@PhrozenByte PhrozenByte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! 👍

@aidantwoods aidantwoods merged commit 77dc0a0 into erusev:master Mar 18, 2018
@aidantwoods aidantwoods added this to the 1.8.0 milestone Mar 28, 2018
@aidantwoods aidantwoods deleted the feature/unsafe-html branch March 28, 2018 11:45
@aidantwoods aidantwoods changed the title Add unsafeHtml option for extensions to use on trusted input Add rawHtml option for extensions to use on trusted input Mar 28, 2018
@hvt
Copy link

hvt commented Apr 4, 2018

@aidantwoods would be great if the result of this PR could be tagged as a release, I'm currently suffering some double escaping :].

@cossssmin
Copy link

cossssmin commented Apr 4, 2018

^ +1, I'm using Jonas' tutorial (which he described in #568), and I always end up with escaped tags.

So based on that Parsedown::blockFencedCodeComplete() method override, something like this:

<div>Just an example</div>

Results in:

<pre>
  <code class="hljs language-html">
&lt;span class="hljs-tag"&gt;&amp;lt;&lt;span class="hljs-name"&gt;div&lt;/span&gt;&amp;gt;&lt;/span&gt;Just an example&lt;span class="hljs-tag"&gt;&amp;lt;/&lt;span class="hljs-name"&gt;div&lt;/span&gt;&amp;gt;&lt;/span&gt;
  </code>
</pre>

@aidantwoods
Copy link
Collaborator Author

@hvt I'll try to get 1.8.0 out this weekend :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants