Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MSC2801: Make it explicit that event bodies are untrusted data #2801

Merged
merged 4 commits into from
Mar 28, 2021
Merged
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
190 changes: 190 additions & 0 deletions proposals/2801-untrusted-event-data.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,190 @@
# MSC2801: Make it explicit that event bodies are untrusted data

As the Matrix Specification stands today, it is easy for client developers to
assume that all events will be structured as documented. For example, the
`displayname` key of an [`m.room.member`
event](https://matrix.org/docs/spec/client_server/r0.6.1#m-room-member) is
specified to be "`string` or `null`"; and the `info` key of an [`m.image`
message event](https://matrix.org/docs/spec/client_server/r0.6.1#m-image) must
be a Javascript object.

In reality, these are not safe assumptions. This MSC proposes that the
specification be updated to make it clear that developers should treat all
event data as untrusted.

## Reasons why events may not match the specification

Firstly, let's examine the reasons why such malformed events can exist. Some of
these reasons may appear to have trivial solutions; these are discussed below.

1. Most obviously, Synapse as it currently stands does very little validation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

presumably the reason for Synapse being listed here is because it's the historical reference for the spec, though these days there are other implementations which do much stronger validation?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Synapse is listed here because its current behaviour is the lowest common denominator: as long as there exist servers which exhibit Synapse's current behaviour (as there will be unless we somehow spec otherwise), clients and users must handle that behaviour.

of event bodies sent over the Client-Server API. There is nothing stopping
any user sending a malformed `m.room.message` event with a simple `curl`
command or Element-Web's `/devtools`.

Any event sent in this way will be returned to the clients of all the other
users in the room.

2. Events may be encrypted. Any client implementing E2EE must be prepared to
deal with any encrypted content, since by definition a server cannot
validate it.

3. In order to allow the Matrix protocol to be extensible, server
implementations must tolerate unknown event types, and allow them to be
passed between clients. It is obvious that a pair of custom clients
implementing a `com.example.special.event` event type cannot rely on a
standard server implementation to do any validation for them.

However, this problem extends to event types and keys which have been added
to the specification. For example, the [`m.room.encrypted` event
type](https://matrix.org/docs/spec/client_server/r0.6.1#m-room-encrypted)
was added in Client-Server API r0.4.0. It therefore follows that a server
implementing CS-API r0.3.0 would have no way to validate an
`m.room.encrypted` event, so if a client is connected to such a server, it
could receive malformed events.

4. To extend from point 3, the problem is not even resolved by upgrading the
server. There may now be rooms which contain historical events which would
no longer be accepted, but these will still be served by the server.

This problem also applies to non-room data such as account data. For
example, Client-Server API r0.6.0 added the [`m.identity_server` account
data event
type](https://matrix.org/docs/spec/client_server/r0.6.1#m-identity-server).
It is possible, if unlikely, that a client could have uploaded an
`m.identity_server` event before the administrator upgraded the server.

5. Event redaction removes certain keys from an event. This is a bit of a
trivial case, though it is worth noting that the rules for event redaction
vary between room versions, so it's possible to see a variety of "partial"
events.

6. All the cases above can occur without federation. Federation adds
additional complexities due to the structure of Matrix rooms. In
particular, a server implementation cannot simply ignore any malformed
events since such events may either be critical to the structure of the
room (for example, they may be `m.room.membership` events), or at the very
least ignoring them would leave "holes" in the event graph which would
prevent correct back-pagination.

## Ideas for mitigating the problem

The problems above appear to have some easy solutions. Let's describe some of
those ideas, before considering the fundamental problem that none of them can
solve.

### Validate all events when they are submitted

In this idea, we would require all server implementations to strictly validate
any data which is sent over the Client-Server API, to ensure that it complied
with the specified formats.

This evidently solves problem 1 above, in that it would prevent local users from
creating malformed events of any event types that the server supports; however,
it would do nothing to address any of the other problems.

### Validate events at the point they are served to a client

We could require that server implementations validate any data that they are
about to serve to a client; we might recommend that malformed data be stripped
out of the response, or redacted, or similar.

It is worth mentioning that this would be tricky to implement efficiently on
the server side, but it at least helps address most of the problems above, such
as historical data, or malformed events received over federation.

### Have servers re-validate data on upgrade

Similar to the previous idea, but rather than validating data each time it is
served to a client, any stored data could be re-validated to check that it
complies with new validation requirements.

This could be more efficient in the case that changes to validation rules are
rare, but it could still be a huge amount of data processing on a large server.

### Create new room versions which require events to be well-formed

As outlined above, one of the big problems in this area is how we deal with
events sent over federation; in particular, if subsets of the servers in a room
have different ideas as to which events are "valid", then their concepts of the
room state can begin to drift, and the room can eventually become
"split-brained". This makes it hard to simply say, for example,
"`m.room.member` events with a non-string `displayname` are invald and should
richvdh marked this conversation as resolved.
Show resolved Hide resolved
not form part of the room state": we have a risk that some servers will accept
the event, and some will not.

One approach to solving this is via [room
versions](https://matrix.org/docs/spec/index#room-versions). By specifying that
a change of rules only applies for a future room version, we can eliminate this
potential disagreement.

The process of changing a room from one version to another is intrusive, not
least because it requires that all servers in a room support the new room
version (or risk being locked out). For that reason, it is extremely
undesirable that any new feature require a new room version: whenever possible,
it should be possible to use new features in existing rooms. It therefore
follows that we cannot rely on room versions to provide validation of event
data.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is another alternative that only requires auth-changing events such as redaction, PL change, role change, kick and ban to be well formed with no extensions, and all others be untrusted. Have you considered that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the sort of thing that is meant by

It may be possible to assert that specific, known cases can be treated as
trusted data, but these should be called out as specific cases.

(https://github.com/matrix-org/matrix-doc/pull/2801/files/5db56391f30621db3a7e36824417fc57ed5f169f#diff-db3e334e6547f917a79147c0c0df60709ac25caca1ae5aaf6822482f0e04d553R188-R189)

In short, yes we have considered it, but it doesn't change the general philosophy.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


### Create a single new room version which applies all event validation rules

This idea is included for completeness, though it is unclear how it would work
in practice.

It has been suggested that we create a new room version which explicitly states
that events which fail the current event schema, whatever that is at that
moment in time, should be rejected.

Let's imagine that in future, the `m.room.member` event schema is extended to
include an optional `location` key, which, if given, must be a string. The
implication of this idea is that servers should reject any `m.room.member`
event whose `location` is not a string. We now have a problem: any servers in
the room which are updated to the latest spec will reject such malformed
events, but any other servers yet to be upgraded will allow it. So is that user
in the room or not?

Even if all the servers in the room can be upgraded at once, what about any
`m.room.member` events which were sent before the rule change?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mention the infeasiblity of alternative of requiring a new room version every time event specifications change

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this covered under the previous heading? It's not really relevant to this heading.

## The fundamental problem
richvdh marked this conversation as resolved.
Show resolved Hide resolved

The ideas above all mitigate the problems discussed earlier to a greater or
lesser extent, and may indeed be worth doing on their own merits. However, none
of them can address the problem of outdated server implementations.

For example, consider the case of a new key being added to an event body, say
`m.relates_to`. Now, we may have decided as above that all new specced keys
must be validated by the server, so `vN+1` of Synapse dutifully implements such
validation and refuses to accept events with a malformed `m.relates_to`.

The problem comes for users whose server is still Synapse `vN`. It knows
nothing of `m.relates_to`, so accepts and passes it through even if
malformed. The only potential solution is for clients seeking to implement
`m.relates_to` to refuse to talk to servers which do not declare support for
it.

However, this is an entirely client-side feature: it is illogical to require
that servers must be upgraded before it can be used. Consider that the hosted
element-web at `https://app.element.io` is upgraded to support the new feature;
in this scenario, that would lock out any user whose homeserver had not yet
been upgraded. This is not an acceptable user experience.

In short, we are left with the reality that clients must still handle the
unvalidated data.

## Conclusions

Short of closely coupling server and client versions - which violates the
fundamental ethos of the Matrix project - there is nothing that can completely
prevent clients from having to handle untrusted data. In addition, encrypted
events eliminate any possibility of server-side validation.

With that in mind, the advantages of the ideas above are diminished. If clients
must handle untrusted data in some circumstances, why not in all? "You can
trust the content of this data structure, provided you have checked that the
server knows how to validate it, in which case you need to treat it as
untrusted" is not a useful message for a client developer.

It may be possible to assert that specific, known cases can be treated as
trusted data, but these should be called out as specific cases. The default
message should be that clients must treat all event data as untrusted.