Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MSC 1501: Room version upgrades #1502

Merged
merged 8 commits into from
Oct 23, 2018
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
370 changes: 370 additions & 0 deletions proposals/1501-room-version-upgrades.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,370 @@
# Room version upgrades

## Background

[MSC1425](https://github.com/matrix-org/matrix-doc/issues/1425) introduces a
mechanism for associating a "version" with a room, which allows us to introduce
changes to the mechanics of rooms.

Assuming that a given change is successful, the next challenge is to introduce
it to existing rooms. This proposal introduces a mechanism for doing so.

## Proposal

In short: room upgrades are implemented by creating a new room, shutting down
the old one, and linking between the two.

The mechanics of this are as follows. When Alice upgrades a room, her client
hits a new C-S api:

```
POST /_matrix/client/r0/rooms/{roomId}/upgrade
```
```json
{
"new_version": "2"
anoadragon453 marked this conversation as resolved.
Show resolved Hide resolved
}
```

Response:

```json
{
"replacement_room": "!QtykxKocfsgujksjgd:matrix.org"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be 'successor' to match m.room.create's 'predecessor'?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

possibly. It's worth noting that predecessor is a dict (with room_id and event_id) whereas this is just a room_id, so the similarity is only partial.

I'm not particularly fussed either way. WDYT?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dbkr: I'm taking the lack of response to mean that you don't care that much either, and have implemented this as replacement_room.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did indeed

}
```

When this is called, the server:

* Checks that Alice has permissions to send `m.room.tombstone` state events.

* Creates a replacement room, with an `m.room.create` with a `predecessor` field
which links to the last known event in the old room:

```json
{
"sender": "@alice:somewhere.com",
"type": "m.room.create",
"state_key": "",
"room_id": "!QtykxKocfsgujksjgd:matrix.org",
"content": {
richvdh marked this conversation as resolved.
Show resolved Hide resolved
"version": "2",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Post-MSC note: This is actually supposed to be room_version as per MSC1425. Implementation of upgrades appears to use room_version as well.

"predecessor": {
"room_id": "!cURbaf:matrix.org",
"event_id": "$1235135aksjgdkg:matrix.org"
}
}
}
```

* Replicates PL/privacy/topic/etc events to the new room.

* Moves any local aliases to the new room.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact that we can't move any remote aliases may be problematic, particularly if the canonical alias is remote.


* Sends an `m.room.tombstone` state event in the old room to tell participants
that it is dead:

```json
{
"sender": "@alice:somewhere.com",
"type": "m.room.tombstone",
"state_key": "",
"room_id": "!cURbaf:matrix.org",
"content": {
"body": "This room has been replaced",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this always expected to be this string? can clients influence the description here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for now, it's up to the server.

"replacement_room": "!QtykxKocfsgujksjgd:matrix.org"
}
}
```

The `body` of the tombstone event is defined by the server (for now, at
least).

* Assuming Alice has the powers to do so, sets the power levels in the old
room to stop people speaking. In practice, this means setting
`events_default` and `invite` to the greater of `50` and `users_default+1`.

Bob's client understands the `m.room.tombstone` event, and:

* Hides the old room in the room list (the room continues to be accessible
via the old room id (permalinks, backlinks from the new room, etc).

* Displays, at the very bottom of the timeline of the old room: "This room
has been upgraded. Click here to follow the conversation to the new room".
The link is simply a permalink to the new room. When Bob opens it, he will
get joined to the new room.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth saying that if the new room is also tombstoned, then the client should (may?) automatically keep following the chain until it reaches a room that isn't dead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I guess that is something worth spelling out.


[Note that if Bob is on a version of synapse which doesn't understand room
versions, following the permalink will take him to a room view which churns
for a while and eventually fails. Synapse 0.33.3 should at least give a
sensible error code.]

If it turns out that the replacement room also has a tombstone event, the
client may automatically keep following the chain until it reaches a room
that isn't dead.

* Optional extension: if the user is in both rooms, then the "N unread
messages" banner when scrolled up in the old room could be made to track
messages in the new room (so in practice the user would only ever see the
hiatus between the versions if they scrolled all the way to the beginning
of the new room or the end of the old one.)

Bob's client also understands the `predecessor` field in the `m.room.create`, and:

* At the top of scrollback in the new room, displays: "This room is a
continuation of a previous room. Click here to see the old conversation."
The link is a permalink to the old room.

* Optional extensions might include things like extending room search to
work across two rooms.

### Client changes needed

* Ability for an op to view the current room version and upgrade it (by
hitting `/upgrade`).

* ~~Also the ability for an op to see what versions the servers in the
current room supports (nb via a cap API) and so how many users will get
locked out~~ (This is descoped for now.)

* Display `m.room.tombstone`s as a sticky message at the bottom of the old
room (perhaps replacing the message composer input) as “This room has been
replaced. Please click here to continue” or similar.

* When the user clicks the link, the client attempts to join the new room if
we are not already a member, and then switches to a view on the new room.

The client should supply the name of the server which sent the tombstone
event as the `server_name` for the `/join` request. (That being the most
likely server to have an up-to-date copy of the room - this is essentially
a workaround for [vector-im/riot-web#2925](https://github.com/vector-im/riot-web/issues/2925).)

* If the client sees a pair of rooms with a tombstone correctly joined to the
new room, it should hide the old one from the RoomList.

* If one backpaginates the new room to its creation, we should show the
`m.room.create` as “This room is a continuation of a previous room; click here
to view” (or similar).

* Search is extended to search old rooms (could be done clientside for now).

Future eye-candy:

* When one is viewing the old room:

* Rather than showing a sticky tombstone at the bottom, one should probably
have the “10 unread messages” section in the status bar which refers to
the current version of the room.

* We should probably also show the membership list of the current room. (Perhaps?)

## Future extensions

### Invite-only rooms

Invite-only rooms are not dealt with explicitly here; they are made tricky by
the fact that users in the old room won't be able to join the new room without
an invite.

For now, we are assuming that the main reasons for carrying out a room-version
upgrade (ie, security problems due to state resets and the like) do not apply
as strongly to invite-only rooms, and we have descoped them for now.

In future, we will most likely deal with them as follows:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we somehow deal with it by sending a join event to the new room, and stick a reference to the user's membership event in the old room an the auth_events?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe, but I'm wary of tying too much auth in the new room to auth in the old (broken, possibly completely insecure) room.


* For local users, we need to first create an invite for each user in the
room. This is easy, if a bit high-overhead.

* For remote users:

* Alice's server could send invites; however this is likely to give an
unsatisfactory UX due to servers being offline, maybe not supporting the
new room version, and then spamming Bob with mysterious invites.

* We could change the auth rules to treat a membership of the old room as
equivalent to an invite to the new room. However, this is likely to
reintroduce the problems we are trying to solve by replacing the room in
the first place.

* We could create a new type of membership which acts like an invite for
the purposes of allowing users into invite-only rooms, but doesn't need
to be sent to remote servers.

### Parting users from the old room

It's not obvious if users should stay members of the old room indefinitely
(until they manually leave). Perhaps they should automatically leave after a
respectful period? What if the user leaves the *new* room?

For now, we'll assume that they stay members until they manually leave. We can
see how that feels in practice.

## Potential issues

* What about clients that don't understand tombstones?

* I think they'll just show you two separate rooms (both with the same
name), and you won't be able to talk in one of them. It's not great but it
will probably do.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we mitigate this by recommending (not mandating) that an m.room.message as well as an m.room.tombstone is sent into the old room (at least until we have fallback events for tombstones) so dumb clients can tell their users what's going on via a plain old IM?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess so. Are you expecting the message to be sent by the client or the server?


* It's a shame that scrollback in the new room will show a load of joins
before you get to the link to the old room.

## Dismissed solutions

### Variations on this proposal

#### Have servers auto-join their users on upgrade

In order to make the upgrade more seamless, it might be good for servers to
automatically join any users that were in the old room to the new room.

In short, when Bob's server receives a tombstone event, it would attempt to
auto-join Bob to the new room, and updates any aliases it may have to point to
the new room.

It's worth noting that the join may not be successful: for example, because
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@KitsuneRal wrote:

I perceived "It's worth noting" as some kind of "by the way" consideration rather than a blocking argument - especially with the workaround presented in the end. The current text leaves me under impression that it wouldn't be so bad to combine the primary proposal with auto-joining (that will work on the best-effort basis but the obvious fallback will be the user clicking that link to the new room displayed for the tombstone event and trying to join the room manually). For end users the difference between the primary case and the fall-back would be being routed to the new room at once vs. a click on the link to upgrade the room.

I am not in favour of things that work 80% of the time and need a completely different solution for the remaining 20%: they tend to confuse users and lead to them thinking things are buggy. (Of course, joins are buggy, but I'm reluctant to build more functionality on top of them instead of taking the time to fix that.)

I'm still in favour of leaving auto-joins out for now. Obviously, this behaviour of "try an auto-join, and otherwise fall back to having the user rejoin" could be implemented later.

Bob's server is too old, or because Bob has been blocked from joining the new
room, or because joins are generally flaky. In this case Bob might attempt a
rejoin via his client.

#### Have servers merge old and new rooms together

Instead of expecting clients to interpret tombstone events, servers could merge
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@KitsuneRal wrote:

The current text is not quite clear on what becomes the showstopper; now that I think more about it. The sheer amount of duplication in the sync as the number of upgrades increases (basically, it's another batch duplicating some other for each upgrade) is the primary problem, right? Maybe that previous sentence could be added to the dismissed proposal, for clarity.

The showstopper is as per https://github.com/matrix-org/matrix-doc/pull/1502/files#diff-ce167b1ac0f01d8c81d7d9718f396207R218: the server ends up having to do the mapping forever for all rooms and for all users, which is an overhead we don't really want to support.

/sync results together for the two rooms and present both as the old room in
/sync results - and then map from the old room ID back to the new one for API
calls.

(Note that doing this the other way around (mapping both rooms into the *new*
room ID) doesn't really work without massively confusing clients which have
cached data about the old room.)

This sounds pretty confusing though: would the server do this forever
for all users on the server?

#### have clients merge old and new rooms

At the top of scrollback for the new room, shows some sort of "messages from a
previous version of this room" divider, above which it shows messages from the
old room <sup name="a10">[1](#f10)</sup>.

This gets quite messy quite quickly. For instance, it will be confusing to see
any ongoing traffic in the old room (people whose servers didn't get the memo
about the tombstone; people leaving the old room; etc). Ultimately it's all
client polish though, so we can consider this for the future.

<b name="f10">1</b> It's probably worth noting that this alone is a somewhat
fundamental reworking of a bunch of complicated stuff in matrix-react-sdk (and
presumably likewise in the mobile clients). [↩](#a10)

### Counter-proposal: upgrade existing rooms in-place

The general idea here is to divide the room DAG into v1 and v2 parts. Servers
which do not support v2 will not see the v2 part of the DAG. This might look
like the below:

![dag](1501-split-dag.png)

In this model, room version is tracked as a state event:

```json
{
"type": "m.room.version",
"content": {
"version": 2
}
}
```

This event can only ever be sent by the **original room creator** - i.e., the
sender of the `m.room.create event` - even if the `m.room.power_levels` say
otherwise <sup name="a1">[1](#f1)</sup> <sup name="a2">[2](#f2)</sup>.

The DAG is now divided into parts where the state of m.room.version is 2, and
those where it is missing (and therefore implicitly 1).

When sending out federation transactions, v2 events are listed under a new
versioned_pdus key in the /send request:

```
PUT /_matrix/federation/v1/send/991079979

{
"origin_server_ts": 1404835423000,
"origin": "matrix.org",
"pdus": [],
"versioned_pdus": {
2: [...]
},
"edus": [...]
}
```

Old servers which do not support v2 will therefore not receive any v2 parts of
the DAG, and transactions which only contain updates to the v2 part of the DAG
will be treated as no-ops. Servers should ignore entries under versioned_pdus
which correspond to versions they do not know about.

As a special case, `m.room_version events` themselves are treated as having the
version previously in force in the room, unless that was v1, in which case,
they are treated as being v2. This means that the upgrade from v1 to v2, *and*
the upgrade from v2 to v3, are included under `versioned_pdus[2]`, but the
upgrade from v3 to v4 is included under `versioned_pdus[3]`. If a server
receives an `m.room_version` event with a version it does not understand, it
must refuse to allow any other events into the upgraded part of the DAG (and
probably *should* refuse to allow any events into the DAG at all)
<sup name="a1">[3](#f3)</sup>. The reasons for this are as follows:

* We want to ensure that v1-only servers do not receive the room_version
event, since they don't know how to auth it correctly, and more importantly
we want to ensure that they can't get hold of it to construct bits of DAG
which refer to it, and would have to follow the v2 rules.

* However, putting subsequent upgrades in a place where older servers will see
it means that we can do more graceful upgrades in future by telling their
clients that the room seems to have been upgraded.

As normal, we require joining servers to declare their support for room
versions, and reject any which do not include the version of the room given by
the current room state.

In theory this is all that is required to ensure that v1-only servers never see
any v2 parts of the DAG (events are only sent via /send, and a v1 server should
have no way to reference v2 parts of the DAG); however we may want to consider
adding a "versions" parameter to API calls like /event and /state which allow a
server to specify an event_id to fill from, in case a (non-synapse, presumably)
implementation gets an event id from a /context request or similar and tries to
fill from it.

The experience for clients could be improved by awareness of the m.room.version
event and querying a capabilities API (e.g. /versions?) to determine whether a
room has been upgraded newer than the server can support. If so, a client
could say: “Your server must be upgraded to continue to participate in this
room” (and manually prevent the user from trying to speak).

#### Problems

* This leaves users on old v1-only servers in their own view of the room,
potentially continuing to speak but only seeing a subset of events (those
from their own and other v1 servers).

In the worst case, they might ask questions which those on v2 servers can see but have no way of replying to.

One way to mitigate this would be to ensure that the last event sent in the
v1 part of the DAG is an m.room.power_levels to make the room read-only to
casual users; then restore the situation in the v2 DAG.

* There's nothing to stop "inquisitive" users on v1 servers sending
m.room.version events, which is likely to completely split-brain the room.

* This doesn't help us do things like change the format of the room id.

<b name="f1">1</b> This avoids the version event itself being vulnerable to state
resets. [↩](#a1)

<b name="f2">2</b> This is therefore a change to the event authorisation rules
which we need to introduce with room version 2. [↩](#a2)

<b name="f3">3</b> It's worth noting this does give the room creator a
kill-switch for the room, even if they've subsequently been de-opped. Such is
life. [↩](#a3)
Binary file added proposals/1501-split-dag.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.