diff --git a/proposals/2675-aggregations-server.md b/proposals/2675-aggregations-server.md new file mode 100644 index 0000000000..5808dbec00 --- /dev/null +++ b/proposals/2675-aggregations-server.md @@ -0,0 +1,320 @@ +# MSC2675: Serverside aggregations of message relationships + +It's common to want to send events in Matrix which relate to existing events - +for instance, reactions, edits and even replies/threads. + +Clients typically need to track the related events alongside the original +event they relate to, in order to correctly display them. For instance, +reaction events need to be aggregated together by summing and be shown next to +the event they react to; edits need to be aggregated together by replacing the +original event and subsequent edits, etc. + +It is possible to treat relations as normal events and aggregate them +clientside, but to do so comprehensively could be very resource intensive, as +the client would need to spider all possible events in a room to find +relationships and maintain a correct view. + +Instead, this proposal seeks to solve this problem by defining APIs to let the +server calculate the aggregations on behalf of the client, and so bundle the +aggregated data with the original event where appropriate. It also proposes an +API to let clients paginate through all relations of an event. + +This proposal is one in a series of proposals that defines a mechanism for +events to relate to each other. Together, these proposals replace +[MSC1849](https://github.com/matrix-org/matrix-doc/pull/1849). + +* [MSC2674](https://github.com/matrix-org/matrix-doc/pull/2674) defines a + standard shape for indicating events which relate to other events. +* This proposal defines APIs to let the server calculate the aggregations on + behalf of the client, and so bundle the aggregated data with the original + event where appropriate. +* [MSC2676](https://github.com/matrix-org/matrix-doc/pull/2676) defines how + users can edit messages using this mechanism. +* [MSC2677](https://github.com/matrix-org/matrix-doc/pull/2677) defines how + users can annotate events, such as reacting to events with emoji, using this + mechanism. + +## Proposal + +### Aggregations + +Relation events can be aggregated per relation type by the server. +The format of the aggregated value (hereafter called "aggregation") +depends on the relation type. + +Some relation types might group the aggregations by the `key` property +in the relation and aggregate to an array, while +others might aggregate to a single object or any other value really. + +Here are some non-normative examples of what aggregations can look like: + +Example aggregation for [`m.thread`](https://github.com/matrix-org/matrix-doc/pull/3440) (which +aggregates all relations into a single object): +``` +{ + "latest_event": { + "content": { ... }, + ... + }, + "count": 7, + "current_user_participated": true +} +``` + +Example aggregation for [`m.annotation`](https://github.com/matrix-org/matrix-doc/pull/2677) (which +aggregates relations into a list of objects, grouped by `key`). +``` +[ + { + "key": "👍", + "origin_server_ts": 1562763768320, + "count": 3 + }, + { + "key": "👎", + "origin_server_ts": 1562763768320, + "count": 2 + } +] +``` + +#### Bundling + +Other than during non-gappy incremental syncs, timeline events that have other +events relate to them should include the aggregation of those related events +in the `m.relations` property of their unsigned data. This process is +referred to as "bundling", and the aggregated relations included via +this mechanism are called "bundled aggregations". + +By sending a summary of the relations, bundling +avoids us having to always send lots of individual relation events +to the client. + +Aggregations are never bundled into state events. This is a current +implementation detail that could be revisited later, +rather than a specific design decision. + +The following client-server APIs should bundle aggregations +with events they return: + + - `GET /rooms/{roomId}/messages` + - `GET /rooms/{roomId}/context/{eventId}` + - `GET /rooms/{roomId}/event/{eventId}` + - `GET /sync`, only for room sections in the response where `limited` field + is `true`; this amounts to all rooms in the response if + the `since` request parameter was not passed, also known as an initial sync. + - `GET /relations`, as proposed in this MSC. + +Deprecated APIs like `/initialSync` and `/events/{eventId}` are *not* required +to bundle aggregations. + +The bundled aggregations are grouped according to their relation type. +The format of `m.relations` (here with *non-normative* examples of +the [`m.replace`](https://github.com/matrix-org/matrix-doc/pull/2676) and +[`m.annotation`](https://github.com/matrix-org/matrix-doc/pull/2677) relation +types) is as follows: + +```json +{ + "event_id": "abc", + "unsigned": { + "m.relations": { + "m.annotation": { + "key": "👍", + "origin_server_ts": 1562763768320, + "count": 3 + }, + "m.replace": { + "event_id": "$edit_event_id", + "origin_server_ts": 1562763768320, + "sender": "@alice:localhost" + }, + } + } +} +``` + +#### Client-side aggregation + +Bundled aggregations on an event give a snapshot of what relations were known +at the time the event was received. When relations are received through `/sync`, +clients should locally aggregate (as they might have done already before +supporting this MSC) the relation on top of any bundled aggregation the server +might have sent along previously with the target event, to get an up to date +view of the aggregations for the target event. The aggregation algorithm is the +same as the one described here for the server. + +### Querying relations + +A single event can have lots of associated relations, and we do not want to +overload the client by, for example, including them all bundled with the +related-to event. Instead, we also provide a new `/relations` API in +order to paginate over the relations, which behaves in a similar way to +`/messages`, except using `next_batch` and `prev_batch` names +(in line with `/sync` API). Tokens from `/sync` or `/messages` can be +passed to `/relations` to only get relating events from a section of +the timeline. + +The `/relations` API returns the discrete relation events +associated with an event that the server is aware of +in standard topological order. Note that events may be missing, +see [limitations](#servers-might-not-be-aware-of-all-relations-of-an-event). +You can optionally filter by a given relation type and the event type of the +relating event: + +``` +GET /_matrix/client/v1/rooms/{roomID}/relations/{event_id}[/{rel_type}[/{event_type}]][?from=token][&to=token][&limit=amount] +``` + +``` +{ + "chunk": [ + { + "type": "m.reaction", + "sender": "...", + "content": { + "m.relates_to": { + "rel_type": "m.annotation", + ... + } + } + } + ], + "prev_batch": "some_token", + "next_batch": "some_token", +} +``` + +The endpoint does not have any trailing slashes. It requires authentication +and is not rate-limited. + +The `from` and `limit` query parameters are used for pagination, and work +just like described for the `/messages` endpoint. + +Note that [MSC2676](https://github.com/matrix-org/matrix-doc/pull/2676) +adds the related-to event in `original_event` property of the response. +This way the full history (e.g. also the first, original event) of the event +is obtained without further requests. See that MSC for further details. + +### End to end encryption + +Since the server has to be able to aggregate relation events, structural +information about relations must be visible to the server, and so the +`m.relates_to` field must be included in the plaintext. + +A future MSC may define a method for encrypting certain parts of the +`m.relates_to` field that may contain sensitive information. + +### Redactions + +Redacted relations should not be taken into consideration in +bundled aggregations, nor should they be returned from `/relations`. + +Requesting `/relations` on a redacted event should +still return any existing relation events. +This is in line with other APIs like `/context` and `/messages`. + +### Local echo + +For the best possible user experience, clients should also include unsent +relations into the client-side aggregation. When adding a relation to the send +queue, clients should locally aggregate it into the relations of the target +event, ideally regardless of the target event having received an `event_id` +already or still being pending. If the client gives up on sending the relation +for some reason, the relation should be de-aggregated from the relations of +the target event. If the client offers the user a possibility of manually +retrying to send the relation, it should be re-aggregated when the user does so. + +De-aggregating a relation refers to rerunning the aggregation for a given +target event while not considering the de-aggregated event any more. + +Upon receiving the remote echo for any relations, a client is likely to remove +the pending event from the send queue. Here, it should also de-aggregate the +pending event from the target event's relations, and re-aggregate the received +remote event from `/sync` to make sure the client-side aggregation happens with +the same event data as on the server. + +When adding a redaction for a relation to the send queue, the relation +referred to should be de-aggregated from the relations of the target of the +relation. Similar to a relation, when the sending of the redaction fails or +is cancelled, the relation should be aggregated again. + +If the target event is still pending and hasn't received its `event_id` yet, +clients can locally relate relation events to their target by using +`transaction_id` like they already do for detecting remote echos when sending +events. + +## Edge cases + +### How do you handle ignored users? + + * Information about relations sent from ignored users must never be sent to + the client, either in aggregations or discrete relation events. + This is to let you block someone from harassing you with emoji reactions + (or using edits as a side-channel to harass you). Therefore, it is possible + that different users will see different aggregations (a different last edit, + or a different reaction count) on an event. + +## Limitations + +### Relations can be missed while not being in the room + +Relation events behave no different from other events in terms of room history visibility, +which means that some relations might not be visible to a user while they are not invited +or have not joined the room. This can cause a user to see an incomplete edit history or reaction count +based on discrete relation events upon (re)joining a room. + +Ideally the server would not include these events in aggregations, as it would mean breaking +the room history visibility rules, but this MSC defers addressing this limitation and +specifying the exact server behaviour to [MSC3570](https://github.com/matrix-org/matrix-doc/pull/3570). + +### Servers might not be aware of all relations of an event + +The response of `/relations` might be incomplete because the homeserver +potentially doesn't have the full DAG of the room. The federation API doens't +have an equivalent of the `/relations` API, so has no way but to fetch the +full DAG over federation to assure itself that it is aware of all relations. + +[MSC2836](https://github.com/matrix-org/matrix-doc/pull/2836) provided a proposal for following relationships over federation in the "Querying relationships over federation" section via a `/_matrix/federation/v1/event_relationships` API + +### Event type based aggregation and filtering won't work well in encrypted rooms + +The `/relations` endpoint allows filtering by event type, +which for encrypted rooms will be `m.room.encrypted`, rendering this filtering +less useful for encrypted rooms. Aggregation algorithms that take the type of +the relating events they aggregate into account will suffer from the same +limitation. + +## Future extensions + +### Handling limited (gappy) syncs + +For the special case of a gappy incremental sync, many relations (particularly +reactions) may have occurred during the gap. It would be inefficient to send +each one individually to the client, but it would also be inefficient to send +all possible bundled aggregations to the client. + +The server could tell the client the event IDs of events which +predate the gap which received relations during the gap. This means that the +client could invalidate its copy of those events (if any) and then requery them +(including their bundled relations) from the server if/when needed, +for example using an extension of the `/event` API for batch requests. + +The server could do this with a new `stale_events` field of each room object +in the sync response. The `stale_events` field would list all the event IDs +prior to the gap which had updated relations during the gap. The event IDs +would be grouped by relation type, +and paginated as per the normal Matrix pagination model. + +This was originally part of this MSC but left out to limit the scope +to what is implemented at the time of writing. + +## Prefix + +While this MSC is not considered stable, the endpoints become: + + - `GET /_matrix/client/unstable/rooms/{roomID}/relations/{eventID}[/{relationType}[/{eventType}]]` + +None of the newly introduced identifiers should use a prefix though, as this MSC +tries to document relation support already being used in +the wider matrix ecosystem. \ No newline at end of file