Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MSC4013: Poll history cache #4013

Open
wants to merge 7 commits into
base: main
Choose a base branch
from
Open
Changes from 6 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
183 changes: 183 additions & 0 deletions proposals/4013-poll-history-cache.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,183 @@
# MSC4013: Poll history cache
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to ask plain and simple where in this MSC do we dismiss the problem that users do not want metadata leaks like this. Or atleast a large group of quite loud users do not want metadata leaks like this.

Now This concern could be dismissed by the fact that polls already use relations, but atleast these relations cant tell you for sure its a poll you can only tell that the events are related. I personally would argue that its somewhat dubious why we even need relations to not respect encryption considering their current use.

Still do we need to add more metadata leaks? Matrix already gets bad articles/blog posts written about it for metadata leaking and MSCs like this one doesnt help that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Encrypting the state event (#3414) would help mitigate this in future.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Encrypted state would mittigate a bit of the concerns but with plain text relations it would still leave a massive tell. as no other encrypted state event would be the target of this many relations, essentially undoing the whole point in encrypting the cache state event.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to ask plain and simple where in this MSC do we dismiss the problem that users do not want metadata leaks like this. Or atleast a large group of quite loud users do not want metadata leaks like this.

This feature should considered optional.
In theory room's creators aren't obliged to send the m.room.poll_history state event for a room.
At the same time clients sending a poll may choose to opt out from this behaviour.
At the same time structures like this are very useful on clients that want navigate content by type (polls in this case) and willing to pay the price of a small metadata leak.

I personally would argue that its somewhat dubious why we even need relations to not respect encryption considering their current use.

I guess the may reason is the server being able to group related events together efficiently. Eg. the same problem is solved in the same way for the "threads" feature.

Encrypted state would mittigate a bit of the concerns but with plain text relations it would still leave a massive tell. as no other encrypted state event would be the target of this many relations, essentially undoing the whole point in encrypting the cache state event.

Yes, probably people could still guess what would be the m.poll_history state event could be in the end, but at least they couldn't be 100% sure I guess.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the state event is optional, the MSC should note that (I don't believe it does currently).

Aside from encrypting the whole state event though, the other option is encrypting the event relationship (like in MSC4019 or similar). I personally don't believe that encrypting the relationship on either end is a blocker for this MSC, however. Ideally we figure out how to encrypt both sides and then use both to protect metadata.

In the meantime, considering polls is quite far away from entering FCP itself, starting implementation unencrypted should be fine (as long as the client is comfortable with the risks). It may need to change in the future as other, adjacent, MSCs make forward progress though.

alfogrillo marked this conversation as resolved.
Show resolved Hide resolved

The purpose of this MSC is to allow clients to quickly fetch all the polls in a given room (aka “Poll history”).

## Problem

Clients sometimes may want to access quickly to all the polls sent in a given room (similarly to how they may want to access all the media being sent in a room).
Today clients don’t have an efficient way to make such an operation. Clients that want to implement this feature for encrypted rooms on the status quo may need to backward paginate on the room’s timeline until all the polls are found or they reach the beginning of the timeline.
For unencrypted rooms the problem doesn't exist since clients can call the [messages API](https://spec.matrix.org/v1.6/client-server-api/#get_matrixclientv3roomsroomidmessages) filtering by event type (e.g. by `m.poll.start`).

## Proposal

Introduce a new state event `m.room.poll_history`. This state event is supposed to be referenced by any `m.poll.start` that will be sent next in the room. The new state event must have an empty string as `state_key`.
More information on polls can be found on the [MSC3381](https://github.com/matrix-org/matrix-spec-proposals/pull/3381).

This is how the new state event would look like:

```json5
{
// irrelevant fields not shown
"content": {},
"state_key": "",
"type": "m.room.poll_history",
"event_id": "poll_history_id"
}

```

### Dependencies
- [MSC3381](https://github.com/matrix-org/matrix-spec-proposals/pull/338)

## Client behaviour
First of all a room creator sends the new `m.room.poll_history` state event in the `initial_state` when calling the [createRoom API](https://spec.matrix.org/v1.6/client-server-api/#post_matrixclientv3createroom).

For example the body when creating a new encrypted room would look like this:

```json5
{
"preset": "private_chat",
"name": "Some room",
"is_direct": false,
"initial_state": [
// ... other state events
{
"stateKey": "",
"type": "m.room.encryption",
"content": {
"algorithm": "m.megolm.v1.aes-sha2"
}
},
{
"stateKey": "",
"type": "m.room.poll_history",
"content": {}
}
]
}
```


Any time a client starts a new poll, it also includes a reference to the above state event id like this:

```json5
{
"type": "m.poll.start",
// irrelevant fields not shown
"content": {
"m.poll.start": {
"kind": "m.poll.disclosed",
"answers": [
{"id": "id_a", "m.text": "Yes"},
{"id": "id_b", "m.text": "No"}
],
"question": { "m.text": "Do you like polls?" },
"max_selections": 1,
},
// reference to the state event of type "m.room.poll_history"
"m.relates_to": {
"event_id": "poll_history_id",
"rel_type": "m.reference"
}
Comment on lines +83 to +87
Copy link

@jonnyandrew jonnyandrew Nov 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could it be an issue that the poll start event would no longer be a root event and therefore could not be the root of a thread?

From https://spec.matrix.org/v1.8/client-server-api/#threading

Threads [...] reference the thread root (the first event in a thread). It is not possible to create a thread from an event which itself is the child of an event relationship (i.e., one with an m.relates_to property).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it looks like this would preclude a poll from being a reply in a thread.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it looks like this would preclude a poll from being a reply in a thread.

This is also a potential problem, but I was actually referring to the fact that a thread root should not be related to any other event. What it means is that there is no way to discuss a poll in a thread, which I imagine is a natural use case for threads.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it looks like this would preclude a poll from being a reply in a thread.

Fwiw polls aren't a supported inside threads atm on Element clients.
It doesn't seem to me interesting as a use case.

but I was actually referring to the fact that a thread root should not be related to any other event

Why is that?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was my understanding from the spec which states

It is not possible to create a thread from an event which itself is the child of an event relationship (i.e., one with an m.relates_to property).

I don't see why it shouldn't be possible though. Perhaps that part of the spec could be updated to accommodate this type of use case.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps the documentation meant that you cannot create a thread from an event having already a rel_type of type m.thread?
@clokep do you know is the home server has checks that may reject this events (events starting a thread from an event with a rel_type equal to m.reference)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps the documentation meant that you cannot create a thread from an event having already a rel_type of type m.thread?

The documentation is correct AFAIK.

@clokep do you know is the home server has checks that may reject this events (events starting a thread from an event with a rel_type equal to m.reference)?

They will get rejected: https://github.com/matrix-org/synapse/blob/928e9648578d24dc9f5ed3476d629fca9a64b22a/synapse/handlers/message.py#L1382-L1386

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They will get rejected: https://github.com/matrix-org/synapse/blob/928e9648578d24dc9f5ed3476d629fca9a64b22a/synapse/handlers/message.py#L1382-L1386

Looking into the code the rejection seems to happens just if the relation type is m.thread?
If this is the case, polls having a m.reference rel_type can be still be used as root for a thread?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking into the code the rejection seems to happens just if the relation type is m.thread?

If the new event's relation is a thread then it checks that the parent event does not have any relations. I think this situation would be rejected, but I did not check it manually.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thanks for clarifying

}
}
```

Clients can now use the event relationship to fetch the history of polls in the room:

1. They call the [relations API](https://spec.matrix.org/v1.6/client-server-api/#get_matrixclientv1roomsroomidrelationseventid) with the new state event’s event identifier (`poll_history_id` in the example). Since the API takes a timeline direction and a pagination token, clients still have the flexibility to decide how many polls they want to fetch in a given room and in which direction.

The request would look like this:

`/_matrix/client/v1/rooms/{Room ID}/relations/poll_history_id/m.reference?from=from_token&dir=b&limit=100`

The [response](https://spec.matrix.org/v1.6/client-server-api/#server-side-aggregation-of-mreference) will contain an array of related event identifiers. In encrypted rooms these events have likely the type `m.room.encrypted`. After the decryption clients should keep just decrypted events of type `m.poll.start`.
4. For each event `id_some` kept from the previous step, clients need to make the poll aggregation either by fetching data from a local database (if available) or by calling again the relations API again with the `id_some` event id. At this point clients have all the information they need to build the full poll history.

## Potential issues

### History on already existing rooms
It’s desirable to have the new `m.room.poll_history` state as a part of the `initial_state` of a room, but sometimes people may want to have a similar behaviour on already existing rooms. In this case a user with enough power level can just publish a `m.room.poll_history` event in the room. It is worth noticing that in this cases `m.poll.start` events sent before wouldn't have any relationship with the state event.

### Rooms with both old and new clients
Clients understanding the new `m.room.poll_history` state event should still not fetch the poll history as described above if the `m.room.poll_history` is missing in a room. It's still possible however to have old and new clients in a room supporting the poll history. In this case new clients wouldn't see new polls opened by old clients in the poll history. This problem doesn't affect the room's timeline.

### Privacy
Even in encrypted rooms references to other events (key `m.relates_to`) are clear text. With this proposal, starting a poll in an encrypted room means sending an event of type `m.room.encrypted` having a reference to the state event id of type `m.room.poll_history`. Since state events are also clear text, people may infer that the actual content of the encrypted message is actually a started poll (although its content is still encrypted).

## Power levels considerations
The new `m.room.poll_history` event isn’t supposed to change over time. For this reason the power level required to change the `m.room.poll_history` event should be as high as the the one required for changing the state event `m.room.power_levels` (or similar).

## Possible extensions

The problem this MSC is trying to fix here is to build an index for events of a given type (`m.poll.start` in this case). In theory this approach can be useful to group other events together (e.g. images) with the purpose to show them together on clients. To fix this problem we can think of a more generic state event type `m.room.history` and use the `state_key` to differentiate several types of events.

For polls the state event would look like this:

```json5
{
// irrelevant fields not shown
"content": {},
"state_key": "m.poll.start",
"type": "m.room.history",
"event_id": "poll_history_id"
}

```

## Alternatives considered

Copy link
Member

@uhoreg uhoreg May 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like using /search with "types": ["m.room.poll_history"] would be an alternative.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean an alternative to the /state?
If yes, does it make any difference?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean that, rather than creating a new state event and searching for relations on that state event, why not just use the /search endpoint to search for polls?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess because search doesn't work on encrypted room. Poll events are actually sent as m.room.encrypted. 😕
Fwiw in the main proposal just one state event is required.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good reason. It might be worth mentioning that in the MSC.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For unencrypted room /messages with a filter on event type would be better than /search, FWIW. But again doesn't work with encrypted events.

### State event for each poll

An alternative can be to have multiple instances of the same state event `m.room.poll_history` but with different `state_key`s. In this case the client opening a poll is also required to send a state event with the `state_key` equal to `@sender:somewhere.org_XYZ`. The string `XYZ` should be a unique token for the poll. The perfect candidate is the event id of the `m.poll.start` event.
The poll history can later be built by fetching all the state events with type `m.room.poll_history` by calling the [state API](https://spec.matrix.org/v1.6/client-server-api/#get_matrixclientv3roomsroomidstateeventtypestatekey). This is possible since here the state event also contains the event id of the `m.poll.start` event:

```json5
{
// irrelevant fields not shown
"content": {
// the id of the `m.poll.start` event
"poll_id": "poll started event id"
},
// Potentially XYZ == the id of the `m.poll.start` event
"state_key": "@someone:somewhere.org_XYZ",
"type": "m.room.poll_history",
"event_id": "poll_history_id"
}

```

Potential problems with this approach however are:
- Users opening a poll need enough power level to send a state event (`m.room.poll_history`)
- This approach has an additional dependency: [MSC3757](https://github.com/matrix-org/matrix-spec-proposals/blob/andybalaam/owner-state-events/proposals/3757-restricting-who-can-overwrite-a-state-event.md)

### State event containing an array of poll started events

The idea here is to introduce new state event as the main proposal. The only difference is that here clients are supposed to update the state event every time they open a new poll. The new state event's purpose is to contain all the ids for `m.poll.start` events.

The new state event would look like this:
```json5
{
// irrelevant fields not shown
"content": {
"polls": [
"poll_started_a_id",
"poll_started_b_id",

]
},
"state_key": "",
"type": "m.room.poll_history",
"event_id": "poll_history_id"
}

```

Potential problems with this approach are:
- Users opening a poll need enough power level to send a state event (`m.room.poll_history`)
- Conflicts may arise when two users attempt to change the state event at the same time
- A malicious user or a bug in a client may accidentally erase the history sending a wrong payload

# Unstable prefix

While this MSC is not considered stable, implementations should use `org.matrix.msc4013.*` as a prefix in place of `m.*` for the new state event type.