-
Notifications
You must be signed in to change notification settings - Fork 385
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MSC2326: Label based filtering #2326
base: old_master
Are you sure you want to change the base?
Conversation
Seems generally sane to me. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I love this idea and it will really help with things like Zulip bridging. My concerns are mostly around privacy/metadata leaks but I think we have a sane solution in the comments.
Concerns should have been addressed.
Some more potential problems:
|
Hey @dkasak, thanks for your feedback!
FWIW crawling the entire room history isn't always a realistic expectation to have from clients, especially in big and old rooms with thousands of messages and hundreds of servers.
Although I agree with your point, and like your solution, the whole point of using hashes instead of e.g. random opaque strings is to ensure that two clients can only use the same identifier for the same label text. Bearing in mind that we need to be able to use the label identifiers to filter the history of the room server-side (because we're not expecting clients to know about the whole history of the room, see my first point above), opaque strings had the following downsides, all originating from the fact that nothing would prevent 1000 clients from using each a different identifier:
Although what your proposing isn't using a random opaque string, it would still end up having different identifiers for a same label, which would end up causing the issues with filtering I've already mentioned. This I realise that this explanation belongs to an "Alternative solutions" section of the MSC which I've actually forgotten to add, I'll update the proposal with it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Notes about the hash
This was indeed the part I was missing, but I'm less clear on why this ability I suppose the intent is to enable users wanting to talk about (e.g.) cats to In essence, continuing an old thread seems primarily useful when you have the I suppose the desired scenario might be to make labelling a message with My main concern is that just hashing the labels has a rather large downside of [*]: Which makes me think of another problem: is |
On second thought, there's probably no problem with any character here since there's no parsing involved. |
All of this is a result of the fact that we can't realistically expect a client to know about the entire history of every room it's in (which is why we're doing the filtering on the server side btw), rather than the solution described in this MSC.
The MSC aims at allowing both scenarios rather than one to the detriment of the other. We want both to allow users to continue a thread à la Slack if their client can access one of the messages in the thread, and to be able to show conversation topics à la Zulip alongside messages in the timeline. To our (that would be mine and @Half-Shot's) knowledge, the solution that's currently described is the only one that allows both to coexist, because we can't rely on the client to know about every event, thus every possible label in a room. If anyone has an idea for a better solution that allows for both scenarios, I'd love to hear about it.
Again, that is not an issue with this MSC, and this MSC doesn't aim to solve it.
My take on this is that the current solution makes it as hard as possible for rogue servers to figure out the labels on an encrypted event while allowing the features described to work. I agree on the fact that it's not a perfectly secure solution, and again, if anyone knows about a better one I'd be more than happy to hear about it.
The labels are not public information in E2EE rooms, which is the only scenario in which hashing happens (and is also the reason hashing happens). |
(please move to threads so the conversation can actually be followed) |
I realized I haven't said so earlier, so just to be sure: I'm not advocating for this MSC to gain the ability to list all labels.
Yes, sorry, my wording was not the best. What I meant is that the server will be able to reverse the hashes (and hence read the labels) easily in a very large number of cases, if it wants to. I wouldn't had considered this to be the case for E2EE rooms intuitively, had I not seen this MSC. That is, unless people start naming their threads with peppers embedded in them. ;) (also, sorry, I didn't understand Travis' comment; is there a Github threading feature I'm not aware of or was that tongue-in-cheek?) |
Line comments on the diff rather than using the comment box here are threads. Trying to track down what people are talking about here is way too difficult, which is why we require people to use threads. |
|
||
Therefore, this is a simpler proposal to allow messages in a room to be filtered | ||
based on a given label in order to give basic one-layer-deep threading | ||
functionality. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still, the 'reply to an existing message' is an important use case.
A message cannot be replied to unless it is already labelled. Regular users cannot add labels to messages they did not author. Should a (unique) label to otherwise unlabelled messages be required?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't that like replying to the message based on message ID?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#1849 might be better suited for this kind of thing since a client could directly request the message that is referenced... Not sure though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ptman I was thinking subsequent messages could reuse the existing label. Basically avoid nested threads.
AFAICT #1849 proposes a strictly one-way relation (i.e. child points to parent). Wouldn't that make it expensive to list the thread starting from a particular message? Each subsequent message would have to be determined by reverse look-up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally prefer hierarchical threads, but either way, regardless of the how the relationships are recorded, they can be shown flat, just like apple mail and gmail do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, but I was under the impression that this proposal specifically aims for flat threads for the sake of simplicity.
This is done by new `labels` and `not_labels` fields to the `EventFilter` | ||
object, which specifies a list of labels to include or exclude in the given | ||
filter. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may be something for another MSC down the road, but would it make sense to have power level restrictions for labels? For example, I may want to have an announcements label in my room that only bots and I can post in. For example, a GitHub status label could be useful to filter out announcement spam from the main room conversation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an interesting thought, but is probably out of the scope of this MSC imho
|
||
When a user wants to filter a room to given label(s), it defines a filter for | ||
use with `/sync`, `/context`, `/search` or `/messages` to limit appropriately. | ||
This is done by new `labels` and `not_labels` fields to the `EventFilter` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, this doesn't seem to make it possible to expose to the user which labels are present in a room (since the client would have to scan all of the events in a room for labels). IMO, it would be poor UX to not have available labels visible. I suppose there could be two ways of doing this:
- Have the server scan each event for labels and keep track of them
- Add a state event (ex,
m.labels
) where the state key is set to a label and the content could contain attributes like description or label color (if desired for UX). IMO this would be the cleanest option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made a proposal about an endpoint to get the list of labels available in a given room here: #2326 (comment), but haven't gotten around to add it to the proposal yet.
maintaining one rainbow table for each encrypted room it's in, which would | ||
probably make it not worth the trouble. | ||
|
||
## Alternative solutions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was just rereading this proposal, I think @dkasak's points on the main thread are legitimate: that hashing the labels give a very false sense of security here. Given how strong our e2ee is, folks will assume opaque labels are actually encrypted, rather than just obfuscated by a hash which can be easily rainbow-tabled.
Personally, I think it'd be fine to add a pepper to the hashed events, and require at the application level that for labels to work in encrypted rooms, the new user must be brought up to speed on the pepper (e.g. by the inviter sharing the pepper in an encrypted message, possibly to-device, after having invited them).
This is simpler than using opaque IDs for the unencrypted event headers, as there's only one pepper that needs to be shared to new users, rather than the whole set of opaque->real label mappings.
@mscbot concern hashing the labels in encrypted rooms provides a very false sense of security. i think we should add a pepper to the hash, which can be sent to new users in an encrypted msg when they're invited to the room (if the new users need to know about the existing labels). |
This seems to have lots of outstanding concerns, and is generally looking a bit unloved, so I don't think it's ready for FCP generally. @mscbot fcp cancel |
Rendered