This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Redact all events older than a certain time #1730

Closed
rubo77 opened this issue Dec 30, 2016 · 46 comments

Comments

@rubo77
Contributor

rubo77 commented Dec 30, 2016

We could add this to the prune API: when you prune a room, you could additionally redact all of those events, so that their content is removed in federated rooms too.

@Half-Shot
Collaborator

Half-Shot commented Dec 30, 2016

I'm not happy with this. Pruning is intended to save space, not to erase history. Redaction and pruning are entirely different things. By pruning, you save space on your server, not on everyone's.

I would consider it counterproductive to the history-first nature of Matrix to start allowing mass history removal across servers.

@rubo77
Contributor Author

rubo77 commented Dec 30, 2016

It depends on the use case.

The history-first directive isn't always what's needed.

Maybe make it configurable.

@Half-Shot
Collaborator

I want to be clear that what you're asking for also goes against the use case of the Prune API.
Its sole purpose is to free up space on the host. It is not intended for redaction. Redaction is a different concept, designed for removing events across servers.

What you are asking for is a separate API to remove events in bulk, as a redaction. I don't see this happening due to my reasons given before.

TL;DR - The prune API is a local admin api to free space, not to delete history.

@rubo77 rubo77 changed the title Prune API: redact all events before pruning Redact all events oder than a certain time Dec 31, 2016
@rubo77 rubo77 changed the title Redact all events oder than a certain time Redact all events older than a certain time Dec 31, 2016
@rubo77
Contributor Author

rubo77 commented Dec 31, 2016

I clarified the title and the initial message, so it is clear what I aim for.

There are several use cases where it is desirable to clear old messages for the sake of data avoidance and data minimization.

This was already requested in #1480.

@kythyria

If you want to delete old messages from your server, that's fine. You have exactly no right--and no ability to enforce--that I also delete them.

@Half-Shot
Collaborator

Half-Shot commented Dec 31, 2016

I don't understand what you are hoping to achieve by redacting rather than pruning. The only difference is that everyone's server is affected, rather than just yours. Redaction will leave some data, and as Erik explained here, you will lose more or less the same amount of information in both cases. On top of that, deleting events entirely is literally impossible due to Matrix's design, so signatures are always left.

The only advantage I can see in your argument is that, after redaction, people can't paginate back to get the content, which causes problems for people like me who want to retain all my history wherever I can.

TL;DR

If you want to delete old messages from your server, that's fine. You have exactly no right--and no ability to enforce--that I also delete them.

EDIT: To clarify, #1480 is a bug report that predates pruning and later turned into the pruning feature.

@kythyria

You can't unsend email, or unsend paper mail, or unsay things in general.

@rubo77
Contributor Author

rubo77 commented Dec 31, 2016

That is true, you cannot unsay it, but you could globally flag it as redacted.

@kythyria

You can demand people treat it as unsaid, but you can't actually unsay it.

@rubo77
Contributor Author

rubo77 commented Dec 31, 2016

Yes, that is what I meant.

Additionally, you can prevent an email from being sent if it is still in your outbox. In Matrix terms: you could prevent a redacted message from being federated to other servers if no other servers have joined your room yet. This is often wanted in private rooms by people who are trying to avoid excessive data collection on the internet.

@kythyria

And additionally you could prevent an email from being sent, if it is still in your outbox.

Well, yes. Things that haven't happened yet are generally easy to undo. However, as soon as that message, matrix or email, touches another server, you've lost control of it. Period. No take-backs. So you have to run that bulk redact before anyone from another server enters the room. After that point the window for redaction is, uh, tiny.

@rubo77
Contributor Author

rubo77 commented Dec 31, 2016

I am perfectly fine with that.

What I am aiming for here is an option to automatically redact all messages older than a certain timeframe: not deleting them from the database (that's what #1621 is about), but redacting them the way redaction already works today.

So if that option is turned on, old messages are no longer shown in clients (Riot), although they still exist as "redacted" events in the database, so normal users cannot scroll back in history further than this timeframe.

If the room is federated, this redaction flag should be federated too, so the admin has full control over the history of the room.

@rubo77
Contributor Author

rubo77 commented Dec 31, 2016

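A script to redact the history could look like this (it assumes a SQLite homeserver.db and an access token for a user who is allowed to redact in the room):

```bash
#!/bin/bash

# Redact all messages of a given room that are older than a definable age.

DOMAIN=yourserver.tld
HOMESERVER="https://$DOMAIN"  # assumption: the client API is served here
# This user needs a power level in the room high enough to redact others' events:
ADMIN="@username:$DOMAIN"
ACCESS_TOKEN='...'  # placeholder: an access token for $ADMIN

# Choose the room to redact old messages from:
ROOM='!cURbafjkfsMDVwdRDQ:matrix.org' # for example: "Matrix HQ"

# Choose a time before which the messages should be redacted:
# TIME='2016-08-31 23:59:59'
TIME='3 months ago'

# Create a millisecond timestamp from the given time string (needs GNU date):
UNIX_TIMESTAMP=$(date +%s%3N --date='TZ="UTC+2" '"$TIME")
# ".timeout" makes sqlite3 wait up to 20 s while Synapse holds a lock:
BUFFER=$(sqlite3 -cmd '.timeout 20000' homeserver.db "select event_id from events where type='m.room.message' and received_ts<$UNIX_TIMESTAMP and room_id='$ROOM' order by received_ts;")

# Percent-encode an ID for use in a URL path (ASCII input only):
urlencode() {
  local s=$1 out='' c i
  for (( i = 0; i < ${#s}; i++ )); do
    c=${s:i:1}
    case $c in
      [A-Za-z0-9._~-]) out+=$c ;;
      *) printf -v c '%%%02X' "'$c"; out+=$c ;;
    esac
  done
  printf '%s' "$out"
}
TXN_PREFIX="redact-$(date +%s)"
N=0
for EVENT_ID in $BUFFER; do
  # Redact each event (unique txn ID per request; older servers: /r0/ instead of /v3/).
  N=$((N + 1))
  curl -sS -X PUT \
    -H "Authorization: Bearer $ACCESS_TOKEN" \
    -H 'Content-Type: application/json' \
    -d '{"reason": "history older than '"$TIME"'"}' \
    "$HOMESERVER/_matrix/client/v3/rooms/$(urlencode "$ROOM")/redact/$(urlencode "$EVENT_ID")/$TXN_PREFIX-$N"
done
```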

@kythyria

If the room is federated, this redaction-flag should be federated too, so the admin has full control over the history of the room

And then you hit a server run by someone like me that's been patched to ignore the flag, or never implemented it to begin with. Oops.

The admin does not, and cannot, have that kind of control. This is a fundamental property of any distributed system whose nodes are owned by unrelated entities. Imagine going up to Google and demanding they remove from the inboxes of their users every message older than X days. That's basically what you're asking for here.

And the point of redaction is that the content is deleted from the database, leaving only a tombstone whose sole function is to keep the room from breaking.

@rubo77
Contributor Author

rubo77 commented Dec 31, 2016

I don't understand: what is the problem with flagging a message as "redacted by ..."? And yes, every federated server can choose how to handle that flag, which is fine.

@kythyria

The problem isn't a "redacted by" thing, it's that having an auto-redact state entry creates a false sense of security, and an even falser sense of control.

@rubo77
Contributor Author

rubo77 commented Dec 31, 2016

The "sense of security" wouldn't be false if the history length were visible in the room header.

Look at Telegram: there are rooms that delete everything after a few minutes, and this is very visible to the user.

And moderated rooms are a fine option in chat systems like Slack and Matrix. There just has to be a fine-grained configuration option for who is allowed to delete messages, or whether deletion is allowed at all.

And it has to be transparent.

@4nd3r
Contributor

4nd3r commented Dec 31, 2016

please see my comment here: #1621 (comment)

@kythyria

kythyria commented Dec 31, 2016

The "sense of security" wouldn't be false, if the history length would be visible in the head of the room.

It would be entirely a lie if any server in the room ignored the history length. Which they will. So the only non-wishful history length it would be valid to display is "messages might be retained forever".

To put it another way, I can put Delete-after: 2d in my emails, and write a client that advertises the option to set that header, but that means absolutely nothing if your mail system doesn't honour it. It just means people will incorrectly think the messages will self-destruct.

Telegram can do this because it's a closed system where one party controls all the servers, and using a third-party client is difficult. Neither of those applies to Matrix, except in a strictly non-federated context.

@rubo77
Contributor Author

rubo77 commented Dec 31, 2016

#1621 (comment) by @kythyria

If and only if the room is completely unfederated, and the server honours the relevant messages, will redaction do what @rubo77 seems to think it does.

So is this all true?

  1. Redacting "flags" a message in the database, so it should not be shown in clients (but it could still be shown anyway).
  2. This "flag" is federated to other servers too.
  3. If every client obeyed the flag and stopped showing redacted messages, they would no longer be visible anywhere (Riot does obey this).
  4. If clients don't obey, they can still show the content of messages that were redacted.
  5. If a server is not federated with other servers, completely deleting the content of a message could be implemented in the future.

@kfatehi
Contributor

kfatehi commented Apr 18, 2017

I also would like the ability to clear history in a room on my homeserver.

In the interim, I just run these three SQL queries on my homeserver.db:

```sql
delete from events where room_id = '...';
delete from event_json where room_id = '...';
delete from event_push_actions where room_id = '...';
```

If I were to go a step further, I would parse each event type and, if it's a media message, delete the corresponding files from the content repository, and then expose this feature to admins in the UI. But for now, this plus clearing the caches in Riot is sufficient for my needs.

@rubo77
Contributor Author

rubo77 commented Apr 18, 2017

@kfatehi I think you are wreaking havoc on your database like this. Many more tables are affected, and federation breaks completely if you just delete things directly.

Please use the implemented prune functionality for this.

@kfatehi
Contributor

kfatehi commented Apr 18, 2017

@rubo77 Thanks for the comment. I am not familiar with prune -- reading the thread above it sounds like it doesn't actually delete messages, and for that I'd have to redact. A script that goes through and redacts everything might be good, but I'm not sure how effective redaction is in a situation like seizure of a homeserver. I'd need to audit these mechanisms and find out for sure.

Had this room been anything but a private direct-chat without federation, I'd have been more cautious!

Keeping an eye on element-hq/element-web#3104 -- thanks for creating these.

@rubo77
Contributor Author

rubo77 commented Apr 19, 2017

There is a purge feature that really deletes the messages: #911

This request was an alternative idea to pruning: redacting instead.

@ghost

ghost commented Sep 20, 2017

@rubo77

On top of the fact that deleting events is literally impossible to do entirely due to Matrix's design, so signatures are always left.

Care to elaborate on this?

@rubo77
Contributor Author

rubo77 commented Sep 20, 2017

The problem is the following: in some rooms, the history simply needs to be deleted after a certain time. Really deleting the messages is not possible if the room is federated, because you can only delete them on your own homeserver, and they would be federated back to life from other homeservers.

The only solution at the moment is to redact all old posts; the redactions are then federated. (I am aware that some homeservers could be modified not to obey the redaction flag, but the solution would be "best effort".)

It would be easy to create a script that redacts all posts older than a certain time, so it would be a nice feature to have directly in the room configuration.

Such an option should be completely transparent to all members, so you can see that anything you write in that room will only last that long.

@ghost

ghost commented Sep 22, 2017

@rubo77 What about "signatures are always left"? I don't understand this part.

@rubo77
Contributor Author

rubo77 commented Sep 22, 2017

@Half-Shot said:

signatures are always left.

I can only guess what he meant: if you redact messages, some remnants are left in the database, for example the date of each post and who posted it, but "signature" is not the correct term for these "relics".

@ghost

ghost commented Sep 25, 2017

@rubo77 so metadata?

I don't think metadata should be left on servers forever, that's a privacy nightmare and there's no reason for that.

@MurzNN

MurzNN commented Sep 26, 2017

After redacting, can we keep only the signature on servers, without the metadata (message text content, etc.)? As I understand it, servers use the signature to validate the message content; but if a message is redacted, can we skip validation and accept the cleaned-up message with a 'redact' flag and the signature kept?

@kythyria

Redacted messages contain a copy of the redaction message, the id, timestamp, and sender, as far as I can tell. The content is gone (this is all assuming that redaction is correctly implemented, which of course there are zero guarantees about).

The signature validation is designed so that this works (and the redaction message isn't part of the signature, nor could it be). Matrix relies on the signatures chaining together in order for a room to stay coherent, so there needs to be enough for the validation to work.
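To make "what survives a redaction" concrete: the spec's redaction algorithm keeps only a whitelist of top-level event keys (so the ID, sender and `origin_server_ts` timestamp stay) and empties `content` apart from a few type-specific keys. The sketch below models only the top-level whitelist, as we recall it from the original (room version 1) rules; later room versions changed the list, and the per-type `content` whitelist is not modelled. `survives_redaction` and `stripped_keys` are hypothetical helper names.

```sh
#!/bin/sh
# Hypothetical helper: does a top-level event key survive redaction?
# The whitelist follows the original (room version 1) redaction rules as we
# recall them; later room versions trim it. The type-specific keys kept
# inside "content" (e.g. "membership" for m.room.member) are not modelled.
survives_redaction() {
  case $1 in
    event_id|type|room_id|sender|state_key|content|hashes|signatures) return 0 ;;
    depth|prev_events|prev_state|auth_events|origin|origin_server_ts|membership) return 0 ;;
  esac
  return 1
}

# Print each given key that redaction would strip from a (hypothetical) event.
stripped_keys() {
  for key in "$@"; do
    survives_redaction "$key" || printf '%s\n' "$key"
  done
}
```

For example, `stripped_keys event_id sender unsigned content` prints only `unsigned`: the event's identity and signatures remain verifiable, while everything outside the whitelist is dropped.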

@MurzNN

MurzNN commented Sep 29, 2017

@kythyria thanks for the description!

I want to say more about the privacy problem with deletion. Matrix developers very often warn about privacy issues on feature requests for deleting rooms and messages: with federation we can't control other servers and can't be sure that they remove the messages and rooms, so they don't want to implement deletion (fully removing a room, self-destructing messages, etc.) in the Matrix protocol. But most rooms on a server are usually not federated, and they can be cleaned up successfully on one homeserver with full guarantees. Yet users miss this feature, even when a room is not federated.

So a good approach to deletion would be to check whether the room is federated and, when a user tries to clean something up, show a "large red warning" on the client side explaining that this room's data is removed only on this homeserver and may be kept on other federated servers. And add a per-room option "Disable federation".

This is better than ignoring all deletion feature requests from users with "this is insecure, so it will not be implemented".
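The "check whether the room is federated" step could be sketched as follows, assuming the joined members' user IDs have already been fetched (for example from the client-server `joined_members` endpoint) and that the server name is everything after the first `:` in a user ID. `server_of` and `is_federated` are hypothetical helpers. As kythyria points out above, this only reflects current membership: a remote user who joined and left earlier has already taken a copy of the history.

```sh
#!/bin/sh
# Hypothetical helpers. A user ID looks like "@localpart:server[:port]";
# we assume the server name is everything after the first ':'.
server_of() {
  printf '%s\n' "${1#*:}"
}

# Succeeds (exit status 0) if any given member is on a server other
# than $1, i.e. the room is currently federated.
is_federated() {
  own_server=$1
  shift
  for member in "$@"; do
    [ "$(server_of "$member")" != "$own_server" ] && return 0
  done
  return 1
}
```

For example, `is_federated example.org '@a:example.org' '@b:matrix.org'` succeeds, so a client following this proposal would show its warning before cleaning up.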

@4nd3r
Contributor

4nd3r commented Sep 29, 2017

@MurzNN what you describe is basically what I described here: #1621 (comment)

@rubo77
Contributor Author

rubo77 commented Sep 29, 2017

Yes, great conclusion! So could someone please implement this behaviour?

What can we do to help accelerate the development in this direction, so we get these options?

@MurzNN

MurzNN commented Sep 29, 2017

We can already implement this feature now via a bot; here is the issue: turt2live/matrix-wishlist#82
This is not much work, so if anybody has free time or programming resources, they could build the bot, based on Go-NEB for example.

@MurzNN

MurzNN commented Oct 7, 2017

It seems there is now an admin command in Synapse for purging rooms: https://github.com/matrix-org/synapse/blob/master/docs/admin_api/purge_history_api.rst
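A sketch of how a purge request could be assembled, assuming the `/_synapse/admin/v1/purge_history/<room_id>` path and the `delete_local_events` / `purge_up_to_ts` body fields documented for later Synapse versions; the 2017 document linked above may describe an older path. The helpers only print the URL and the JSON body, and `purge_history_url` and `purge_history_body` are hypothetical names. Like any purge, this affects only the local server.

```sh
#!/bin/sh
# Hypothetical helpers that only *build* a purge_history request; nothing is
# sent. Path and body fields assume the /_synapse/admin/v1 form of the API
# from later Synapse docs; check the docs for your Synapse version.

# Room IDs ("!opaque:server") are assumed to contain only characters that
# are legal in a URL path, so no percent-encoding is done here.
purge_history_url() {   # $1 = homeserver base URL, $2 = room ID
  printf '%s/_synapse/admin/v1/purge_history/%s\n' "$1" "$2"
}

# delete_local_events=false keeps events sent by local users; remote users'
# events older than the cutoff are purged from this server only.
purge_history_body() {  # $1 = cutoff in milliseconds since the epoch
  case $1 in
    ''|*[!0-9]*) echo "purge_history_body: not a timestamp: $1" >&2; return 1 ;;
  esac
  printf '{"delete_local_events": false, "purge_up_to_ts": %s}\n' "$1"
}
```

To send it, something like `curl -X POST -H "Authorization: Bearer $ADMIN_TOKEN" -d "$(purge_history_body "$TS")" "$(purge_history_url "$HS" "$ROOM")"` could be used, where `ADMIN_TOKEN`, `TS`, `HS` and `ROOM` are placeholders for a server admin's access token, the cutoff, the homeserver URL and the room ID.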

@4nd3r
Contributor

4nd3r commented Oct 7, 2017

@MurzNN this API doesn't delete events, only some state-related data, AFAIK

see https://github.com/matrix-org/synapse/blob/master/synapse/storage/events.py#L2014

@rubo77
Contributor Author

rubo77 commented Feb 15, 2018

Any news here? An optional per-room auto-deletion feature is strongly needed!

@rubo77
Contributor Author

rubo77 commented Nov 12, 2018

I added a script to the contrib section that you can use: https://github.com/matrix-org/synapse/tree/develop/contrib/purge_api
This script only purges the history, so if the rooms are federated, the messages are not gone (unless they are purged everywhere).

@mehturt

mehturt commented Nov 13, 2018

@rubo77 Thanks. What is the best way to discuss it if the script does not work for me?

@rubo77
Contributor Author

rubo77 commented Nov 13, 2018

If you have enhancements to the script then create a pull request here.

Or contact me in https://riot.im/app/#/room/#synapse-admins:yuhu.ddns.net as user rubo77

@cuongnv

cuongnv commented Nov 20, 2018

@rubo77 I created a simple Python script that can remove messages after a predefined timeout:
#4206

@richvdh
Member

richvdh commented Aug 19, 2019

I think it unlikely this is a feature we will add to synapse.

@richvdh richvdh closed this as completed Aug 19, 2019
@MurzNN

MurzNN commented Aug 19, 2019

But we have MSC2228: Self destructing events in proposed final comment period. Isn't that related to this feature?

@richvdh
Member

richvdh commented Aug 19, 2019

That's about events which get redacted after a certain period (e.g. '1 hour'), which is different from an API that redacts all events older than a certain point in time (e.g. '06:00 today').
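The distinction can be illustrated with two predicates: a per-event lifetime versus one fixed cutoff for the whole room. This is a hypothetical illustration, not MSC2228's actual rules; in particular, measuring the lifetime from `origin_server_ts` is an assumption made here.

```sh
#!/bin/sh
# Two hypothetical predicates contrasting the two ideas; all times are
# milliseconds since the epoch, as in origin_server_ts.

# Per-event lifetime (in the spirit of MSC2228, exact rules not modelled):
# an event expires ttl_ms after it was sent, no matter when that was.
# ASSUMPTION: the lifetime is counted from the event's own timestamp.
expired_by_ttl() {     # $1 = event ts, $2 = ttl_ms, $3 = now
  [ $(( $1 + $2 )) -le "$3" ]
}

# This issue: one absolute cutoff for the whole room (e.g. "06:00 today");
# everything sent before it is redacted, everything after it stays.
older_than_cutoff() {  # $1 = event ts, $2 = cutoff
  [ "$1" -lt "$2" ]
}
```

With the first predicate, each event carries its own clock; with the second, a single run of a bulk-redaction script decides for all events at once.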

@rubo77
Contributor Author

rubo77 commented Aug 19, 2019

Please reopen.

I plan to create a contribution that adds this as an external script.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

9 participants