Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨ Add events for account and record update/delete/deactivation #2661

Open
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

foysalit
Copy link
Contributor

This PR adds 4 new events to enable ozone to keep track of account deletion/deactivation and record update/delete. The record update event will primarily be used for profiles and lists for the time being.

Questions:

  1. Should we add logic in the event logger to ensure that duplicate events for account or record delete are discarded?

@bnewbold
Copy link
Collaborator

bnewbold commented Aug 1, 2024

thanks for writing these up! having something concrete is always great for getting brain flowing.

I'd be pretty inclined to roll up the account events in to 1-2 event types. Eg, either a single "account status change", or two events, one aligning with the #account firehose event (deletion, deactivation) and one aligning with #identity (handle change, PDS migration). would add fields like accountStatus (active, deleted, etc), handle, and pdsHost to track those values over time.

So either a single event:

#accountStateEvent
  comment: string
  accountStatus: string, one of 'deleted', 'deactivated', 'suspended', 'takedown', 'active'
  handle: string
  pdsHost: string (URL)
  timestamp: datetime (estimate of when the state change actually happened, which might be different from ozone timestamp if there was a delay or backfill) (alt names welcome)

or two:

#identityEvent
  comment: string
  handle: string
  pdsHost: string (URL)
  timestamp: datetime

#accountEvent
  accountStatus: string, one of 'deleted', 'deactivated', 'suspended', 'takedown', 'active'
  timestamp: datetime

For the record events, it feels like we could also merge them in to a single #recordEvent? But i'm not as sure.

#recordEvent
  comment: string
  recordCid: CID of the record itself (for updates; nil means deleted?)
  deleted: bool (if we don't want to rely on recordCid being nil to mean deleted?)
  created: bool (indicates the record is created, not updated)
  record: do we embed the actual record, for updates? I guess that will be stored/cached elsewhere
  timestamp: datetime

In addition to profile records, we would probably have automod track:

  • when records are deleted or updated, check if they have active labels, and record the event in that case (aka, if there has been a mod intervention on a record, want to record future state for that record)
  • all updates to labeler declaration records
  • mod list declarations (as mentioned)

@foysalit
Copy link
Contributor Author

foysalit commented Aug 8, 2024

made some adjustments according to your feedback. couple of questions though

  1. not sure why we'd need a created event for record? I wouldn't expect to be proactively tracking record creations from within ozone.
  2. suspended takedown states for accounts would only be helpful for 3p labelers I guess? just wondering if this would feel weird for instances with takedown capabilites?

@bnewbold
Copy link
Collaborator

bnewbold commented Aug 9, 2024

for 1: seems good to track if a record event is an update or creation just for completeness? I could imagine us deciding we want to track creation events for some specific record types, like labeler declarations, which are relatively high-stakes and low-volume in the network.

for 2: we'd want to track the status of accounts on independent PDS instances. it does get a bit "loopy" to double-track this in our ozone instance for accounts on our PDS instances, maybe we would skip generating those events? remember that we'll also be splitting out infra takedowns as a distinct event type from current takedowns, as some separate tech debt to clean up.

Copy link
Collaborator

@bnewbold bnewbold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall very much like this shape/direction!

}
}
},
"identityEvent": {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for completeness, there is a concept of "tombstoning" an identity. this has been very rare in practice. in theory it is a permanent deletion, though technically it can be reverted in PLC within a time period (72hr), and can be reverted any time with did:web.

it is sort of a corner-case, but would actually be a good thing to track in ozone because otherwise it would be very hard for mods to understand what had happened: a deleted identity makes it pretty hard to learn anything about an account from the live network.

@foysalit
Copy link
Contributor Author

foysalit commented Aug 9, 2024

  • instead of making accountStatus enum we could just make it a string with knownValues so that suspended and takedown etc. can be facilitated but not documented? but yeah, happy to add it to the list too.

  • for recordEvent, wondering if we should take a similar approach? instead of having separate flags for created updated etc., lifecycle string field with knownValues: ['created', 'updated', 'deleted'] could fit better, right? in fact, maybe this lifecycle/status property is suitable for all 3 events now that you mention that identity events can also have a tombstone?

@bnewbold
Copy link
Collaborator

bnewbold commented Aug 9, 2024

Yeah, I think in other contexts we also have it as knownValues instead of an enum or closed union. This allows future flexibility ("stale"? "maintenance"?) while keeping the active/not-active aspect clear.

I think for identity and record states, we know more clearly that there are just the 3x operations on a record (create, update, delete), and that an identity is active or tombstoned (this is inherited from the W3C DID semantics). I would do just a boolean for identity tombstoned. For records, it could in theory be just a boolean and the cid string: no cid would mean deleted, and created: true would mean creation and created: false (or missing) would mean updated. But a string with knownValues wouldn't be crazy.

@foysalit
Copy link
Contributor Author

But a string with knownValues wouldn't be crazy.

Sounds like you're not super against it and I like the idea of just storing an op: 'created' | 'updated' | 'deleted' flag instead of the boolean flag approach so I've redesigned to use the op property.

Also added an optional tombstone: boolean flag in identity event. @bnewbold

@bnewbold
Copy link
Collaborator

for accountStatus, I think we still need takendown and suspended. the use-case for this is when an account is takendown on a self-hosted PDS which we don't moderate: this is the sort of situation which would be very confusing for mod team to understand/debug without this sort of event tracking.

here are the knownValues from com.atproto.sync.subscribeRepos, which I think we should match one-to-one:
https://github.com/bluesky-social/atproto/blob/main/lexicons/com/atproto/sync/subscribeRepos.json#L145

I would also put the active boolean flag in; if it is easier to have an active string value as well that is fine (eg, I could imagine wanting to query with acountStatus=active, or do aggregations on a single value), but I would basically just try to keep the field as one-to-one with the repo stream schema as possible.

record event looks good!

if we can get a quick set of eyes from @devinivy on this it would be great.

Copy link
Collaborator

@devinivy devinivy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few small Qs and lexicon suggestions. Looks good!

"type": "string",
"knownValues": ["create", "update", "delete"]
},
"cid": { "type": "string" },
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"cid": { "type": "string" },
"cid": { "type": "string", "format": "cid" },

"required": ["timestamp"],
"properties": {
"comment": { "type": "string" },
"handle": { "type": "string" },
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"handle": { "type": "string" },
"handle": { "type": "string", "format": "handle" },

@@ -317,6 +320,25 @@ export class ModerationService {
}
}

if (isAccountEvent(event)) {
meta.active = !!event.active
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If active is missing do we want to interpret it as though the account went inactive? I wonder if we may want to omit it from meta altogether in this case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the !! is confusing here but this is a rerouted from #account from the firehose and the active flag should be always set https://github.com/bluesky-social/atproto/blob/main/lexicons/com/atproto/sync/subscribeRepos.json#L133

@@ -184,6 +184,25 @@ export class ModerationViews {
eventView.event.remove = event.removedTags || []
}

if (event.action === 'tools.ozone.moderation.defs#accountEvent') {
eventView.event.active = !!event.meta?.active
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar question here!

@foysalit foysalit requested a review from devinivy August 15, 2024 13:59
@@ -395,7 +395,7 @@
"accountEvent": {
"type": "object",
"description": "Logs account status related events on a repo subject. Normally captured by automod from the firehose and emitted to ozone for historical tracking.",
"required": ["timestamp", "status"],
"required": ["timestamp", "status", "active"],
Copy link
Collaborator

@devinivy devinivy Sep 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a required field means that old events missing this field will become invalid. Is that fine in this case (i.e. all pass events in practice already include the field)?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh haha nevermind! I was reading diff from recent commits and forgot this event was brand new in the PR 🙃

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants