
Rework sequencer storage & add handle update #711

Merged — 17 commits into feature/subscription-revamp, Mar 28, 2023

Conversation

@dholms dholms (Collaborator) commented Mar 23, 2023

This reworks our sequencer storage so that we do the work upfront instead of collating it when we serve.
This does cause some duplicated data; however, I don't think that's a concern, for a couple of reasons:

  • we are not storing large events here
  • given our 24hr backfill, we can garbage collect old seqs whenever

This allows our repo_seq table to hold heterogeneous event types. It will also speed up the subscription route.

This PR also adds in support for handle update events.
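A heterogeneous event table like this is naturally modeled as a discriminated union on the event type. The sketch below is illustrative only (the field and type names are hypothetical, not the actual atproto definitions):

```typescript
// Hypothetical shapes for events stored in a single repo_seq table.
// The `type` discriminant lets one row decode to either variant.
type CommitEvt = { type: 'commit'; did: string; commit: string; tooBig: boolean }
type HandleEvt = { type: 'handle'; did: string; handle: string }
type SeqEvt = CommitEvt | HandleEvt

// Narrowing on the discriminant gives type-safe access to each variant's fields.
function describe(evt: SeqEvt): string {
  switch (evt.type) {
    case 'commit':
      return `commit ${evt.commit} by ${evt.did}`
    case 'handle':
      return `handle update to ${evt.handle} for ${evt.did}`
  }
}
```

With this shape, the subscription route can serve every event kind from one ordered table without joining back out to per-type tables.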


  • Dropped the repo_op table as we weren't using it anymore
  • For tooBig events I chose a limit of 200 ops or 1MB of block data.
  • Put a limit of 200 ops on batchWrite
  • I cbor-encoded messages because:
    • it let me encode CIDs
    • but mostly, sqlite doesn't support json & this seemed better than json stringifying
    • it does unfortunately make our PG database more opaque
  • I added an invalidatedBy column to seq which tracks whether an event is invalidated by a later event, namely:
    • commits are invalidated by a later rebase
    • handle updates are invalidated by a later handle update
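The invalidation rules can be sketched as a pure function over in-memory rows. In the real storage layer this would be an UPDATE against repo_seq; the row shape and function name below are illustrative assumptions:

```typescript
// Hypothetical row shape mirroring the seq table's invalidatedBy column.
type SeqRow = {
  seq: number
  did: string
  type: 'commit' | 'rebase' | 'handle'
  invalidatedBy: number | null
}

// A later rebase invalidates earlier commits/rebases for the same DID;
// a later handle update invalidates earlier handle updates for the same DID.
function applyInvalidation(rows: SeqRow[], incoming: SeqRow): SeqRow[] {
  return rows.map((row) => {
    if (row.invalidatedBy !== null || row.did !== incoming.did) return row
    const rebaseHit =
      incoming.type === 'rebase' && (row.type === 'commit' || row.type === 'rebase')
    const handleHit = incoming.type === 'handle' && row.type === 'handle'
    return rebaseHit || handleHit ? { ...row, invalidatedBy: incoming.seq } : row
  })
}
```

Because invalidated rows keep their seq number and merely record who superseded them, consumers that need full history can still read them, while the live stream filters on `invalidatedBy is null`.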

@dholms dholms marked this pull request as ready for review March 24, 2023 00:22
@dholms dholms changed the title Rework sequencer storage Rework sequencer storage & add handle update Mar 24, 2023
@dholms dholms mentioned this pull request Mar 24, 2023
Comment on lines 32 to 34
if (tx.writes.length > 200) {
  throw new InvalidRequestError('Too many writes. Max: 200')
}
Collaborator

I assume this check lives here because you don't want it to be a hard limit in the lexicon (i.e. for all PDS implementations), is that right?

Collaborator Author

Yeah, exactly. There are cases where you may want to do more than 200 writes, and I feel we shouldn't put this at the lexicon level. There's a good chance we'll remove this check at some point, once we have a better handle on the repercussions of large mutations.

.addColumn('sequencedAt', 'varchar', (col) => col.notNull())
.execute()

await db.schema
Collaborator

Having trouble recalling which types of queries these indices support. I kinda like the comments we've added in some of the other migrations documenting this; I've found them useful!

Collaborator Author

yup i'll add some comments 👍

if (dialect === 'pg') {
builder = builder
.addColumn('seq', 'bigserial', (col) => col.primaryKey())
.addColumn('invalidatedBy', 'bigint')
Collaborator

I dig the invalidatedBy idea! I wonder if we should also make it a foreign key.

Collaborator Author

ah yeah I'd be game 👍

Comment on lines +28 to +30
const justRoot = new BlockMap()
justRoot.add(commitData.blocks.get(commitData.commit))
carSlice = await blocksToCar(commitData.commit, justRoot)
Collaborator

💎
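The snippet above is the tooBig fallback: instead of a full CAR slice, only the commit root block is sent. The cutoff itself (200 ops or 1MB of blocks, per the PR description) could be expressed as a small predicate; the constant and function names here are illustrative, not the actual atproto code:

```typescript
// Limits taken from the PR description; names are hypothetical.
const MAX_OPS = 200
const MAX_BLOCK_BYTES = 1024 * 1024 // 1MB

// An event is tooBig if either limit is exceeded; tooBig events ship
// only the commit root block in their CAR slice.
function isTooBig(opCount: number, blockBytes: number): boolean {
  return opCount > MAX_OPS || blockBytes > MAX_BLOCK_BYTES
}
```

Consumers that receive a tooBig event would then fetch the full commit contents out of band rather than from the stream.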

Comment on lines -64 to +68
.innerJoin('repo_commit_history', (join) =>
join
.onRef('repo_commit_history.creator', '=', 'repo_seq.did')
.onRef('repo_commit_history.commit', '=', 'repo_seq.commit'),
)
.select([
'repo_seq.seq as seq',
'repo_seq.did as did',
'repo_seq.commit as commit',
'repo_seq.sequencedAt as sequencedAt',
'repo_commit_history.prev as prev',
])
.selectAll()
.orderBy('seq', 'asc')
.where('invalidatedBy', 'is', null)
Collaborator

🤤 so nice, this thing is gonna whiz.

Collaborator

@devinivy devinivy left a comment

👏 so nice

@dholms dholms merged commit 6bec87c into feature/subscription-revamp Mar 28, 2023
@dholms dholms deleted the rework-seqs branch March 28, 2023 00:25