Repo checkouts #444

dholms · 2022-12-27T23:49:40Z

This started out as introducing repo checkouts to the API but spun out into several related things that improve our storage & sync layer:

ripped out auth lib (we weren't actually making use of this & it was just adding extra song and dance to all of repo operations & sync verification)
hash blocks as they come in from car file to ensure the bytes match the purported CID
better error handling
split out repo storage class into the naive ReadableBlockstore and the higher level RepoStorage
alongside that, introduce ReadableRepo that allows for loading a repo off of the naive blockstore
enabled by both of those, we can remove temp from RepoStorage and introduce a new SyncStorage class on top of two readable blockstores. this is flexible such that you can handle sync in memory or with an on-disk cache
improve the sync & verification methods to make use of all of this
AND FINALLY add in repo checkouts
also added getCommitPath as a bonus
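
A minimal sketch of the "hash blocks as they come in" item above. It assumes a hex sha-256 digest as a stand-in for a real CID. `IncomingBlock`, `sha256Hex` and `verifyIncomingBlocks` are hypothetical names for illustration, not the ones used in this PR.

```typescript
import { createHash } from 'node:crypto'

// Hypothetical stand-in for a block read from a CAR file: a claimed id plus raw bytes.
// Assumption: the "cid" here is a hex sha-256 digest, not a real multiformats CID.
interface IncomingBlock {
  cid: string
  bytes: Uint8Array
}

const sha256Hex = (bytes: Uint8Array): string =>
  createHash('sha256').update(bytes).digest('hex')

// Accept blocks one at a time, rejecting any whose bytes do not hash to the claimed id.
function verifyIncomingBlocks(blocks: IncomingBlock[]): Map<string, Uint8Array> {
  const verified = new Map<string, Uint8Array>()
  for (const block of blocks) {
    if (sha256Hex(block.bytes) !== block.cid) {
      throw new Error(`Block bytes do not match claimed id: ${block.cid}`)
    }
    verified.set(block.cid, block.bytes)
  }
  return verified
}
```

Checking at ingestion time means nothing downstream has to trust the ids that the sender attached to the bytes.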

Note: this one does not yet include blobs for diffs/checkouts
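
One way to picture the SyncStorage idea from the list above (a storage layered on two readable blockstores) is sketched below. The interface is a simplified, synchronous assumption with string ids. The real ReadableBlockstore is likely async and CID-keyed, and `ReadableBlocks`, `MapBlocks` and `LayeredSyncStorage` are hypothetical names.

```typescript
// Hypothetical minimal read-only interface (assumption: sync, string ids).
interface ReadableBlocks {
  get(cid: string): Uint8Array | null
}

// In-memory implementation; an on-disk cache could implement the same interface.
class MapBlocks implements ReadableBlocks {
  private blocks: Map<string, Uint8Array>
  constructor(blocks: Map<string, Uint8Array> = new Map()) {
    this.blocks = blocks
  }
  get(cid: string): Uint8Array | null {
    return this.blocks.get(cid) ?? null
  }
}

// Reads check the staged (incoming) blocks first, then fall back to the saved ones.
class LayeredSyncStorage implements ReadableBlocks {
  private staged: ReadableBlocks
  private saved: ReadableBlocks
  constructor(staged: ReadableBlocks, saved: ReadableBlocks) {
    this.staged = staged
    this.saved = saved
  }
  get(cid: string): Uint8Array | null {
    return this.staged.get(cid) ?? this.saved.get(cid)
  }
}
```

Because both layers only need to be readable, the staged side can live in memory or on disk without the sync code changing.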

packages/repo/src/util.ts

dholms · 2023-01-03T23:48:44Z

packages/repo/src/storage/memory-blockstore.ts

      if (!earliest && root.prev === null) {
        return path.reverse()
+      } else if (earliest && root.prev.equals(earliest)) {
+        return path.reverse()


I'd like to get some opinions on how getCommitPath should be ordered.

Instinct says it should be in increasing order, i.e. [commitX, commitY, commitZ] (and that's how it is right now). This makes sense because, when processing commits, you'd probably want to do them in order.

However, I've structured it similarly to array.slice, where the first param is inclusive & the second is exclusive.
This is because if you ask for a commit path from the current commitZ to commitX, you should already have knowledge of commitX & you don't actually need it; you only need commitZ & commitY.
And in a sense, that would suggest that you'd receive it in decreasing order ([commitZ, commitY]).

You can feel this tension on display in pds/sync.test.ts where I ask for earliest: commit[2], latest: commit[15], but then have to compare to commit.slice(3, 16). Feels weird 🤔

You'll also notice that in getCommitPath on both memory-blockstore & sql-repo-storage, we have to reverse the path when returning, because the path is by nature constructed by getting a reference to latest and then working backwards to earliest.

Any thoughts?

I think you got this right as-is, and the mismatch with slice() makes sense to me. This is similar but a little different than slice since we're slicing in "the opposite direction" if you can say that. I think the case that highlights it best is getCommitPath(x, null) vs. slice(x). In the former it's clear we mean to ask for all commits ending at (i.e. before) x; in the latter we want items starting from (i.e. after) x. To hammer this home, we could make earliest optional and default it to null: getCommitPath(x) vs. slice(x).
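
The semantics discussed in this thread (latest inclusive, earliest exclusive, result in increasing order, earliest defaulting to null) can be sketched as follows. This is an illustration under assumptions, not the code in memory-blockstore or sql-repo-storage: string ids stand in for CIDs, and the `CommitNode` shape and the `commits` map are hypothetical.

```typescript
// Hypothetical simplified commit: string ids stand in for CIDs.
interface CommitNode {
  id: string
  prev: string | null
}

// Returns the path from `earliest` (exclusive) to `latest` (inclusive) in increasing order,
// or null if `earliest` is never reached while walking back from `latest`.
function getCommitPath(
  commits: Map<string, CommitNode>,
  latest: string,
  earliest: string | null = null,
): string[] | null {
  const path: string[] = []
  let curr: string | null = latest
  while (curr !== null) {
    // Stop before including `earliest`: the caller is assumed to already have it.
    if (curr === earliest) return path.reverse()
    const commit = commits.get(curr)
    if (!commit) return null
    path.push(curr)
    curr = commit.prev
  }
  // Reached the first commit: only a valid path if no lower bound was requested.
  return earliest === null ? path.reverse() : null
}
```

The path is collected newest-first while walking `prev` links, which is why the reverse happens just before returning.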

dholms · 2023-01-04T00:29:36Z

packages/repo/src/mst/mst.ts

@@ -200,31 +209,17 @@ export class MST implements DataStore {
  // -------------------

  // Return the necessary blocks to persist the MST to repo storage
-  // If the topmost tree only has one entry and it's a subtree, we can eliminate the topmost tree


From my last MST bughunt adventures: we were trimming the top of the tree before saving, but i've moved that to the delete operation itself since it was screwing up our diff bookkeeping (& it's actually just more intuitive to do it on the operation anyway)

dholms · 2023-01-04T00:40:57Z

packages/repo/src/verify.ts

+  didResolver: DidResolver,
+): Promise<VerifiedCheckout> => {
+  const repo = await ReadableRepo.load(storage, root)
+  const validSig = await didResolver.verifySignature(


I will say, I don't love having to thread the didResolver through all the sync & verify methods. But it's either that or doing the resolution beforehand & threading through the approved signingKey.
It also doesn't yet take into account the fact that a user's signing key can change. (& god forbid their DID changes, we just error out on that for now)

Yeah, I'm not seeing any clearly better options unless there's a natural default DidResolver, sort of similar to how in node there's a global/default http.Agent. Can also probably be smoothed over for consumers by assigning a DidResolver early on and threading it down here for them in some wrapper/convenience classes, etc.
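
A rough sketch of the wrapper idea mentioned above, where a resolver is assigned once and threaded down on the consumer's behalf. `SignatureResolver` and `RepoVerifier` are hypothetical names, and the `(did, data, sig)` parameter order is an assumption loosely based on the verifySignature call in the diff.

```typescript
// Hypothetical resolver shape (assumption, not the real DidResolver interface).
interface SignatureResolver {
  verifySignature(did: string, data: Uint8Array, sig: Uint8Array): Promise<boolean>
}

// Hypothetical convenience wrapper: the resolver is assigned once at construction,
// so callers do not pass it to every verification call.
class RepoVerifier {
  private resolver: SignatureResolver
  constructor(resolver: SignatureResolver) {
    this.resolver = resolver
  }
  async verifyCommit(did: string, data: Uint8Array, sig: Uint8Array): Promise<void> {
    const valid = await this.resolver.verifySignature(did, data, sig)
    if (!valid) throw new Error(`Invalid signature on commit for ${did}`)
  }
}
```

The lower-level functions would still take the resolver explicitly. Only the consumer-facing surface hides it.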

dholms · 2023-01-04T00:52:27Z

packages/pds/src/api/com/atproto/sync.ts

 import { Repo } from '@atproto/repo'
 import SqlRepoStorage from '../../../sql-repo-storage'
 import AppContext from '../../../context'
+import { CID } from 'multiformats/cid'

 export default function (server: Server, ctx: AppContext) {
  server.com.atproto.sync.getRoot(async ({ params }) => {


another bikeshed, thoughts on switching this to getHead?

I've done this in the storage interfaces to avoid confusion with the similarly named root (which refers to the ipld node that the commit points to).

actually i just decided to do it. lmk if you have any beef with it 🐮

Yeah I'm cool with that

dholms · 2023-01-04T02:24:35Z

packages/common/src/check.ts

+  ) => { success: true; data: T } | { success: false; error: ZodError }
+}
+
+export interface Def<T> {


wrapped the defs in this object so RepoStorage can give better errors

devinivy

I dig this very much 💎
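
To illustrate why pairing a name with each definition helps error messages, here is a minimal dependency-free sketch. It follows the safeParse-style result union from the diff but uses a plain Error in place of ZodError. `ParseResult`, `Checkable`, `NamedDef`, `assure` and `stringDef` are hypothetical names, not the contents of check.ts.

```typescript
// Result union modeled on the one in the diff, with Error standing in for ZodError (assumption).
type ParseResult<T> = { success: true; data: T } | { success: false; error: Error }

interface Checkable<T> {
  safeParse(obj: unknown): ParseResult<T>
}

// Pairing a name with the schema lets a failure say which definition was expected.
interface NamedDef<T> {
  name: string
  schema: Checkable<T>
}

function assure<T>(def: NamedDef<T>, obj: unknown): T {
  const res = def.schema.safeParse(obj)
  if (res.success === false) {
    throw new Error(`Expected a valid ${def.name}: ${res.error.message}`)
  }
  return res.data
}

// Toy definition used only to exercise the sketch.
const stringDef: NamedDef<string> = {
  name: 'string',
  schema: {
    safeParse: (obj) =>
      typeof obj === 'string'
        ? { success: true, data: obj }
        : { success: false, error: new Error(`got ${typeof obj}`) },
  },
}
```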

devinivy · 2023-01-04T04:40:35Z

packages/did-resolver/src/resolver.ts

+    sig: Uint8Array,
+  ): Promise<boolean> {
+    const signingKey = await this.resolveSigningKey(did)
+    return crypto.verifySignature(signingKey, data, sig)


I see verifySignature's first argument is called did— safe to assume signingKey takes the form of a did:key?

yup that's right. may be worth making that explicit in the variable name in verifySignature. however, signingKey comes from the did spec & can either take the form of a did:key or i believe a jwk (tho at the moment, we require it to be a didKey)

packages/pds/src/api/com/atproto/sync.ts

packages/repo/src/readable-repo.ts

packages/repo/src/storage/memory-blockstore.ts

packages/repo/src/sync.ts

devinivy · 2023-01-04T16:39:56Z

packages/repo/src/sync.ts

+  storage: RepoStorage,
+  repoCar: Uint8Array,
+  didResolver: DidResolver,
+): Promise<{ root: CID; ops: RecordWriteOp[] }> => {


Nothing to do about this here/now I don't think, but just a thought rattling around my brain. If someone is syncing these ops into e.g. their own index, and they want to keep their index consistent, they'd need to process all ops at once transactionally. If the ops were broken-up by commit, then they could gradually process commit-by-commit (or condensed into batches of commits), landing at a consistent state each step along the way. It would be useful to be able to ask something like "give me this history in ops batches of size roughly 100, always containing full commits."
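
The "batches of roughly 100 ops, always containing full commits" request could look something like the sketch below. `CommitOps` and `batchByCommit` are hypothetical, and the op type is left generic rather than tied to the real RecordWriteOp.

```typescript
// Hypothetical shape: the ops belonging to a single commit.
interface CommitOps<Op> {
  commit: string
  ops: Op[]
}

// Group consecutive commits into batches of roughly `targetSize` ops
// without ever splitting a commit across two batches.
function batchByCommit<Op>(history: CommitOps<Op>[], targetSize: number): CommitOps<Op>[][] {
  const batches: CommitOps<Op>[][] = []
  let current: CommitOps<Op>[] = []
  let count = 0
  for (const entry of history) {
    // Close the current batch if adding this commit would overshoot the target.
    if (current.length > 0 && count + entry.ops.length > targetSize) {
      batches.push(current)
      current = []
      count = 0
    }
    current.push(entry)
    count += entry.ops.length
  }
  if (current.length > 0) batches.push(current)
  return batches
}
```

A commit larger than `targetSize` ends up alone in its own batch, so each batch boundary still corresponds to a consistent repo state.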

That's a good point. My motivation here was actually to simplify processing ops: for instance, if someone creates & deletes a record several times over a commit range, this will collapse it into one operation.

On second look, this should probably be an optional processing step done by the consumer: we return ops per commit & they can process them atomically by commit, or they can collapse them down into one big set of ops

this is actually trickier than i thought it'd be. you lose some info when going from diff -> op. specifically the CID associated with the del. It may make sense to start tracking cids in the operations as well as the records & include the cid in the delete operation as well 🤔

Yeah hmmm, I can see that. Could be nice! If it's too much of a rabbit hole for this PR though, I'm certainly not too hung-up on it right now.

Alright I ended up doing it, because it actually does make these functions much nicer & that extra info (relevant cids) is probably useful to some consumers.

Split it out into its own PR here: #460
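
For the optional "collapse down into one big set of ops" step discussed in this thread, a consumer-side sketch might look like this. The `WriteOp` shape is hypothetical: string cids stand in for real CIDs, and delete carries no cid here, which is the lost information the thread mentions. The function assumes the input is a valid, ordered op sequence.

```typescript
// Hypothetical op shape for illustration only.
type WriteOp =
  | { action: 'create'; key: string; cid: string }
  | { action: 'update'; key: string; cid: string }
  | { action: 'delete'; key: string }

// Collapse an ordered list of ops into at most one net op per record key.
function collapseOps(ops: WriteOp[]): WriteOp[] {
  const net = new Map<string, WriteOp>()
  for (const op of ops) {
    const prior = net.get(op.key)
    if (prior === undefined) {
      net.set(op.key, op)
    } else if (prior.action === 'create') {
      // Created and deleted within the range: no net change at all.
      if (op.action === 'delete') net.delete(op.key)
      // Created then updated: still a create, with the latest cid.
      else net.set(op.key, { action: 'create', key: op.key, cid: op.cid })
    } else if (prior.action === 'update') {
      // Update then update/delete: the later op wins.
      net.set(op.key, op)
    } else if (op.action === 'create') {
      // Deleted then recreated: the record existed before the range, so this is a net update.
      net.set(op.key, { action: 'update', key: op.key, cid: op.cid })
    } else {
      throw new Error(`Invalid op sequence for ${op.key}: ${op.action} after delete`)
    }
  }
  return Array.from(net.values())
}
```

Applying the collapsed ops gives the same end state as applying the originals in order, but the per-commit boundaries are gone, which is the trade-off raised earlier in the thread.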

packages/repo/src/util.ts

dholms added 3 commits December 26, 2022 16:35

Merge branch 'repo-mutation-diffs' into repo-storage-migration

380ef96

wip

32a21ab

ripping out auth lib

d643250

dholms changed the base branch from main to repo-storage-migration December 27, 2022 23:49

dholms added 4 commits December 27, 2022 18:17

more auth cleanup

69be03f

another lurker

b55f9d4

wip better sync primitives

4523cad

wip

6b11ba4

Base automatically changed from repo-storage-migration to sync-revamp December 29, 2022 21:16

dholms added 7 commits December 29, 2022 17:11

improving diffs & sync

8e7131d

tests working!

d6cd794

actually implemented checkout lol

fe109c6

Merge branch 'sync-revamp' into repo-checkout

a1ed8c3

simplify interface & improve error handling

fdd60f7

writing sql storage code

87abad4

fixing up tests

473bc92

dholms mentioned this pull request Dec 30, 2022

Feature branch: storage & sync revamp #446

Merged

dholms added 4 commits January 2, 2023 18:02

testing & bugfixes

0781d90

checkouts return records instead of cids

9f25a26

one last refactor lol

8c1e9d5

missed one

11970ef

dholms changed the title from "Repo checkout - no history" to "Repo checkouts" Jan 3, 2023

dholms commented Jan 3, 2023

dholms added 3 commits January 3, 2023 16:05

handle other cid codecs on incoming car verification

cff5b63

tests + tricky bugs

b9b83d0

unneeded blockstore method

c37082a

dholms commented Jan 3, 2023

dholms added 2 commits January 3, 2023 18:26

trim mst on del instead of save

6e58a0f

cleanup comment

3269ad6

dholms commented Jan 4, 2023

dholms marked this pull request as ready for review January 4, 2023 00:29

dholms commented Jan 4, 2023

dont resolve did for every commit

b8cfb78

dholms commented Jan 4, 2023

dholms added 2 commits January 3, 2023 18:52

use "commit" instead of "root"

156b332

getRoot -> getHead

4953189

dholms commented Jan 4, 2023

devinivy approved these changes Jan 4, 2023

dholms added 2 commits January 4, 2023 14:53

pr feedback

716fb8b

very silly bug fix

f647640

dholms mentioned this pull request Jan 4, 2023

Record write descriptions #460

Merged

dholms merged commit 423b6ab into sync-revamp Jan 5, 2023

dholms deleted the repo-checkout branch January 5, 2023 16:37
