-
Notifications
You must be signed in to change notification settings - Fork 586
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Repo checkouts #444
Repo checkouts #444
Conversation
if (!earliest && root.prev === null) { | ||
return path.reverse() | ||
} else if (earliest && root.prev.equals(earliest)) { | ||
return path.reverse() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to get some opinions on how getCommitPath
should be ordered.
Instinct says it should be in increasing order, ie: [commitX, commitY, commitZ] (and that's how it is right now). This makes sense because, when processing commits, you want you'd probably want to do them in order
However, I've structured it similarly to array.slice
where the first param is inclusive & the second is exclusive
This is because if you ask for a commit path form the current commitZ to commitX, you should already have knowledge of commitX & you don't actually need it, you only need commitZ & commitY.
And in sense, it would suggest that the you'd receive it in decreasing order ([commitZ, commitY])
You can feel this tension on display in pds/sync.test.ts
where I ask for earliest: commit[2], latest: commit[15]
, but then have to compare to commit.slice(3, 16)
. Feels weird 🤔
You'll also notice that in getCommitPath
on both memory-blockstore & sql-repo-storage, we have to reverse the path when returning because the path is by nature constructed from getting a reference latest and then working backwards to the earliest.
Any thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you got this right as-is, and the mismatch with slice()
makes sense to me. This is similar but a little different than slice since we're slicing in "the opposite direction" if you can say that. I think the case that highlights it best is getCommitPath(x, null)
vs. slice(x)
. In the former it's clear we mean to ask for all commits ending at (i.e. before) x
; in the latter we want items starting from (i.e. after) x
. To hammer this home, we could make earliest
optional and default it to null
: getCommitPath(x)
vs. slice(x)
.
@@ -200,31 +209,17 @@ export class MST implements DataStore { | |||
// ------------------- | |||
|
|||
// Return the necessary blocks to persist the MST to repo storage | |||
// If the topmost tree only has one entry and it's a subtree, we can eliminate the topmost tree |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my last MST bughunt adventures: we were trimming the top of the tree before saving, but i've moved that to the delete operation itself since it was screwing up our diff bookkeeping (& it's actually just more intuitive to do it on the operation anyway)
didResolver: DidResolver, | ||
): Promise<VerifiedCheckout> => { | ||
const repo = await ReadableRepo.load(storage, root) | ||
const validSig = await didResolver.verifySignature( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will say, I don't love having to thread the didResolver
through all the sync & verify methods. But it's either that or doing the resolution beforehand & threading through the approved signingKey.
It also doesn't take into account the fact that a user's signing key can change yet. (& god forbid their DID changes, we just error out on that for now)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'm not seeing any clearly better options unless there's a natural default DidResolver
, sort of similar to how in node there's a global/default http.Agent
. Can also probably be smoothed over for consumers by assigning a DidResolver
early on and threading it down here for them in some wrapper/convenience classes, etc.
import { Repo } from '@atproto/repo' | ||
import SqlRepoStorage from '../../../sql-repo-storage' | ||
import AppContext from '../../../context' | ||
import { CID } from 'multiformats/cid' | ||
|
||
export default function (server: Server, ctx: AppContext) { | ||
server.com.atproto.sync.getRoot(async ({ params }) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
another bikeshed, thoughts on switching this to getHead
?
I've done this in the storage interfaces to avoid confusion with the similarly named root
(which refers to the ipld node that the commit points to).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually i just decided to do it. lmk if you have any beef with it 🐮
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I'm cool with that
) => { success: true; data: T } | { success: false; error: ZodError } | ||
} | ||
|
||
export interface Def<T> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wrapped the defs in this object so RepoStorage
can give better errors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dig this very much 💎
sig: Uint8Array, | ||
): Promise<boolean> { | ||
const signingKey = await this.resolveSigningKey(did) | ||
return crypto.verifySignature(signingKey, data, sig) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see verifySignature
's first argument is called did
— safe to assume signingKey
takes the form of a did:key
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup that's right. may be wroth making that explicit in the variable name in verifySignature
however signingKey
comes from the did spec & can either take the form of a did:key or i believe a jwk (tho at the moment, we require it to be a didKey)
storage: RepoStorage, | ||
repoCar: Uint8Array, | ||
didResolver: DidResolver, | ||
): Promise<{ root: CID; ops: RecordWriteOp[] }> => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing to do about this here/now I don't think, but just a thought rattling around my brain. If someone is syncing these ops
into e.g. their own index, and they want to keep their index consistent, they'd need to process all ops
at once transactionally. If the ops
were broken-up by commit, then they could gradually process commit-by-commit (or condensed into batches of commits), landing at a consistent state each step along the way. It would be useful to be able to ask something like "give me this history in ops batches of size roughly 100, always containing full commits."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point. My motivation here was actually to simplify processing ops: for instance, if someone creates & deletes a record several times over a commit range, this will collapse it into one operation.
On second look, this should probably be an optional processing step done by the consumer: we return ops per commit & they can process them atomically by commit, or they can collapse them down into one big set of ops
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is actually trickier than i thought it'd be. you lose some info when going from diff -> op. specifically the CID associated with the del. It may make sense to start tracking cids in the operations as well as the records & include the cid in the delete operation as well 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah hmmm, I can see that. Could be nice! If it's too much of a rabbit hole for this PR though, I'm certainly not too hung-up on it right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright I ended up doing it, because it actually does make these functions much nicer & that extra info (relevant cids) is probably useful to some consumers.
Split it out into it's own PR here: #460
didResolver: DidResolver, | ||
): Promise<VerifiedCheckout> => { | ||
const repo = await ReadableRepo.load(storage, root) | ||
const validSig = await didResolver.verifySignature( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'm not seeing any clearly better options unless there's a natural default DidResolver
, sort of similar to how in node there's a global/default http.Agent
. Can also probably be smoothed over for consumers by assigning a DidResolver
early on and threading it down here for them in some wrapper/convenience classes, etc.
This started out as introducing repo checkouts to the API but spun out into several related things that improve our storage & sync layer:
ReadableBlockstore
and the higher levelRepoStorage
ReadableRepo
that allows for loading a repo off of the naive blockstoretemp
fromRepoStorage
and introduce a newSyncStorage
class on top of two readable blockstores. this is flexible such that you can handle sync in memory or with an on-disk cachegetCommitPath
as a bonusNote: this one does not yet include blobs for diffs/checkouts