Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replacing go-ipld-prime-proto #6

Closed
hannahhoward opened this issue Jan 21, 2021 · 4 comments
Closed

Replacing go-ipld-prime-proto #6

hannahhoward opened this issue Jan 21, 2021 · 4 comments
Assignees

Comments

@hannahhoward
Copy link

hannahhoward commented Jan 21, 2021

Hey @rvagg,

With this as the "canonical" dagpb implementation, I'd like to look at replacing go-ipld-prime-proto. At the moment, there are some elements missing that prevent me from doing this:

  1. go-codec-dagpb appears to lack support for RAW nodes as leaves a UnixFS dagpb tree structure -- this is immediately essential for Filecoin which uses raw leaves whenever it generates IPLD UnixFS V1 DAGs. I'm not sure that this makes sense as part of this library -- or alternatively as a seperate codec (dag-raw is not an actual format, but bytes as leaves of the dag is part of UnixFS v1, and it's relatively trivial to implement the loading part of this at least so that the bytes are read to a Bytes node). Maybe this should just wait for flexible byte layouts and other ADLs?
  2. go-coded-dagpb uses refmt for protobuf deserialization/serialization, while go-ipld-prime-proto uses the same code-gen'd serializers used by go-ipfs (see go-merkledag). I have two concerns about switching to refmt:
    1. I need to verify performance is comparable
    2. I need to verify refmt is 100% compatible with the generated serializers/deserializers in go-ipfs

Anyway, I am going to go ahead and update go-ipld-prime-proto to go-ipld-prime v0.7.0, but I'm hoping to work with you so that's the last update. I'd be happy to go over all the features of go-ipld-prime-proto and how they are used so we can figure out how to swap out the libraries for go-graphsync.

@mikeal
Copy link

mikeal commented Jan 21, 2021

lack support for RAW nodes as leaves a UnixFS dagpb tree structure

This is also an option in IPFS so this actually needs to get fixed everywhere.

@rvagg
Copy link
Member

rvagg commented Jan 22, 2021

@warpfork @mvdan raw would seem to be an obvious thing to include in the core of ipld-prime, no? I don't think it belongs here, but it also seems too trivial to include as an entirely separate library.

go-coded-dagpb uses refmt for protobuf deserialization/serialization

It only uses refmt for a handy io.Reader wrapper that's handy for stream decoding. There's an attempt in #1 to convert it to a rewrite of the same thing that's now in ipld-prime, but that implementation appears to be buggy, or at least incompatible. For now, refmt only exists here for that one utility. The encoding and decoding here is a manual re-implementation of go-merkledag/pb with additional strictness applied that's not afforded by the protobuf codegen. I did a bunch of work on checking the various edges of the forms that go-merkledag/pb produces (see ipfs/go-merkledag#58) while clarifying the DAG-PB spec, and those same tests are included here, with some alteration given the spec decisions.

The main differences you're going to find are in what it means for something to be optional or strictly required. go-merkledag/pb doesn't make strong decisions on any of these things, just giving you nil pointers where something doesn't exist. IIRC it even allows you to have a PBLink that doesn't have a Hash property since it's just doing a straight decode and very little else, so decisions about correct forms need to be made a little up the stack, and those decisions have been different across the various implementations to date. The Logical Format in the spec makes some calls that clarifies these things into a single set of strong opinions.

I did look into the differences with go-ipld-prime-proto at one point, outlined in this comment ipld/go-ipld-prime-proto#5 (comment) (and in further discussion below) - although the "Links should be omitted" was changed in the spec so there's no "optional" now, it's always present in the data model with no opportunity for it to be absent.

Hopefully these should all be trivial details as far as graphsync, and other consumers are concerned, but it'd be great to hear if anything meaningful pops up that causes real problems! The spec work has mostly been about trying to find an idealised and faithful reconciliation of all of the various interpretations that currently exist.

@mvdan
Copy link
Contributor

mvdan commented Mar 26, 2021

As an update - we decided to tackle "decide on a DAG-PB codec and use it everywhere" in the "IPLD in IPFS" project. We went for go-codec-dagpb since it was a bit closer to ipld-prime, it didn't depend on go-merkledag, and we hoped that the spec would work.

Plus, it was a good moment to attempt to also make this jump. If we make the jump to ipld-prime with go-ipld-prime-proto, making another jump to go-codec-dagpb later would be more work overall.

@warpfork @mvdan raw would seem to be an obvious thing to include in the core of ipld-prime, no? I don't think it belongs here, but it also seems too trivial to include as an entirely separate library.

That ended up happening in ipld/go-ipld-prime#150 :)

It only uses refmt for a handy io.Reader wrapper that's handy for stream decoding. There's an attempt in #1 to convert it to a rewrite of the same thing that's now in ipld-prime, but that implementation appears to be buggy, or at least incompatible.

We've found that a go-ipfs with go-codec-dagpb is measurably slower than with go-ipld-prime-proto, so I'm looking at performance now. It's likely that will touch on the reader stuff you mention here.

@mvdan mvdan self-assigned this Mar 26, 2021
@mvdan
Copy link
Contributor

mvdan commented Mar 31, 2021

We've found that a go-ipfs with go-codec-dagpb is measurably slower than with go-ipld-prime-proto, so I'm looking at performance now.

That's all done and merged into our "IPLD in IPFS" branch; we're now on par with go-merkledag master. All other users of go-ipld-prime-proto, succh as go-car, have PRs ready switching to this codec too.

All in all, I think this is complete and we can close this issue. It might take another week or two for the main chunks of work to land in go-merkledag and go-ipfs, but that's to be expected due to reviews.

@mvdan mvdan closed this as completed Mar 31, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants