This repository has been archived by the owner on Dec 11, 2023. It is now read-only.

Add weekly sync meeting notes 2021-01-04 #114

Merged
merged 1 commit into from
Jan 12, 2021
127 changes: 127 additions & 0 deletions meeting-notes/2021/2021-01-04--ipld-sync.md
@@ -0,0 +1,127 @@
# 🖧 IPLD Weekly Sync 🙌🏽 2021-01-04

- **Lead:** @rvagg
- **Notetaker:** @rvagg
- **Attendees:**
  - @rvagg
  - @ribasushi
  - @mvdan
  - @warpfork
  - @mikeal
  - @chafey
- **Recording:** https://youtu.be/tUGYIRz4R3k

## Agenda

- General
  - Start recording
  - Start live stream
  - Find a notetaker
  - Ask everyone to put their name into the list of attendees
  - Ask everyone to add the items they've been working on over the past two weeks (should be done prior to the meeting)
  - Ask for last minute agenda items
- This meeting
  - should ipld-prime's node helpers really use int64 for lengths? `MakeMap(int64(len(things)), etc...`
  - IEEE 754 leakage into the data model (Inf, -Inf, NaN)?
  - adin: large blocks (MB, GB, etc.), how does IPLD feel about them?
    - At the data exchange layer, we could move large blocks in increments using schemes like https://hackmd.io/@adin/sha256-dag. This approach is per-hash function though, and theoretically there is more than one approach per hash function. Is this reasonable to define at the network layer (e.g. Bitswap)?
    - Similarly, could allow UnixFS files to be requested as either DAGs or single large blocks to match existing hashes of files. Good idea/bad idea?
  - _add your agenda item_


## Weekly Update

@mvdan
- first week back
- wrapped up ipld-prime 0.7 refactors before heading off for the holidays
- done a few reviews; Eric proposed a new fluent Go API to create nodes called "quip"
  - https://github.com/ipld/go-ipld-prime/pull/134
  - I proposed an alteration with a bit more magic, same perf/allocs, and less verbosity.
    - https://github.com/ipld/go-ipld-prime/tree/quip-mvdan
- I poked a bit at why go-ipfs is so darn heavy
  - turns out it indirectly depends on gojay, a pretty heavy json reimplementation that's faster than std
  - gojay takes nearly a megabyte in the final binary :(
  - pretty much required only by quic-go's logging, so I've poked Marten about it
    - he'll share benchmark code with me soon, so I definitely want to check that
- weekend analysis of the top Go modules on github, for, err, research: https://protocollabs.slack.com/archives/CKXLJVAL9/p1609703145263500

@rvagg
- https://github.com/rvagg/js-ipld-garbage
- JS CBOR
- go-codec-dagpb - v1.0.0, test coverage, fixed int64 problem for x86

@warpfork

- spent a lot of the holiday playing with building new applications, and really wanted to use our tools for a lot of it... spurred some new developments
- `traverse.FocusedTransform` feature for doing point mutations to existing data -- https://github.com/ipld/go-ipld-prime/pull/130/
  - you provide a document and a path to the part you want to update: it tells you what's there and asks you what you want to replace that subtree with.
  - works even when it crosses links!
  - had been planned for ages, now actually implemented.
- tried to make something like `FocusedTransform` but for multiple operations at once... turned out tricky; putting it back in the design-work-needed bin.
  - ended up with something very similar to the "JSON Patch" RFC...
  - but this is trickier than it seems:
    - optimal implementation will decide in what order to apply operations...
    - but not all operations are commutative.
      - map insertions generally are
      - list insertions or deletions *are not* (but appends are (sort of))
      - *hard to tell which of those you're dealing with*
    - and bad or conditional support for list operations makes the whole thing feel rickety.
  - also, anyone wanna implement a zero-alloc radix tree for golang? it'll be fun, i promise
  - more comments here: https://github.com/ipld/go-ipld-prime/pull/130/#issuecomment-751463666
- new draft package for programmatic data creation: "quip": https://github.com/ipld/go-ipld-prime/pull/134/
  - problem statement review: the go-ipld-prime Node and NodeAssembler APIs are very powerful and model the domain very accurately, but the flip side of the same coin is that they are verbose to use (like most AST-like libraries tend to be)... so, they really want usability facades for common operations.
  - the approach taken here is to let the user hand in an error slot (a *pointer* to an error value -- which is nil to start with), and as soon as that's non-zero... quit doing stuff.
    - makes fairly linear code flow, and also has no measurable performance tax, which makes it more viable than some of our other attempts at usability facades.
  - there may be more than one way to go about handling the original problem statement here -- might make several available and see what sticks.
- **we tagged a go-ipld-prime v0.7.0 release!** https://github.com/ipld/go-ipld-prime/releases/tag/v0.7.0
  - has a few breaking changes (but they're mild, and we've got migration instructions at the link above).
    - ints now have explicit sizes. more spec-compliant if you're building on, say, 32-bit architectures.
    - kinds are just called kinds now (the "reprkind" term is banished). I think literally everyone wanted this name cleanup.
  - we *did not* end up doing some other more invasive changes we considered earlier, like trying to remove errors from Node APIs. (That change in particular is probably permanently cancelled; the attempted simplification would've sacrificed correctness, so it didn't actually work out.)
  - should be a pretty quick upgrade from v0.6.0, which we also tagged very recently.
- go-ipld-prime codegen now supports unions with stringprefix representation -- https://github.com/ipld/go-ipld-prime/pull/133/
- lastly... tried to do some more work to make the go-ipld-prime schema types do some self-hosting stuff based on generated code based on the schema-schema which can handle schema DMT documents.
  - aaaand an unfortunately serious rethink is needed here. import cycles make some approaches impossible.
    - impossible with asterisks. considered several possibilities with @mvdan. some hacky angles are possible but... may be too bizarre to be a good idea.
- tried to do some stuff with the docs site
- no comment
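
The error-slot approach described for the "quip" draft above can be illustrated with a minimal Go sketch. This is not the quip API: `builder`, `assignEntry`, and `buildExample` are hypothetical names made up for illustration. The only idea taken from the notes is that each helper receives a pointer to an error, and once that error is set every later call does nothing.

```go
package main

import (
	"errors"
	"fmt"
)

// builder is a hypothetical stand-in for a node assembler.
// It is NOT a go-ipld-prime type.
type builder struct {
	entries map[string]string
}

// assignEntry adds one key/value pair, unless *errSlot is already set.
// On failure it records the error in the slot instead of returning it.
func assignEntry(errSlot *error, b *builder, key, value string) {
	if *errSlot != nil {
		return // an earlier step failed: quit doing stuff
	}
	if key == "" {
		*errSlot = errors.New("empty key")
		return
	}
	if _, exists := b.entries[key]; exists {
		*errSlot = fmt.Errorf("duplicate key %q", key)
		return
	}
	b.entries[key] = value
}

// buildExample shows the resulting linear call flow: no per-call error
// checks, just one check of the slot at the end.
func buildExample(keys ...string) (map[string]string, error) {
	var err error
	b := &builder{entries: map[string]string{}}
	for _, k := range keys {
		assignEntry(&err, b, k, "value-of-"+k)
	}
	return b.entries, err
}

func main() {
	m, err := buildExample("a", "b")
	fmt.Println(len(m), err) // prints: 2 <nil>

	m, err = buildExample("a", "a", "c")
	fmt.Println(len(m), err) // prints: 1 duplicate key "a"
}
```

In the second call the duplicate key sets the slot, so the later `"c"` assignment is skipped and the caller sees the first error only.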

@ribasushi
- Got a first reliable-ish version of the DB-backed chainstore working, including a replication test
- https://docs.google.com/document/d/1VdAbtmDsPVPNWrP00SkYbBZPLI1HEbTF0pXKr-KIOB8/edit?disco=AAAALCvsAMQ

@mikeal
- Mikola blogged about the trees we've been working on https://0fps.net/2020/12/19/peer-to-peer-ordered-search-indexes/
- [chunky-trees](https://github.com/mikeal/chunky-trees)
  - Serialization is not the centerpiece of the design.
- [IPSQL](https://github.com/mikeal/IPSQL)
  - works!
- discuss: streaming vs concurrency

@chafey
- been out on vacation for past few weeks
- looking to transition js-graphsync to IPFS team this week


## Notes

### Should ipld-prime's node helpers really use int64 for lengths? `MakeMap(int64(len(things)), etc...`

- @mvdan: simpler code for non-constants, but overflows are silent, and we don't want to discourage using fluent APIs
- @warpfork: not a good idea overall, going with int64 uniformly seems like a good idea

### IEEE 754 leakage into the data model (Inf, -Inf, NaN)?

https://github.com/ipld/specs/issues/342

- discussion about the issue, @mikeal & @warpfork both leaning strongly to simply rejecting them
- @warpfork pointed out that the 754 spec allows NaN to be a very wide range of bits (although CBOR specifies a single encoding)

### Streaming sync challenges

@aschmahmann shared thoughts around block sync

- graphsync + bitswap - sync - perform a query and share the list of blocks and let the client decide what to fetch
- possibilities for reading in reverse and validating the hash where it's a Merkle–Damgård construction which you could reverse-chunk validate
- discussion around possibilities of extending the notion of IPFS identifiers to allow inclusion of large files that exist in the real-world
- if this becomes a network / protocol thing rather than an IPLD thing then it's simpler and we don't need to worry about codecs