
New Unixfs implementations #224

Open
Jorropo opened this issue Mar 25, 2023 · 2 comments
Labels
P1 High: Likely tackled by core team if no one steps up

Comments


Jorropo commented Mar 25, 2023

Dedicated UnixFS implementations are often much more performant and efficient.
See for example two of mine:

I think it is more sustainable to maintain a handful of dedicated implementations that each do one job well (something like feather and something that maintains state to support unordered DAGs are really different).

Unified middle layer

The way I have been writing my implementations is by juggling bytes, unixfs.proto, and dag-pb.proto by hand everywhere.
This looks kinda meh because of all the little details that have to be checked.

I really think we need an efficient middle layer that takes in blocks.Block, parses them, does sanity checks, and returns some representation that looks like internal UnixFS (so one node per block, and full support for all the shenanigans you would want to do, but with error checking):

package unixfs

import (
  "github.com/ipfs/boxo/blocks"
  "github.com/ipfs/go-cid"
)

type Type uint8

const (
  _ Type = iota
  Directory
  File
  Metadata
  Symlink
  HAMTShard
)

type Node struct {
  Type     Type
  Data     []byte // for files
  Entries  []Entry
  DagSize  uint64
  HashType uint64 // for HAMT dirs
  Fanout   uint64 // for HAMT dirs
}

type Entry struct {
  Link
  Name string // for directories
}

type Link struct {
  Type     Type // indicative, not authoritative
  Cid      cid.Cid
  FileSize uint64 // for files
  DagSize  uint64
}

func (*Node) UnmarshalIPFS(blocks.Block) error
func (*Node) MarshalIPFS() (blocks.Block, error)

func (*Node) Encode() (Link, blocks.Block, error)

// File operations
func (*Node) ParseFile(Link, blocks.Block) error
func (*Node) AddFileSegments(...Link) error
func (*Node) FileSize() uint64

// Directory operations
func (*Node) ParseDirectory(Link, blocks.Block) error
func (*Node) AddChildrenNode(...Link) error

// other types ...

This would not include helper functions for more advanced stuff like HAMTs, chunking, etc. We will most likely need those too, but the goal here is to provide a thin wrapper around the protobuf that adds the repetitive input validation.
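To illustrate the kind of repetitive validation this middle layer would centralize, here is a minimal, self-contained sketch. The `Node`/`Entry` shapes mirror the proposal above, but the `Validate` method and the specific rules it checks are illustrative assumptions, not the authoritative dag-pb/UnixFS spec:

```go
package main

import (
	"errors"
	"fmt"
)

// Type mirrors the proposal's UnixFS node kinds.
type Type uint8

const (
	_ Type = iota
	Directory
	File
	Metadata
	Symlink
	HAMTShard
)

// Entry is a named child link (simplified from the proposal).
type Entry struct {
	Name string
	Type Type
}

// Node is a stripped-down version of the proposed unixfs.Node.
type Node struct {
	Type    Type
	Data    []byte
	Entries []Entry
	Fanout  uint64
}

// Validate shows the sort of sanity checks every hand-rolled
// implementation currently repeats; the rules are illustrative.
func (n *Node) Validate() error {
	switch n.Type {
	case File:
		if len(n.Entries) > 0 && len(n.Data) > 0 {
			return errors.New("file node mixes inline data and child links")
		}
	case Directory:
		if len(n.Data) > 0 {
			return errors.New("directory node carries file data")
		}
		for _, e := range n.Entries {
			if e.Name == "" {
				return errors.New("directory entry with empty name")
			}
		}
	case HAMTShard:
		if n.Fanout == 0 || n.Fanout&(n.Fanout-1) != 0 {
			return errors.New("HAMT fanout must be a power of two")
		}
	}
	return nil
}

func main() {
	bad := &Node{Type: Directory, Entries: []Entry{{Name: ""}}}
	fmt.Println(bad.Validate())
	good := &Node{Type: Directory, Entries: []Entry{{Name: "a.txt", Type: File}}}
	fmt.Println(good.Validate())
}
```

With checks like these done once behind `UnmarshalIPFS`, the dedicated implementations only ever see nodes that are already structurally sound.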

Impls

List of implementations we need:

  • Efficient state-light streaming decoder with incremental verification. (I'll probably use feather after refactoring it to use the unified lower layer and adding the missing features.)
    The goal is to incrementally verify ordered .car files from Saturn cheaply, resource-wise (and other ordered block sources).
    • Capable of streaming the verification of an ordered stream of blocks (such as a car file read from io.Reader) incrementally, while streaming the result out.
    • The interface must implement io.Reader so files can be read without any background goroutine (efficiency).
    • It must support incrementally verifying requests mapping to feat: refactor gateway api to operate on higher level semantics #176 semantics.
  • Decoder with incremental verification and output stream with random-walk BFS input.
    • Streaming data in order can have a high cost; however, if we are clever we can use .WriteAt to write blocks in any order, as long as we receive roots before leaves (or cache leaves, but then incremental verification is not possible).
  • MFS rewrite.
  • Some write operation helpers (concatenate, chunking, ...)
  • ... ?
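The .WriteAt idea from the random-walk BFS bullet can be sketched with a plain io.WriterAt: each block carries its final byte offset (learned from its parent before the leaf arrives), so output need not be produced in stream order. This is a toy sketch; a real decoder would also verify each block's CID as it lands:

```go
package main

import (
	"fmt"
	"io"
)

// sliceWriterAt is a minimal io.WriterAt over a fixed-size buffer,
// standing in for a sparse output file opened with os.OpenFile.
type sliceWriterAt struct{ buf []byte }

func (w *sliceWriterAt) WriteAt(p []byte, off int64) (int, error) {
	if int(off)+len(p) > len(w.buf) {
		return 0, io.ErrShortWrite
	}
	return copy(w.buf[off:], p), nil
}

// block pairs a chunk of file data with its final offset, which a
// decoder learns from the parent node before seeing the leaf itself.
type block struct {
	off  int64
	data []byte
}

func main() {
	out := &sliceWriterAt{buf: make([]byte, 11)}
	// Blocks arrive out of order, as from a random-walk BFS fetch.
	for _, b := range []block{
		{off: 6, data: []byte("world")},
		{off: 0, data: []byte("hello ")},
	} {
		if _, err := out.WriteAt(b.data, b.off); err != nil {
			panic(err)
		}
	}
	fmt.Println(string(out.buf)) // hello world
}
```

The point of the bullet above is exactly this trade-off: with a WriterAt sink, arrival order stops mattering for the output, but incremental verification still requires seeing roots before their leaves.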
@Jorropo Jorropo added the need/triage Needs initial labeling and prioritization label Mar 25, 2023
@Jorropo Jorropo added P1 High: Likely tackled by core team if no one steps up and removed need/triage Needs initial labeling and prioritization labels Oct 5, 2023
@Jorropo Jorropo self-assigned this Oct 5, 2023

BigLep commented Oct 5, 2023

2023-10-05 conversation:

@BigLep BigLep mentioned this issue Nov 9, 2023

m0ar commented Dec 13, 2023

@Jorropo something we'd very much like is a fast recursive ls to help figure out the structure of a UnixFS DAG, basically tree. Right now, we do O(dirs) ipfs-unixfs.ls requests to a Kubo node to figure out a file structure without actually fetching the files themselves. This is of course ridiculously inefficient and takes ~60 seconds even with our own node within AWS. We can't get the entire DAG, including the leaves, because it's generally way too large.
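For concreteness, the operation being asked for is a tree-like walk that touches only directory nodes, never file leaves. A toy sketch over an in-memory map standing in for a blockstore (all names here are hypothetical, not boxo or Kubo API):

```go
package main

import (
	"fmt"
	"sort"
)

// dirEntry is a child link: either a file or another directory.
type dirEntry struct {
	name  string
	isDir bool
}

// dag is a toy stand-in for a blockstore, keyed by directory path
// instead of CID for readability.
var dag = map[string][]dirEntry{
	"root":      {{"docs", true}, {"readme.md", false}},
	"root/docs": {{"a.txt", false}},
}

// tree collects the directory structure without fetching file
// leaves, the `tree`-like operation described above.
func tree(key, indent string, out *[]string) {
	entries := dag[key]
	sort.Slice(entries, func(i, j int) bool { return entries[i].name < entries[j].name })
	for _, e := range entries {
		*out = append(*out, indent+e.name)
		if e.isDir {
			tree(key+"/"+e.name, indent+"  ", out)
		}
	}
}

func main() {
	var out []string
	tree("root", "", &out)
	for _, line := range out {
		fmt.Println(line)
	}
}
```

Done server-side over a local blockstore (rather than as one ls round trip per directory), this walk is bounded by block reads, not network latency.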

So, my questions:

  1. Is this recursive ls something that would be in scope for boxo, or is it the wrong abstraction layer?
  2. Could I use boxo to implement a stand-alone service to do it?
  3. Is it better to implement it inside kubo, when this boxo improvement gets delivered?

@Jorropo Jorropo removed their assignment Mar 4, 2024