Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] unixfs: decouple the DAG traversal logic from the DAG reader #5257

Closed
wants to merge 3 commits into from
Closed

[WIP] unixfs: decouple the DAG traversal logic from the DAG reader #5257

wants to merge 3 commits into from

Conversation

schomatis
Copy link
Contributor

@schomatis schomatis commented Jul 18, 2018

Update: This PR (now closed) has been split in:


Do not merge.

TODO:

Closes #5192.


unixfs: decouple the DAG traversal logic from the DAG reader

Decouple the DAG traversal logic from the `PBDagReader` encapsulating it in the
(new) `Traversal` structure.

Methods like `Read` and `Seek` now have the remaining non-traversal logic inside
the `VisitHandler` (from the `Traversal` structure) where the file DAG logic
dictates how to interpret each node visited in the `Traversal`'s `Iterate` and
`Search` algorithms.

@schomatis schomatis added status/in-progress In progress topic/UnixFS Topic UnixFS labels Jul 18, 2018
@schomatis schomatis added this to the Files API Documentation milestone Jul 18, 2018
@schomatis schomatis self-assigned this Jul 18, 2018
@schomatis schomatis requested a review from Kubuxu as a code owner July 18, 2018 15:21
@schomatis schomatis removed the request for review from Kubuxu July 19, 2018 03:25
@whyrusleeping
Copy link
Member

just wanted to say that this is really cool

Decouple the DAG traversal logic from the `PBDagReader` encapsulating it in the
(new) `Traversal` structure.

Methods like `Read` and `Seek` now have the remaining non-traversal logic inside
the `VisitHandler` (from the `Traversal` structure) where the file DAG logic
dictates how to interpret each node visited in the `Traversal`'s `Iterate` and
`Search` algorithms.

TODO: Expand on the following,

 * Remove `ErrUnexpectedEOF`.

 * Move `precalcNextBuf` to the `Traversal` structure encapsulating it in the
   `fetchChild` method.

 * Deprecate the `ReadSeekCloser` interface that implicitly chained the
   node/readers into a DAG path in favor of the more explicit `path` slice in
   `Traversal` .

 * The iterative nature of `Traversal` and how it is not crucial to this change
   (the `Iterate` and `Search` could have been implemented with recursion).

 * Collapse PB and Buffer DAG readers into one, they now can both be handled
   through `Traversal`.

License: MIT
Signed-off-by: Lucas Molas <schomatis@gmail.com>
@schomatis
Copy link
Contributor Author

@Mr0grog Could you take a look at this please? It's not finished, but I would like to get your opinion on how understandable is this design, any kind of feedback will be very useful. You don't need to proof read or make any corrections to the documentation at this point (I will bother you with that later), it's just to get your take on how this can fit the mental model described in the documentation of the balanced builder.

From your previous knowledge of the balanced package, take a look at this commit, especially the new Traversal structure and how it's leveraged in the Read and Seek methods of the DAG reader (comparing it with the previous implementation), ignore any TODOs in the code.

@schomatis schomatis changed the title [WIP] DFS iteration variation for the DAG reader [WIP] unixfs: decouple the DAG traversal logic from the DAG reader Jul 30, 2018
@Mr0grog
Copy link
Contributor

Mr0grog commented Jul 30, 2018

@schomatis I don’t know if I’ll be able to get to this before Wednesday or Thursday. Is that ok?

//
// TODO: Revisit the name. `Traverser`? (is "traverser" the correct noun?),
// `Iterator` (this would put to much emphasis on iterate whereas other
// traversal operations like search are supported), `Topology`?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Walk and WalkFunc are the terms chosen in some comparable standard library types, fwiw: https://golang.org/pkg/path/filepath/#Walk

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Thanks @warpfork, I'm not a native English speaker and there's something in the term "traversal" that just doesn't sound right to me.

Copy link
Contributor

@Mr0grog Mr0grog Aug 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, “traverse/traversal/traverser” is a technically correct English term for this that I have seen in many libraries (it even has a Wikipedia page). That said, I think I’ve seen “walk/walking/walker” used just as often and it’s shorter :)

@schomatis
Copy link
Contributor Author

@schomatis I don’t know if I’ll be able to get to this before Wednesday or Thursday. Is that ok?

@Mr0grog No rush at all, this is not a priority.

ipld "gx/ipfs/QmZtNq8dArGfnpCZfx2pUNY7UcjGhVp5qqwQ4hH6mpTMRQ/go-ipld-format"
)

// Traversal provides methods to move through a DAG of IPLD nodes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this whole dagutils.Traversal system might make sense to live in a package inside the go-ipld-format repo? It's already so nicely separated from any other imports like unixfs...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, rereading #5304 (comment) I see that why's mentioning to actually move these type of packages there instead of go-merkledag.

License: MIT
Signed-off-by: Lucas Molas <schomatis@gmail.com>
@schomatis
Copy link
Contributor Author

After initial review this PR will need to be split in two: traversal.go to go-ipld-format and the rest of the DAG reader to go-unixfs.

License: MIT
Signed-off-by: Lucas Molas <schomatis@gmail.com>
@schomatis
Copy link
Contributor Author

@Mr0grog Just a reminder ping (but again, no hurry).

@Mr0grog
Copy link
Contributor

Mr0grog commented Aug 14, 2018

Ah! Sorry, was unclear on the status after the comment about splitting it up. Looking now.

Copy link
Contributor

@Mr0grog Mr0grog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! The main separation here (tools to traverse a DAG vs. file-like operations on the DAG) made a lot of sense to me right away. 😍

Reading through the Traversal code, though, my first thought here was that it seems odd to assign the VisitHandler to the struct. Why not pass it as an argument to Iterate()?

err := traversal.Iterate(func(node ipld.Node) error {
    // visitor logic
})

That feels like the right place for it to me because the logic you are executing is inherently tied to this call to Iterate() or Search(). I may be too used to languages where you pass callbacks or continuations, though :P

I also feel like the use of Iterate() and Search() could be streamlined or even unified. My first assumption when I saw the two was that one might leverage the other, since I assumed their behavior ought to overlap a lot. What if the direction and progress of the traversal was controlled by the types of errors the visitor returns? (I’ve definitely seen this pattern before, and Go’s filepath.Walk(path, walkFunc) builtin that @warpfork mentioned does this, too.)

Reading through the DAG would be pretty similar to what you had:

err := traversal.Iterate(func(node ipld.Node) error {
    if len(node.Links()) == 0 {
        err = dr.saveNodeData(node)
        if err != nil {
            return err
        }

        n += dr.readDataBuffer(out[n:])

        if n == len(out) {
            // Return an error that causes the traversal to pause
            return dagutils.PauseIteration
            // or maybe
            // return traversal.Pause
        }
    }

    return nil
})

…but then seeking would be more streamlined, and doesn’t require a separate Search() function:

err = traversal.Iterate(func(node ipld.Node) error {
    if len(node.Links()) > 0 {
        // [Clipped: extract `fsNode` and make sure it has hints]

        i := 0
        for i; i < len(node.Links()); i++ {
            childSize := fsNode.BlockSize(i)
            if childSize < uint64(left) {
                left -= int64(childSize)
            } else {
                break
            }
        }
        // Return an error that causes the traversal to skip the next N nodes
        return dagutils.SkipNext(i)
    } else {
        // [Clipped: Buffer the data from the node at the seeked position]
        
        // Return an error that causes the traversal to pause
        return dagutils.PauseIteration
        // or maybe
        // return traversal.Pause
    }
})

Ultimately, you’re really moving a lot of the error handling and boiler plate from the user-supplied visitor function into the Iterate() function itself. I’m not sure whether it makes more sense for the error types/functions to live on the package (in dagutils) or directly on the struct (so you might call traversal.SkipNext(n)). I definitely don’t know what’s more common in Go.

// Flag to stop the current iteration, intended to be set by the user
// inside `VisitHandler` to stop the current traversal operation.
Stop bool
// TODO: Use a function?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 on this TODO. Or the ideas I outlined in the review summary.

@@ -19,6 +22,10 @@ var (
ErrUnkownNodeType = errors.New("unknown node type")
)

// TODO: Rename the `DagReader` interface, this doesn't read *any* DAG, just
// DAGs with UnixFS node (and it *belongs* to the `unixfs` package). Some
// alternatives: `FileReader`, `UnixFSFileReader`, `UnixFSReader`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 on this. I like UnixFSReader. (I dislike FileReader because I wouldn’t be sure if it was reading bytes of a file from disk or reading through a UnixFS file’s DAG.)

// `up`, `Right`) that modify its `path`. The `down` move method is the
// analogous of the recursive call and the one in charge of visiting
// the node (through the `VisitHandler`) and performing some user-defined
// logic.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right() being the only one of these that is public feels a bit odd to me, but I get why it’s here. See my summary comment for some alternative thoughts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, agree. Maybe some other public method like SkipNode should be exported (that would internally call an un-exported right), but I feel that the graphic "turn right" movement is something that the user could grasp rather easily and it could relate more to that than a Skip function. Not really sure though, I usually lean towards abstracting but this felt like a natural "take look under the hood, you see? there's not much complex logic down there".

// by the current `level` to avoid including much indexing like
// `dt.path[dt.level]` in the code. Maybe another more strong refactoring
// that would allow to think of the current node in the `path` without
// concerning at what `level` it's in.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, there were definitely some points in the code where all the indexing got a bit muddy.

// this format.
childIndex []uint
// TODO: Revisit name, `childPosition`? Do not use the *link* term, put
// emphasis in the child (the *what*, not the *how* we get it).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this name made plenty of sense for me. The only thing clearer that comes to mind is currentChildIndex.

// `Iterate()` and `Search`). This value should never be returned to after
// the first `down` movement, moving up from the root should always return
// `ErrUpOnRoot`.
level int
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not call this depth? It seems like that would be much more straightforward, especially since your comments define it in terms of “depth” ;)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you're right. I avoided the term "depth" at first because I considered it an absolute property, "the depth of the entire DAG", and didn't want to give the sense that this property was being modified here. I could use the name currentDepth or positionDepth, or yes, just depth alone and indicate this is related more to the current position in the DAG than the DAG itself.

//
// Any of the exported methods of this API should be allowed to be called
// from within this method.
VisitHandler func(node ipld.Node) error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might call this a Visitor. In fact, I think that’s generally the common term I’ve seen for it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to re-use this across the go-ipfs codebase the Visit function should:

  • Receive the level of the visited node
  • Receive the parent(s) of the visited node

We probably want to provide a couple default Visit functions to wrap some commonly used logic:

  • One implementing full recursive traversal of the dag, pruning already visited branches
  • One implementing depth limited traversal, pruning already limited branches (if they are not going to be explored deeper).

var ErrDownNoChild = errors.New("can't go down, no child available")

// ErrUpOnRoot signals the end of the DAG after returning to the root.
var ErrUpOnRoot = errors.New("can't go up, already on root")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error is a pretty important public signal that calling functions will see, but I feel like UpOnRoot doesn’t quite explain what this is really about (that you have traversed through the entire graph). Maybe EndOfDag or EndOfGraph or TraversalComplete?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but I feel like UpOnRoot doesn’t quite explain what this is really about (that you have traversed through the entire graph).

That is actually the case only for the Iterate() context, in future traversal operations that error may be interpreted differently, but you're right that Iterate() should wrap this error (which seems more like an internal error) and return something more meaningful like EndOfDag or IterateComplete.

var ErrRightNoChild = errors.New("can't move right, no more child nodes in this parent")

// errDownStopIteration signals the stop of the traversal operation.
var errDownStopIteration = errors.New("stop")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if Pause would be better than Stop here — in my mind, stopping means you can’t continue from where you left off. With this code, though, you can (and that’s important, because that’s how seeking works).


// NewDagReaderWithSize constructs a new `dagReader` with the file `size`
// specified.
func NewDagReaderWithSize(ctx context.Context, n ipld.Node, serv ipld.NodeGetter, size uint64) (DagReader, error) {
Copy link
Contributor

@Mr0grog Mr0grog Aug 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was a little unclear to me at first that this was the precalculated size of the whole file/DAG — I thought it might be the maximum amount of data to read.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this was an ugly patch I had to introduce to preserve the Size() API, I'll try to document it more clearly.

@schomatis
Copy link
Contributor Author

@Mr0grog Another great review as usual, thanks. I'll be bothering you with some follow-up questions in the coming days.

@Mr0grog
Copy link
Contributor

Mr0grog commented Aug 16, 2018

I’ll bet this will be really useful for simplifying the traversal logic in the new depth-limited ipfs refs -r work in #5337, too.

@schomatis
Copy link
Contributor Author

@Mr0grog Regarding the Iterate/Search unification, indeed there's a strong similarity between them, the first contains the second: Search just goes down while Iterate when it reaches a leaf node it goes back up again to go down in another direction.

You can have Iterate and just issue a stop when you reached the first leaf node, and even though that wouldn't really be a big burden to pass to the user, my main motivation for using two different functions is that conceptually I see both as clearly different operations which the user can relate to both of them. (This is subjective of course, I would like to hear how you see them.)

Even if internally they are implemented with very similar (almost the same code, actually) my main question would be: what is the advantage, from the user's perspective, to unify them and present them only as Iterate? Do you see the difference between both operations as a potential source of confusion to the user? If I unify them and the user issues a stop in the first leaf node it visits during an iteration, what would be the conceptual model he would be think of (different from a search that reached the end of the road)?

level: -1,
// Starting position, "on top" of the root node, see `level`.

ctx: ctx,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should avoid storing the context in the struct, it's generally considered a bad practice.

Copy link
Contributor Author

@schomatis schomatis Aug 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This functionality was inherited from the PBDagReader which stored the context to later be able to call ipld.GetNodes. How would you advise this be handled?


// The CID of each child of each node in the `path` (indexed by
// `level` and `childIndex`).
childCIDs [][]*cid.Cid
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's gonna be a lot of memory, as is the promise set below

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also inherited from the PBDagReader (I'm glad this refactoring exposed some of the already existing characteristics of the current implementation) which has the

	// NodePromises for each of 'nodes' child links
	promises []*ipld.NodePromise

	// the cid of each child of the current node
	links []*cid.Cid

slices, each of those is instantiated in every level of the DAG when the recursive algorithm creates a new PBDagReader.

lot of memory

Have in mind that level relates to the depth of the DAG (I should change its name as suggested above), so the size of this matrix grows at an O(log n) rate with the size of the DAG.

How would you advise this be handled?

@hsanjuan
Copy link
Contributor

I’ll bet this will be really useful for simplifying the traversal logic in the new depth-limited ipfs refs -r work in #5337, too.

It's not so much that the logic is complex (it's way shorter than this), but that there are multiple places in the code base making use of very similar logic, with very small differences. It would be good if this could replace one or several of those instances, rather than becoming another way of doing the same.

On another note, it seems that this cannot do "async" (or multithreading) DAG fetching, which would be very good to have. Without it, it probably cannot replace the logic used for fetching blocks when pinning.

@schomatis
Copy link
Contributor Author

@hsanjuan Thanks for the feedback. I'm going to leave the async feature for a second iteration though (I'll open an issue once I get this merged in go-ipld-format) to avoid adding more complexity to the code so I can focus now on being sure this actually works for the DAG reader before using it elsewhere.

@hsanjuan
Copy link
Contributor

@schomatis yeah, that's the most sensible approach. Let's iterate on it later.

@Mr0grog
Copy link
Contributor

Mr0grog commented Aug 21, 2018

my main question would be: what is the advantage, from the user's perspective, to unify them and present them only as Iterate?

Two reasons:

  1. Once I tried to rewrite the VisitHandler you are using in DagReader with this, it didn’t seem like a specialized Search function actually added much.

  2. There are way more than two kinds of operations you might want to do here :)

    The limited-depth traversal @hsanjuan is working on is a great example of this — once it hits its max depth, it still needs to continue traversing across the graph. Unlike the current Iterate, it needs more control over how the traversal moves through the graph. But unlike Search, it needs to go to more than one place. That should be straightforward with my fancypants Iterate:

    err := traversal.Iterate(func(node ipld.Node) error {
        // Do whatever work needs to be done on this node
    
        if (traversal.level >= maxDepth) {
            return dagutils.SkipNext(len(node.Links()))
        }
        return nil
    })

    I imagine there are a lot more other ways you’d want to move across the graph in different situations, so I tend towards an API that can support a variety of cases. (Also, I think @hsanjuan made excellent point about additional arguments the visitor should take if we want it to truly be generic and useful.) I do think there’s still plenty of room for more specialized functions that basically just wrap Iterate, though.

    If want some good examples of different walk/traversal approaches, I think the TreeWalker browser API is nice one (but with a lot of features we might not need yet). It’s more like a cursor that you move where you want, but you can see really easily how you’d build your Iterate and Search on top of it.

I feel that the graphic "turn right" movement is something that the user could grasp rather easily and it could relate more to that than a Skip function.

I always think of internationalization and visual layout functions when it comes to stuff like this. Lots of people have written code focused on aligning things to the “left” or “right” only to realize later when they internationalized for right-to-left languages that their life would have been much simpler if they’d aligned to the “start” and “end” instead. So I usually try to avoid APIs that focus on how I visualize a model and opt for ones that talk more about the properties of the model itself.

Not really sure though, I usually lean towards abstracting but this felt like a natural "take look under the hood, you see? there's not much complex logic down there".

I like that sentiment a lot. The reason I tend towards returning a command for the iterator to execute here is because I think it’s generally safer to keep control in one place. When the iterator function and the visitor are both messing with the Traversal struct’s state, it might be easy to accidentally wind up in a bad situation. (Even better if we can come up with an approach that satisfies both!)


Side note: are you planning to shift towards “walk” terminology? I stuck with Traversal and Iterate because I didn’t want to make my suggestions harder to understand by flipping terminology at the same time, but should we now be talking about a Walker stuct (instead of Traversal) with a Walk function (instead of Iterate)?

@schomatis
Copy link
Contributor Author

There are way more than two kinds of operations you might want to do here :)

Sorry, I think I didn't express myself correctly before, I'm definitely not saying that iterate/search cover every use case (the did cover the necessity of the DAG reader that was my test case used to iterate this idea), my motivation to have them both is to avoid passing that burden to the user, that is, instead of forcing every user-defined Visit that wants to do a search to include the check "if on leaf node stop", this structure should provide that functionality (according to the DRY principle).

The same logic applies for future operations that may be needed: if many parts of the code need (for example) an iterator that goes up and down twice this structure should provide a different operation, e.g., iterateTwice, instead of having many parts of the code base repeating the same logic. Of course you can lay all that functionality on top of Iterate but we should be evaluating how much of that logic should this structure absorb (in the form of traversal operations added to the API) and how much should be left to the user (e.g., if the user needs a iterator that goes 3 1/2 times around the DAG, then yes, that's on him, the API shouldn't be modified).

I do think there’s still plenty of room for more specialized functions that basically just wrap Iterate, though.

Exactly, but if those function do something conceptually different than iterate I would want to analyze if I need to be providing a different traversal operation here (that was the motivation to add Search in the first place).

I'm interpreting @hsanjuan comment in this same sense (but I may be missing something),

We probably want to provide a couple default Visit functions to wrap some commonly used logic:

to provide different operations (like Search) to avoid asking the user to do this himself, like the depth check in your code:

err := traversal.Iterate(func(node ipld.Node) error {
    // Do whatever work needs to be done on this node

    if (traversal.level >= maxDepth) {
        return dagutils.SkipNext(len(node.Links()))
    }
    return nil
})

Side note: are you planning to shift towards “walk” terminology?

Yes, I'll switch it once I move this to the other repos.

@Mr0grog
Copy link
Contributor

Mr0grog commented Aug 21, 2018

instead of forcing every user-defined Visit that wants to do a search to include the check "if on leaf node stop", this structure should provide that functionality

I was trying address this in my first point. Here’s seeking with my version of Iterate:

err = traversal.Iterate(func(node ipld.Node) error {
    if len(node.Links()) > 0 {
        // Figure out which child to skip to
        // Return an error that causes the traversal to skip the next N nodes
        return dagutils.SkipNext(i)
    } else {
        // Handle the found node
        // Return an error that causes the traversal to pause
        return dagutils.PauseIteration()
    }
})

…and here it is with Search:

err = traversal.Search(func(node ipld.Node) error {
    if len(node.Links()) > 0 {
        // Figure out which child to skip to
        // Return an error that causes the traversal to skip the next N nodes
        return dagutils.SkipNext(i)
    } else {
        // Handle the found node
        return nil
    }
})

Which just didn’t feel different enough to be worth making a special case for.

But you’re right; if we changed the API to Search, we could do something like this, that might be conceptually clearer (at the cost of more error handling code):

// This returns the found node, which is slightly different than what we had before
func (dt *Traversal) Search(visitor Visitor) (ipld.Node, error) {
    var foundNode ipld.Node
    err := dt.Iterate(func(node ipld.Node) error {
        if len(node.Links()) == 0 {
            foundNode = node
            return dagutils.PauseIteration()
        }
        return visitor(node)
    })
    return foundNode, err
}

// Different contract for search simplifies things, sort of (more error handling,
// but might be conceptually clearer)
foundNode, err = traversal.Search(func(node ipld.Node) error {
    // Figure out which child to skip to
    // Return an error that causes the traversal to skip the next N nodes
    return dagutils.SkipNext(i)
})
if err != nil {
    // handle error
} else {
    // handle the found node
}

(Side note: I think I would rename this, though — I would expect Search to keep visiting nodes until I found the criteria I wanted. So I might Search for the string "hello" in the DAG representing a text file, but I would Seek down a single path to a leaf.)

if many parts of the code need (for example) an iterator that goes up and down twice this structure should provide a different operation, e.g., iterateTwice, instead of having many parts of the code base repeating the same logic.

Totally agree!

We probably want to provide a couple default Visit functions to wrap some commonly used logic:

to provide different operations (like Search) to avoid asking the user to do this himself, like the depth check in your code:

I was assuming this specific case isn’t that common. Since there’s so little extra to add here, it didn’t seem worthwhile if only using it once or twice. But otherwise: HELL YES ✅

@schomatis
Copy link
Contributor Author

Thanks @Mr0grog and everyone for all the feedback!! (I like the Seek proposal.)

I'll be closing this now and start moving it to the corresponding repos, incrementally incorporating these changes and bothering each feedbacker (not sure if that word exists :) to see if the new code correctly reflects your input.

@schomatis schomatis closed this Aug 21, 2018
@ghost ghost removed the status/in-progress In progress label Aug 21, 2018
@schomatis
Copy link
Contributor Author

I don't think Why is going to have time to review his comments (1, 2) so I'm bothering you @Stebalien just to check if you'd like to have any modifications there.

@Mr0grog
Copy link
Contributor

Mr0grog commented Aug 21, 2018

In the mean time, maybe worthwhile to mark those with // TODO: or // FIXME: or // PERF: comments about investigating was to reduce memory usage? That way, there are clear places for someone who wants to do optimization work to jump in.

@schomatis
Copy link
Contributor Author

That's a good idea but I would like to postpone it at the moment, the only place to do some actual optimizations is in the logic around node promises which at the moment I'm not sure if I have implemented them correctly so I don't want to waste anybody's time optimizing something that may be changed in the near future (note to self: bother @Stebalien to take a quick look at 694c9a2 to see if it makes sense).

@hannahhoward
Copy link
Contributor

I think this is super cool and I agree that replicating dag traversal logic in a million places is awkward and it'd be nice to have a generalized version.

My comments generally match the previous comments:

  1. This is not ideal for super high speed/parallism, but that can be left for a future iteration (though it's a little tricky cause then you effectively need to walk in multiple directions at once)

  2. I agree that Search is a bit nebulous, because the description of the function doesn't clarify that it effectively leaves it to the Visitor to walk right to the appropriate node (I am inferring that but it took a close read of the code)

All things being equal though, pending @Stebalien 's comments I think this is good to merge for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic/UnixFS Topic UnixFS
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Analyzing the DAG reader
6 participants