Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Propose traversal-based car creation #269

Merged
merged 10 commits into from
Nov 30, 2021
Merged

Propose traversal-based car creation #269

merged 10 commits into from
Nov 30, 2021

Conversation

willscott
Copy link
Member

@willscott willscott commented Nov 22, 2021

  • Complete the file-based alternative writes with no-precalculated size when a file is provided, and then goes pack to fix up the size of the data in the header at the end.
  • include carv1 stream support
  • Handle options being set on the traversal (repeated blocks)
  • Handle options being set for the car (type of index, etc.)
  • Roundtrip test

@willscott willscott requested a review from masih November 23, 2021 08:23
v2/selective.go Outdated
"github.com/multiformats/go-varint"
)

// PrepareTraversal walks through the proposed dag traversal to learn it's total size in order to be able to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lotus evolved away from needing this, so I don't think there's a consumer for prepare+dump anymore, it's just write now, thankfully. There used to be a need to get the size up-front to set up the commp calculation with padding, but that's not necessary anymore. So you could decomplicate this by removing this functionality if you like.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to write out to a stream - e.g. network, this 2x scan would still be more efficient than having to write it out to a file, touch up the header, and then as a second step send the whole thing, i think

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, so this would be for a CARv2 format and you need the offset and all that? Because for a selector-based CARv1 you have the root already, or else you can't run the traversal.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, this is to get the data size field correct in the carv2 header when streaming.

I can make a carv1-only stream version that can do it in one pass as well. I suppose that's probably useful as well.

This comment was marked as resolved.

@willscott
Copy link
Member Author

I think i have all the code here doing what I want. starting in on tests.

This would be a great time for reviews if you want to request changes to naming

@willscott willscott marked this pull request as ready for review November 27, 2021 02:01
DecoderChooser: ls.DecoderChooser,
HasherChooser: ls.HasherChooser,
StorageWriteOpener: ls.StorageWriteOpener,
StorageReadOpener: func(lc linking.LinkContext, l ipld.Link) (io.Reader, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I'm not gonna try to steer here, this is more a question for me to understand how to serve this area better in the future: Did you go through a decision process on whether to do this wrap StorageReadOpener approach vs do some wrappers around the new storage.* APIs?

Is this wrapping strategy just familiar already?

Is this wrapping strategy preferable because it seems like the most broadly applicable place to intercept things?

Is this wrapping strategy preferable because it's clear how it composes with other StorageReadOpener magic, such as how graphsync uses it to hook block load events?

Is it because figuring out how to wrap the storage.* APIs is daunting because it would've required thinking about all the places feature detection might try kick in?

Other?

(These are not blocking questions; they just could be useful info for understanding the value of any future iterations on storage and link loading APIs and other possible event callbacks around them.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's that the object that is expected to be passed around is a LinkSystem - that linksystem will already have the storage.* object applied to it and turned into a StorageReadOpener, and in this library I won't know what the underlying readable storage was to be able to wrap it. If i assume the right thing to take in is a link system, which seems right because i need to care about things like 'should i hash reads', 'what reifiers are present', etc, then this is the place i can intercept block reads

v2/index/index.go Show resolved Hide resolved
v2/internal/loader/counting_loader.go Show resolved Hide resolved
v2/index/index.go Outdated Show resolved Hide resolved
v2/selective.go Outdated
"github.com/multiformats/go-varint"
)

// PrepareTraversal walks through the proposed dag traversal to learn it's total size in order to be able to

This comment was marked as resolved.

v2/selective_test.go Outdated Show resolved Hide resolved
v2/selective.go Outdated Show resolved Hide resolved
v2/selective.go Outdated Show resolved Hide resolved
v2/selective.go Outdated Show resolved Hide resolved
v2/selective.go Show resolved Hide resolved
return int64(n), err
}

h := NewHeader(tc.size)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking out loud, in a separate PR we probably should change NewHeader to take options and encapsulate the gymnastics needed to set correct header values.

@willscott willscott requested a review from masih November 29, 2021 23:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants