add swarm protocol code #73
Comments
Is this a hash function or a transport protocol? |
@Stebalien Swarm is a content addressed store, just like ipfs. |
Yes, but which parts need multicodecs for what. That is:
It looks like you need an IPLD format but I'm not sure. |
We need a multiaddr protocol, so we can address, eg, |
To clarify, our end goal is to have a content field that can contain either a swarm hash or an IPFS hash (or other related content identifiers). If multiaddr isn't the appropriate way to do that, please let us know. |
We generally distinguish between names and addresses. Addresses generally tell you where to look for some resource (multiaddr), names generally tell you what resource to look for (peer ID, content ID). So, a multiaddr in an ENS record would address some endpoint (network or otherwise), not a piece of content. To name a piece of content, you'd probably just want a free-form path. For example, (One of the motivations in switching away from using |
Is there a multiformat suitable for encoding an ipfs hash or path? Storing text isn't ideal due to the high cost of onchain storage. |
It looks like multicodec might be suitable? Would it make sense to add ipfs and swarm multihash types to the public multicodec table? |
I filed multiformats/multicodec#94 and multiformats/multicodec#95 for these ^ |
@Stebalien can you take ownership of these issues for ENS? |
@jbenet I'll see what I can do but this will need quite a bit of consensus. @Arachnid multicodec is really just a common table of unique codes. Other multiformats then use these codes but multicodec doesn't really describe a generalized path spec. I think what you're looking for here are CIDs. CIDs are It looks like the correct path forward here is to:
However, there are two drawbacks to 3:
To handle this (and unify our path concepts) we could introduce a new generalized "multipath" spec, subsuming multiaddr. This spec would define a universal pathing system, a global conflict free namespace (basically, the string version of the multicodec table), and a "compact" path form. This would basically just be taking the current multiaddr spec as-is and extending it slightly to say "and you can use this to address content as well". Thoughts on this? |
That sounds right. What's the value of the multicodec field, though? Or are you saying it's a multicodec type prefix, followed by multihash content? Isn't that what we're already proposing? |
Okay, I see I misunderstood multicodec - it's only a prefix, the spec doesn't include the data. You list one disadvantage as having to add a new field to ENS, but we're already adding a field to ENS for content hashes; this discussion is trying to determine what it will look like. I assume you're not saying we'd have to add another field in addition to that. I don't think there can be an option that doesn't require adding a field to ENS? I am a little confused, though, because I don't see a multicodec value for IPFS content-hashes anywhere in the table. What value do CIDs use? |
Yes. Well, specifically, it's:
Yes. Well, really, the term "multicodec" is a bit overloaded. It refers both to the codec itself and formats of the form
Ah. Sorry, I thought multiaddr support in ENS was already a done deal (the EIP was merged). My point was that we could try to unify multiaddr and other paths into a master path spec if that were the case. If not, and if you just need to be able to reference content, CIDs are the way to go.
TL;DR: There is no "IPFS" multicodec.
So, how would this work for swarm? We'd define a new multicodec for swarm and then, in ENS, you'd use CIDs of the form |
Can we reasonably omit the multibase prefix if we specify it'll always be in binary format?
It's merged, but still only a draft. I think we'd rather harmonise now than deprecate later.
I think that's what we want. Are you implying the existence of other, more flexible options, though?
Hm, that seems like a problem for this approach. We need to be able to look at the data in the ENS record and know which distributed storage system to query (and what identifier to use to query it); it sounds like that's not going to be doable as-is, since the CID metadata doesn't describe the storage system, just the stored data? |
Yes, sort of. See: multiformats/cid#28 Basically, you can always turn a CID into text (well, the EIP doesn't have to specify how to do this but the CID spec does). However (while some may say otherwise...) there's no reason to waste the byte if the encoding can't be anything other than "raw bytes".
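To make the "no multibase prefix in binary" point concrete, here is a minimal sketch of building a binary CIDv1 and, separately, a text form of it. It assumes the usual CIDv1 layout (version, content-type code, multihash, each integer an unsigned varint) and uses the `raw` (0x55) and `sha2-256` (0x12) codes from the multicodec table; the function names are hypothetical, and base16 (multibase prefix `f`) is picked for the text form only because it needs nothing beyond the standard library.

```python
import hashlib

SHA2_256 = 0x12  # multihash code for sha2-256
RAW = 0x55       # multicodec code for raw binary content


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned LEB128 varint."""
    if n < 0:
        raise ValueError("varints are unsigned")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def binary_cidv1(codec: int, hash_code: int, digest: bytes) -> bytes:
    """Binary CIDv1: <version><codec><multihash>, with no multibase prefix."""
    multihash = encode_varint(hash_code) + encode_varint(len(digest)) + digest
    return encode_varint(1) + encode_varint(codec) + multihash


def text_cidv1(cid: bytes) -> str:
    """Text form: a multibase prefix ('f' = base16) plus the encoded bytes."""
    return "f" + cid.hex()


digest = hashlib.sha256(b"hello").digest()
cid = binary_cidv1(RAW, SHA2_256, digest)
print(len(cid), text_cidv1(cid)[:9])
```

The binary form is what would go on chain; the text form can always be derived from it later, so storing the extra prefix byte buys nothing when the field is defined as raw bytes.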
Got it.
Not at the moment, no. My point is that CIDs address content by hash and that's all. However, multiaddrs address network endpoints. If you need something that does both, CIDs won't cut it.
So, in this case, it shouldn't actually be an issue. Given that you only support one data format, you can just say "if it's a swarm object, look it up in swarm". More generally, I'd be careful about bundling content addressing with location addressing. For example, what if the same data is available through multiple storage systems? What if you need to migrate from one to another? Really, you almost want something like a magnet/meta link. That is a CID along with some description of where the content might be found. I'm not familiar enough with ENS to give concrete suggestions but I'd consider a separate field with location hints. Alternatively, we could try to come up with some way to bundle location hints with a CID but I'm not really sure about the best way to do that (usually, I'd pass those hints along out-of-band, e.g., in a separate field). |
https://github.com/ipld/cid#cidv1 specifies:
From
@Arachnid Is the plan to have multiple entries in ENS, one for each underlying "codec"? In this case, wouldn't referencing the same content in ipfs (codec 0x01a5) and swarm (if swarm codec was 0x0622) just be two records comprising, for example:
|
That implies "and everything else should be looked up in IPFS", though, which is neither very neutral nor very extensible.
We want to store an identifier sufficient for the end user to fetch the content. While I recognise that "this is the content hash" is distinct from "and here is the system to look for it in", realistically due to different systems having different methods of hashing, chunking, and building trees, the chances of having the same content hash accessible in different systems seem low-to-nil. I think it makes the most sense to combine content hash and location metadata together into a single identifier for that reason. I'm afraid this still leaves me in the dark as to what the best solution is, however - everything proposed so far seems to have significant issues. |
Not if one uses CIDs. You'd have one identifying the content and another hinting at where to find the content. I guess you could also have multiple CIDs (for "alternative" versions of the content).
Not really. An application wishing to resolve CIDs to data would use a pluggable (parallel and/or hierarchical) resolver. Ideally, it would have:
Now yes, the data exchange protocol IPFS uses (bitswap) can fetch arbitrary blocks, but that's just because it's a general-purpose data exchange protocol. Really, the issue here is that CIDs are entirely neutral. They don't say anything about how the data should be retrieved (although this can sometimes be inferred from the type).
If Swarm gains traction, the chances are pretty high: go-ipfs will almost certainly get a plugin for resolving Swarm CIDs with Swarm. Once fetched, the data would be cached in the local IPFS datastore and made available over bitswap. This is the entire point of IPLD (CIDs are a part of the IPLD spec): interoperability between merkledag systems. However, I do agree that hints indicating where content can likely be found are important for performance. On the other hand, I'm still not convinced bundling location hints with content identity is a good idea. |
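As a rough illustration of the pluggable-resolver idea described above, the sketch below tries a local cache first and then each registered backend in order, caching whatever it fetches. Everything here is hypothetical: the `CachingResolver` class, the backend names, and the in-memory dictionaries standing in for real IPFS or Swarm clients. It is not how go-ipfs is implemented.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A backend takes a binary CID and returns the data, or None if it cannot help.
Fetcher = Callable[[bytes], Optional[bytes]]


class CachingResolver:
    """Resolve a CID via a local cache, then via pluggable backends in order."""

    def __init__(self) -> None:
        self.cache: Dict[bytes, bytes] = {}
        self.backends: List[Tuple[str, Fetcher]] = []

    def register(self, name: str, fetch: Fetcher) -> None:
        self.backends.append((name, fetch))

    def resolve(self, cid: bytes) -> Tuple[str, bytes]:
        if cid in self.cache:
            return "cache", self.cache[cid]
        for name, fetch in self.backends:
            data = fetch(cid)
            if data is not None:
                self.cache[cid] = data  # later lookups skip the network
                return name, data
        raise LookupError("no backend could resolve this CID")


# In-memory stand-ins for real storage systems (assumptions for the demo).
ipfs_stub: Dict[bytes, bytes] = {}
swarm_stub: Dict[bytes, bytes] = {b"\x01example-cid": b"hello from swarm"}

resolver = CachingResolver()
resolver.register("ipfs-stub", ipfs_stub.get)
resolver.register("swarm-stub", swarm_stub.get)

first = resolver.resolve(b"\x01example-cid")
second = resolver.resolve(b"\x01example-cid")
print(first[0], second[0])
```

The CID itself stays neutral in this arrangement: which backend ends up serving it is a property of the resolver's configuration, not of the identifier.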
but that would be multiple "anothers" if there are multiple locations, right? (IPFS, Swarm...)
Maybe this inference is enough for ENS as a case? |
We'd only need one field type. If ENS is like DNS, you'd repeat the field once per system. Alternatively, you could encode a list of systems.
Maybe? For swarm, it should be. Also, a "location" hint can always be added after the fact as a new (optional) field if that becomes an issue.
On second thought, I can see a reason to bundle these if ENS supports multiple "alternative" records like DNS does. That is, can I have:
Where the client should pick the first supported data source? If so, then it makes sense to bind these location hints to the records themselves (although I'm not sure what the best way to do this is). |
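A minimal sketch of that "pick the first supported data source" behaviour, assuming each record pairs a location hint with a CID. The `ContentRecord` type, its `system` field, and the hint strings are hypothetical illustrations, not part of ENS or any EIP.

```python
from typing import NamedTuple, Optional, Sequence, Set


class ContentRecord(NamedTuple):
    system: str  # hypothetical location hint, e.g. "ipfs" or "swarm"
    cid: bytes   # binary CID identifying the content


def pick_record(records: Sequence[ContentRecord],
                supported: Set[str]) -> Optional[ContentRecord]:
    """Return the first record whose storage system the client supports."""
    for record in records:
        if record.system in supported:
            return record
    return None


records = [
    ContentRecord("swarm", b"\x01swarm-cid-placeholder"),
    ContentRecord("ipfs", b"\x01ipfs-cid-placeholder"),
]

chosen = pick_record(records, supported={"ipfs"})
print(chosen.system if chosen else "no supported source")
```

Record order acts as the publisher's preference here, and a client that supports neither system gets `None` rather than a wrong guess.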
If CIDs don't encode information on where to get content, what makes something a Swarm CID, and how does IPFS know how to fetch it from swarm?
How does a CID hint at where to find the content? Also, if CIDs don't do that, then what's the purpose of storing a CID in ENS over just a multihash? |
Swarm uses a custom merkledag format (Swarm-Hash?). CIDs pointing to swarm content would be
Sorry, one record/field identifying the content (with a CID) and one (or more) record(s)/field(s) indicating where content related to the ENS name can be found.
CIDs tell you how to interpret the referenced merkledag. A bare multihash is sufficient to identify the content but it doesn't tell you if the content is just a raw binary object, a swarm merkletree, a git object, an ethereum block, etc. |
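The difference between a bare multihash and a CID can be seen by decoding one. The sketch below assumes the binary CIDv1 layout (version, content-type code, then a multihash of hash code, digest length, and digest, each integer an unsigned varint); the function names are hypothetical, and the example bytes use the `dag-pb` (0x70) and `sha2-256` (0x12) codes with an all-zero placeholder digest.

```python
from typing import Dict, Tuple, Union


def read_varint(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Read an unsigned LEB128 varint; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def parse_binary_cidv1(cid: bytes) -> Dict[str, Union[int, bytes]]:
    """Split a binary CIDv1 into its content-type code and multihash parts."""
    version, pos = read_varint(cid)
    if version != 1:
        raise ValueError("not a CIDv1")
    codec, pos = read_varint(cid, pos)      # how to interpret the data
    hash_code, pos = read_varint(cid, pos)  # which hash function was used
    length, pos = read_varint(cid, pos)
    digest = cid[pos:pos + length]
    if len(digest) != length or pos + length != len(cid):
        raise ValueError("digest length mismatch")
    return {"codec": codec, "hash_code": hash_code, "digest": digest}


example = bytes([0x01, 0x70, 0x12, 0x20]) + bytes(32)  # placeholder digest
parsed = parse_binary_cidv1(example)
print(hex(parsed["codec"]), hex(parsed["hash_code"]), len(parsed["digest"]))
```

Dropping the first two fields leaves exactly the multihash: it still identifies the bytes, but a reader can no longer tell whether they are a raw blob, a merkle tree node, or something else.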
I really think we must be talking at cross-purposes here. As I understand it, your position is that identifiers should be purely content identifiers, and shouldn't integrate information about the location of that content. Is that correct? But at the same time, you seem to be suggesting using metadata about the content identifiers, like what sort of hashing they use, to identify where to find the data. This seems like it has the same effect as including hints about content location, but less reliably, since it's possible that there could be multiple storage locations for a single content hash. What am I misunderstanding? Can you give a concrete example of what you think an ENS record pointing to a resource that can be either IPFS or Swarm would look like?
What do you mean "content related to the ENS name"? The goal isn't to store ENS information in Swarm or IPFS, it's to point to Swarm and IPFS resources from ENS.
So, what is the canonical format for an IPFS identifier, such as those that users enter into their browsers today? Is it a multihash, or a CID? Isn't any of this metadata stored with the actual IPFS object? |
A multicodec has been added in multiformats/multicodec#104 but it's not currently a "multiaddr" codec. However, given that multiaddrs are defined as |
Currently there does not exist a protocol code for Ethereum's Swarm content hashes (https://swarm-guide.readthedocs.io/en/latest/usage.html#bzz-url-schemes).
This makes it difficult to use multiaddr in Ethereum Improvement Proposal #1577 (ethereum/EIPs#1577).