Add likecoin-iscn codec #200
base: master
Thanks @Aludirk, could you help me with understanding this just a little bit more please? What is the intended use-case for the codes? I see you've used a new tag in the column, so is it right to assume these are not "codecs" in the IPLD sense? Or are these referencing binary blob formats that get encoded and decoded in some unique way (each one having unique rules)?
Thanks for the reply @rvagg. The ISCN is a kind of ISBN for digital content, and it suits us as the schema for digital content metadata. We also want the public to be able to access the metadata stored in the blockchain easily; getting data from the chain directly requires a technical background, so it is not user friendly. With IPFS as the distribution method, users can find the digital content metadata by typing a single ipfs command. Therefore, every time we have new or updated digital content metadata, we translate our chain data to IPLD and pin it for users to access. We are using eight codecs because we want to mimic the ISCN data model, as mentioned before. Each codec corresponds to a specific ISCN schema, which has its own properties and validation rules.
From a quick look at the source code, it looks like you're encoding your data as DAG-CBOR (please note that I have almost no Go knowledge). Is the idea that you want the type you've encoded as part of the CID?
Yes, every schema will be encoded with the canonical CBOR provided by IPFS, and the appropriate decoder will be chosen based on the codec extracted from the CID.
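To make the "decoder chosen from the codec in the CID" idea concrete, here is a minimal Python sketch of reading the codec field out of a binary CIDv1 and dispatching on it. It is not the plugin's actual code: `ISCN_CODEC` is a made-up value used only for this example, the decoders are placeholders, and only the dag-cbor (0x71) and sha2-256 (0x12) codes are real multicodec entries.

```python
import hashlib
from typing import Tuple

DAG_CBOR = 0x71        # multicodec code for dag-cbor
SHA2_256 = 0x12        # multihash code for sha2-256
ISCN_CODEC = 0x300001  # hypothetical code, chosen only for this sketch


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned LEB128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Return (value, next_position) for the varint starting at pos."""
    value, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def make_cid_v1(codec: int, block: bytes) -> bytes:
    """Build a binary CIDv1: version, codec, then a sha2-256 multihash."""
    digest = hashlib.sha256(block).digest()
    multihash = encode_varint(SHA2_256) + encode_varint(len(digest)) + digest
    return encode_varint(1) + encode_varint(codec) + multihash


def codec_of(cid: bytes) -> int:
    """Extract the content-type codec from a binary CIDv1."""
    version, pos = decode_varint(cid)
    if version != 1:
        raise ValueError("only CIDv1 is handled in this sketch")
    codec, _ = decode_varint(cid, pos)
    return codec


# Placeholder decoders: a real plugin would parse the CBOR payload here.
DECODERS = {
    DAG_CBOR: lambda block: ("dag-cbor", block),
    ISCN_CODEC: lambda block: ("iscn", block),
}


def decode(cid: bytes, block: bytes):
    """Pick the decoder from the codec embedded in the CID."""
    return DECODERS[codec_of(cid)](block)
```

The point of the sketch is only that the codec is a varint near the start of the CID, so choosing a decoder needs no look at the block itself.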
@Aludirk the There are several ways of doing that. You could wrap your whole object in something like:
{
  "type": "one of your types",
  "data": {
    // The actual payload
  }
}
Or, when I look at your schema descriptions it looks like they already have a As I talk about schemas already, you might also be interested to have a look into IPLD Schemas, which could help with differentiating the data as well: https://specs.ipld.io/schemas/
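A rough Python sketch of the wrapping idea, where the type travels inside the payload instead of inside the CID. The schema names and required fields below are invented for illustration (the real ISCN schemas are not listed in this thread), and JSON stands in for DAG-CBOR so that the example needs only the standard library.

```python
import json

# Hypothetical schema names and required fields, for illustration only.
REQUIRED_FIELDS = {
    "entity": {"id", "name"},
    "rights": {"holder", "terms"},
}


def wrap(type_name: str, data: dict) -> dict:
    """Put the payload into a {"type": ..., "data": ...} envelope."""
    if type_name not in REQUIRED_FIELDS:
        raise ValueError(f"unknown type: {type_name!r}")
    return {"type": type_name, "data": data}


def encode(envelope: dict) -> bytes:
    """Serialize deterministically (sorted keys, no extra whitespace)."""
    return json.dumps(envelope, sort_keys=True, separators=(",", ":")).encode("utf-8")


def decode(block: bytes) -> dict:
    """Parse a block and apply the validation rule selected by its type."""
    envelope = json.loads(block)
    required = REQUIRED_FIELDS.get(envelope["type"])
    if required is None:
        raise ValueError(f"unknown type: {envelope['type']!r}")
    missing = required - set(envelope["data"])
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return envelope
```

With this shape, one generic codec can carry every schema, and the per-schema validation is selected after decoding rather than from the CID.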
OK, I got it. For the data structure, I can make the However, I still can't directly use DAG-CBOR for the codec, since our data is actually stored inside a CosmosSDK based public blockchain called LikeCoin chain. When anyone asks for content by its CID, our daemon has a datastore plugin that finds the concrete data inside the chain and answers the request. If the CID carries no clue that lets the plugin decide whether to fetch the data from the chain, this plugin will not work. And for performance reasons, I cannot let the datastore plugin access the chain for every DAG-CBOR CID. Therefore, I suggest reserving only one codec for the ISCN IPLD, so that our datastore plugin can work as expected. I can make a new commit if this suggestion is OK.
@Aludirk sorry for the large info dump below, but I figured it might be helpful to put out some thoughts + ideas on your go-ipfs integration strategy. There are office hours later today listed on the community calendar if you'd like to talk a bit in person; of course talking here or on Matrix/IRC also works.
TLDR: I think there are a few ways to tackle this problem. I don't think this approach (adding a likecoin codec into CIDs) is ideal if we can avoid it, but I'd like to know if any of the existing strategies might work.
High level thoughts on getting ISCN data via IPFS
At a high level IPFS wants to do content routing, which means finding data based on what it is and not where it is. This scheme you've proposed attaches the location where content should be looked up (i.e. the likecoin blockchain) to the data itself by embedding it in the CID. For example, maybe some of your users want to fetch the data via Bitswap from each other so that they can resolve ISCN CIDs offline.
It sounds like what you're really looking for here is "routing hints", i.e. a way to specify "I'm looking for CID bafyabc..., try looking for it in locations A, B, or C since they're likely to have it". I've heard this talked about before across a number of issues (e.g. here), but I can't find a definitive one offhand (perhaps this means we need to make a new one, unless there's an issue I've missed @Stebalien). Overall though, the issue of finding a CID via multiple possible systems really only has two solutions:
Questions/Thoughts about how you're planning on building this and utilizing go-ipfs
I'd really like to have something that handles option 1 nicely, but that might take time to design, get built, and make its way into systems such as IPFS, and I wouldn't want your team to be held up by that. Therefore, my thoughts + questions below are to explore whether option 2 could work for you and poke a little bit more at your setup. The way go-ipfs normally works is that when searching for data it will (in order):
By inserting yourself at the datastore level you're avoiding steps 2 + 3 which could also help you find the data. What you might want to do instead is something like one of:
Do you mind going into this a little bit more:
Both your approach and option A above require making extra calls to the blockchain node. For example, every time a go-ipfs node tries to look for a non-likecoin CID it can't find anywhere else, it'll ask the blockchain node where to find it. Is asking the blockchain node "do you have CID bafyabc?" very expensive, and is option B similarly too expensive/implausible?
Implementation notes on likecoinds
If I understand your setup in https://github.com/likecoin/likecoin-ipfs-cosmosds correctly, then you're doing a GET on the blockchain any time the data is asked for instead of checking if you already have it stored locally. You're also using levelDB instead of the recommended BadgerDB or FlatFS datastores; you could probably just wrap one of those datastores instead of copying them internally.
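The "check locally before doing a GET on the blockchain" point can be sketched as a read-through wrapper. This is an illustrative Python sketch only, not the go-ipfs datastore interface: `ReadThroughStore`, `local`, and `fetch_remote` are names assumed for the example, with a plain dict standing in for the wrapped datastore and a callable standing in for the chain RPC.

```python
from typing import Callable, MutableMapping, Optional


class ReadThroughStore:
    """Serve blocks from a local store, asking the remote source only on a miss."""

    def __init__(
        self,
        local: MutableMapping[bytes, bytes],
        fetch_remote: Callable[[bytes], Optional[bytes]],
    ) -> None:
        self.local = local                # stands in for the wrapped datastore
        self.fetch_remote = fetch_remote  # stands in for the chain RPC call
        self.remote_calls = 0             # counter kept only to show the effect

    def get(self, key: bytes) -> bytes:
        # Local hit: no remote call at all.
        if key in self.local:
            return self.local[key]
        # Miss: ask the remote source once, then keep the result locally.
        self.remote_calls += 1
        value = self.fetch_remote(key)
        if value is None:
            raise KeyError(key)
        self.local[key] = value
        return value
```

Because a CID always refers to the same bytes, keeping the fetched block locally does not risk serving stale data, so the second request for a key never reaches the remote source.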
Hi, this is Chung, one of the developers of ISCN. To be honest we are not very familiar with the mechanisms of IPFS, so I think it is better to write down our idea here, to see if we have got anything wrong.
Our idea is to have blockchain nodes run a go-ipfs process / thread with the datastore plugin installed. When the blockchain node receives transactions that add new ISCN data, it notifies the go-ipfs process / thread to pin the associated CID (not implemented yet). Then, through IPFS mechanisms (I only know about the DHT, but I think this part is handled by the nodes internally, so by calling the pin API I should not need to care?), other IPFS nodes on the IPFS network will be able to learn that these CIDs can be retrieved from this IPFS process / thread.
When some node wants to retrieve the CID, the IPFS process / thread calls the datastore plugin. The current design of the datastore plugin first checks whether the CID is for ISCN data and, if not, proxies the call to other datastore plugins. Currently we have copied the leveldb datastore as a proof-of-concept; we are going to make it a wrapper around other existing plugins, just as you suggested. If the CID is for ISCN data, the plugin proxies the call to the blockchain node through RPC, which we implemented.
The above is the design based on our understanding. I hope we didn't make any huge design flaw because of a misunderstanding.
Back to the IPLD codec. I agree that we should not occupy a codec type just for routing-hint purposes, and I think we would be able to work around this part (e.g. by storing a Bloom filter of existing CIDs in the datastore plugin) if we don't have the codec type. But I want to know: is the codec type simply for serialization and deserialization? I.e. if we have a new data type (ISCN) whose binary storage format is JSON / CBOR / another existing codec, and we have added more semantics and verification on top of it, may we own a new codec type for this new data type?
Also, I would like an estimate: for a typical IPFS node on the public network, how often would it receive queries that the datastore plugin needs to handle? My impression from the proof-of-concept (from the console log) is around 1 per second, but I'm not sure if this is typical.
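As a rough illustration of the Bloom filter workaround mentioned above, the Python sketch below keeps a filter of known ISCN CIDs and only goes to the chain when the filter says "maybe". The sizes, the hashing scheme, and the `route` helper are assumptions made for this example, not part of the plugin.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Fixed-size Bloom filter: no false negatives, some false positives."""

    def __init__(self, size_bits: int = 8192, num_hashes: int = 4) -> None:
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: bytes) -> Iterator[int]:
        # Derive num_hashes bit positions by salting sha256 with an index byte.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: bytes) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: bytes) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def route(cid: bytes, known_iscn: BloomFilter) -> str:
    """Decide where the datastore plugin should look for a CID."""
    # "chain" may occasionally be a false positive; "local" is never wrong
    # about a CID that was added to the filter.
    return "chain" if cid in known_iscn else "local"
```

In such a design the filter would be updated whenever the node pins a new ISCN CID, so a lookup for an unrelated CID is almost always answered without an RPC call.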
We are going to implement the International Standard Content Number Specification using IPLD as our linked data structure. ISCN is a global identifier for digital content.
The IPLD plugin for this implementation is here.