Decoding concatenated data items #75
Yeah, I don't mind doing this as an opt-in option. The default should be strictly no trailing bytes, but there are reasonable cases for allowing the decoder to find the boundaries. In fact, just 4 days ago I added support for this exact kind of case in the Go dag-cbor implementation: ipld/go-ipld-prime#490! It was a bit easier there, though, because the decoder is prepared for it. We have a problem here of the tokeniser taking a fully materialised […]. One solution is to make an interface with just enough functionality that it can work over iterables or plain […].
Ah nice, it sounds like this has been on your mind recently, then! It would be useful to have an interface around async iterables for generic streaming, but at least in my case today a synchronous interface would do the trick. I hold all the bytes for some unknown number of concatenated data items, and I'm looking to decode each of them. I'm not 100% sure about the approach with the […].
Well, if you have all the bytes, then that certainly makes it easier. The main barrier is the erroring if the tokeniser isn't […] (lines 189 to 191 in 7f8bcee). I ended up adjusting this behaviour with a DontParseBeyondEnd option in Go (https://github.com/ipld/go-ipld-prime/blob/200b4a6b6fb6720911cb385aff05da55c39d56de/codec/dagcbor/unmarshal.go#LL64C2-L64C20), "Parse" meaning "look for more bytes and error if there are any".
Essentially what you probably want to do is parse a chunk, get the object and the byte length consumed ([…]). Some options: […]
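The "parse a chunk, note the byte length, advance" loop described above can be sketched generically. Since nothing in this thread commits to a particular cborg API, the sketch below hand-rolls a stand-in decoder covering only one narrow slice of CBOR (unsigned integers 0–23, which encode as a single byte of major type 0) purely to illustrate the pattern; a real implementation would delegate `decodeOne` to the library's tokeniser.

```javascript
// Minimal illustration of the "decode, measure, advance" loop for
// concatenated CBOR items. decodeOne() is a stand-in for a real
// decoder: it handles only unsigned ints 0..23 (single-byte items,
// major type 0) and reports how many bytes it consumed.
function decodeOne (bytes, offset) {
  const byte = bytes[offset]
  if (byte === undefined) throw new Error('unexpected end of input')
  if (byte > 0x17) throw new Error('sketch only handles uints 0..23')
  return { value: byte, consumed: 1 }
}

function decodeAll (bytes) {
  const items = []
  let offset = 0
  while (offset < bytes.length) {
    const { value, consumed } = decodeOne(bytes, offset)
    items.push(value)
    offset += consumed // the key step: skip exactly one item's bytes
  }
  return items
}

// Three concatenated single-byte items: 1, 2, 3
console.log(decodeAll(new Uint8Array([0x01, 0x02, 0x03]))) // → [ 1, 2, 3 ]
```

Because every CBOR item is self-delimiting, the only information the caller needs beyond the decoded value is how far the decoder advanced, which is what the later `pos()` tokenizer method exposes.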
BREAKING CHANGE Implementations of `Tokenizer` must now implement a pos() method to be compatible. This should only impact advanced users of cborg. Ref: #75
I'm proposing a feature that should suit this, at least in the non-async case where you think you have enough bytes to find a CBOR chunk at the start of it: #93
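The shape of a "decode the first item, hand back the rest" API along the lines of #93 can be sketched without depending on cborg itself. The `decodeFirst` name and the `[value, remainder]` return shape below are illustrative assumptions, and the stand-in decoder again covers only single-byte unsigned ints (0–23) so the calling pattern stays visible.

```javascript
// Hypothetical decodeFirst-style helper: decode exactly one item and
// return [value, remainder]. A real library would handle all of CBOR;
// this stand-in covers only single-byte unsigned ints (0..23).
function decodeFirst (bytes) {
  if (bytes.length === 0) throw new Error('empty input')
  if (bytes[0] > 0x17) throw new Error('sketch only handles uints 0..23')
  return [bytes[0], bytes.subarray(1)]
}

// The caller loops until the remainder is empty; no item in the loop
// ever triggers a "too many bytes" error.
let rest = new Uint8Array([0x05, 0x0a, 0x0f])
const values = []
while (rest.length > 0) {
  let value
  ;[value, rest] = decodeFirst(rest)
  values.push(value)
}
console.log(values) // → [ 5, 10, 15 ]
```

Returning the remainder rather than a byte count keeps the caller from doing any offset arithmetic, at the cost of allocating a subarray view per item.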
Feel free to reopen or open a new issue if you want to outline what else might be needed beyond […].
This is great. Actually, for our current use-case I think this will do the trick just fine. If we happen to run into the async use-case, we'll open an issue with some details. Appreciate it!
It would be convenient to be able to decode concatenated data items, which has applications in streaming settings, as is suggested in the streaming section (link) of the spec.
If you are interested in catering to the use-case but don't want to add a new API for it, one light way to support this might be to make the default `Tokeniser` class public, which I expect would make it simpler for folks to add this support themselves as needed. There are also precedents for libraries supporting this through a streaming interface or a separate method specifically for decoding multiple data items.
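To make "concatenated data items" concrete: CBOR items are self-delimiting, so a valid stream can simply be several complete encodings placed back to back, with no framing between them. The bytes below are hand-encoded from the CBOR spec and assume nothing about cborg's API; a decoder that insists on exactly one item would reject this buffer with a trailing-bytes error, which is precisely the behaviour this issue asks to relax.

```javascript
// Two CBOR items back to back: the unsigned int 1 (0x01) followed by
// the 3-byte text string "foo" (0x63 = major type 3, length 3).
const item1 = Uint8Array.from([0x01])
const item2 = Uint8Array.from([0x63, 0x66, 0x6f, 0x6f]) // 'f','o','o'

// Concatenate into a single buffer, as a streaming producer might.
const stream = new Uint8Array(item1.length + item2.length)
stream.set(item1, 0)
stream.set(item2, item1.length)

console.log(stream.length) // → 5
```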