-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Reset function to binaryReader #196
Conversation
Reset improves reader performance when the same binary data is processed multiple times. It works by reusing the symbol table constructed during the first read, and jumping straight to the first "content" byte when Reset is called.
// reuse a binaryReader with inconsistent input bytes will cause the reader | ||
// to return errors, misappropriate values to unrelated or non-existent | ||
// attributes, or incorrectly parse data values. | ||
func (r *binaryReader) Reset(in []byte) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you don't mind, can you add a note about the visibility of this function for those of us who aren't using Go on a regular basis?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function is exported, so anyone can call it. It's not included in the Reader
interface, which is the return type from the NewReader*
functions. In order to use this, one needs to define their own Go interface similar to this:
type resettableReader interface {
ion.Reader
Reset([]byte) error
}
and perform a type assertion on the interface in order to use it. For example:
func resetOrNew(reader ion.Reader, content []byte) ion.Reader {
if rr, ok := reader.(resettableReader); ok {
_ = rr.Reset(content)
return rr
}
return ion.NewReaderBytes(content)
}
@nirosys, Can you take a look at this PR? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey jt0! Thanks for the PR.
Could you maybe provide some more information about your use case, and what prompted you to look into this optimization?
I have some concerns about the approach, but I might be missing something.
// Reset causes the binary reader to start reading from the given input bytes | ||
// while skipping most of the initialization steps needed to prepare the | ||
// reader. Reset is most commonly called with the same bytes as the reader | ||
// was originally created with (e.g. via NewReaderBytes) as an optimization | ||
// when the same data needs to be read multiple times. | ||
// | ||
// While it is possible to call Reset with different input bytes, the Reader | ||
// will only work correctly if the new bytes contain the exact same binary | ||
// version marker and local symbols as the original input. If there are any | ||
// doubts whether this is the case, it is instead recommended to create a | ||
// new Reader using NewReaderBytes (or NewReaderCat) instead. Attempting to | ||
// reuse a binaryReader with inconsistent input bytes will cause the reader | ||
// to return errors, misappropriate values to unrelated or non-existent | ||
// attributes, or incorrectly parse data values. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some concerns about this implementation. In ion-c we have APIs to reset the underlying stream of a reader, which includes resetting the catalog and symbol table to default values. This Reset
would only swap the underlying bytes which feels, to me, less than a reset.
That it doesn't work "correctly" unless you provide bytes that have the exact same leading bytes also concerns me. I feel like the method would be very error prone, and I question the re-usability of it given that an Ion reader can be created with anything that implements io.Read
, but this functionality is limited to data already read into memory.
The optimization seems very application specific, which leads me to think that if the application had the right information (catalog, and reader offset), it could re-create the reader itself. Maybe that would be a better approach? By providing access to the right data, the application could optimize re-construction of the reader.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In ion-c we have APIs to reset the underlying stream of a reader, which includes resetting the catalog and symbol table to default values.
In ion-go, you achieve this semantic by calling ion.NewReaderBytes
or ion.NewReader
with an io.Reader that has it's own reset capability.
That it doesn't work "correctly" unless you provide bytes that have the exact same leading bytes also concerns me. I feel like the method would be very error prone
I tried to be clear that one has to be careful to only provide consistent bytes. Tbh, I don't expect that would be a common use case. Happy to update the comment if it needs to be more explicit.
which leads me to think that if the application had the right information (catalog, and reader offset), it could re-create the reader itself.
Not as the library is currently written, since neither binaryReader's nor its attributes are exported, and there are no "Set" functions provided. Even if those were added, I'm not sure it would be any better than having a function like Reset that manages the details for you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After some offline discussion, I'm good with merging the change, with the understanding that we'll revisit the use case at a later date.
Reset improves reader performance when the same binary data is processed multiple times. It works by reusing the symbol table constructed during the first read, and jumping straight to the first "content" byte when Reset is called.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.