Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Reset function to binaryReader #196

Merged
merged 2 commits into from
Jun 4, 2024
Merged

Conversation

jt0
Copy link
Contributor

@jt0 jt0 commented Jun 3, 2024

Reset improves reader performance when the same binary data is processed multiple times. It works by reusing the symbol table constructed during the first read, and jumping straight to the first "content" byte when Reset is called.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Reset improves reader performance when the same binary data is processed
multiple times. It works by reusing the symbol table constructed during
the first read, and jumping straight to the first "content" byte when
Reset is called.
// reuse a binaryReader with inconsistent input bytes will cause the reader
// to return errors, misappropriate values to unrelated or non-existent
// attributes, or incorrectly parse data values.
func (r *binaryReader) Reset(in []byte) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you don't mind, can you add a note about the visibility of this function for those of us who aren't using Go on a regular basis?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function is exported, so anyone can call it. It's not included in the Reader interface, which is the return type from the NewReader* functions. In order to use this, one needs to define their own Go interface similar to this:

type resettableReader interface {
	ion.Reader
	Reset([]byte) error
}

and perform a type assertion on the interface in order to use it. For example:

func resetOrNew(reader ion.Reader, content []byte) ion.Reader { 
	if rr, ok := reader.(resettableReader); ok {
		_ = rr.Reset(content)
		return rr
	}
	return ion.NewReaderBytes(content)
}

@popematt
Copy link
Contributor

popematt commented Jun 4, 2024

@nirosys, Can you take a look at this PR?

Copy link

@nirosys nirosys left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey jt0! Thanks for the PR.

Could you maybe provide some more information about your use case, and what prompted you to look into this optimization?

I have some concerns about the approach, but I might be missing something.

Comment on lines +40 to +53
// Reset causes the binary reader to start reading from the given input bytes
// while skipping most of the initialization steps needed to prepare the
// reader. Reset is most commonly called with the same bytes as the reader
// was originally created with (e.g. via NewReaderBytes) as an optimization
// when the same data needs to be read multiple times.
//
// While it is possible to call Reset with different input bytes, the Reader
// will only work correctly if the new bytes contain the exact same binary
// version marker and local symbols as the original input. If there are any
// doubts whether this is the case, it is instead recommended to create a
// new Reader using NewReaderBytes (or NewReaderCat) instead. Attempting to
// reuse a binaryReader with inconsistent input bytes will cause the reader
// to return errors, misappropriate values to unrelated or non-existent
// attributes, or incorrectly parse data values.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some concerns about this implementation. In ion-c we have APIs to reset the underlying stream of a reader, which includes resetting the catalog and symbol table to default values. This Reset would only swap the underlying bytes which feels, to me, less than a reset.

That it doesn't work "correctly" unless you provide bytes that have the exact same leading bytes also concerns me. I feel like the method would be very error prone, and I question the re-usability of it given that an Ion reader can be created with anything that implements io.Read, but this functionality is limited to data already read into memory.

The optimization seems very application specific, which leads me to think that if the application had the right information (catalog, and reader offset), it could re-create the reader itself. Maybe that would be a better approach? By providing access to the right data, the application could optimize re-construction of the reader.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In ion-c we have APIs to reset the underlying stream of a reader, which includes resetting the catalog and symbol table to default values.

In ion-go, you achieve this semantic by calling ion.NewReaderBytes or ion.NewReader with an io.Reader that has it's own reset capability.

That it doesn't work "correctly" unless you provide bytes that have the exact same leading bytes also concerns me. I feel like the method would be very error prone

I tried to be clear that one has to be careful to only provide consistent bytes. Tbh, I don't expect that would be a common use case. Happy to update the comment if it needs to be more explicit.

which leads me to think that if the application had the right information (catalog, and reader offset), it could re-create the reader itself.

Not as the library is currently written, since neither binaryReader's nor its attributes are exported, and there are no "Set" functions provided. Even if those were added, I'm not sure it would be any better than having a function like Reset that manages the details for you.

nirosys
nirosys previously approved these changes Jun 4, 2024
Copy link

@nirosys nirosys left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After some offline discussion, I'm good with merging the change, with the understanding that we'll revisit the use case at a later date.

@popematt popematt merged commit 1c5a552 into amazon-ion:master Jun 4, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants