Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add EncodedSize trait to calculate encoded sizes #609

Closed
wants to merge 1 commit into from

Conversation

markbt
Copy link

@markbt markbt commented Jan 24, 2023

Related issue: #539

Hi bincode developers. This is the first in a series of 3 PRs that would allow us to replace our current use of Abomonation with bincode, without a noticeable performance hit. We'd like these to be adopted upstream, and hope they are useful for other users of bincode also.

For some storage back-ends it's necessary to know the size of the buffer to pre-allocate before encoding. This is true of our cache back-end, which is C++ code that we communicate with over FFI. The bincode 1.0 version had bincode::serialized_size, which would tell you the serialized size of data, but for bincode 2.0 this is not implemented.

One suggestion is to implement a SizeOnlyWriter which adds up the sizes of the slices it's been given, however this effectively requires serializing the data twice, as it doesn't allow types to efficiently compute the size when it is possible to do so without performing the actual serialization steps.

In this PR, we add a new EncodedSize trait which accomplishes this objective, and implement that for all types. Implementations for custom structs and enums can be derived using #[derive(bincode::EncodedSize)].

For some storage back-ends it's necessary to know the size of the buffer to pre-allocate before encoding.

The bincode 1.0 version had `bincode::serialized_size`, which would tell you the serialized size of data.

We add a new `EncodedSize` trait which accomplishes the same objective, and implement that for all types.  Implementations for custom structs and enums can be derived using `#[derive(bincode::EncodedSize)]`.
@VictorKoenders
Copy link
Contributor

Hi, and thank you for this PR!

Scanning over this PR, I'm not actually sure this will be faster than SizeOnlyWriter. This PR still has to iterate over all the fields recursively and count all the entries.

Is it possible to set up a benchmark to compare the speed of an SizeOnlyWriter implementation, and this PR?

@markbt
Copy link
Author

markbt commented Jan 31, 2023

Benchmarking is a good idea. It ended up being a bit of a journey, and it looks like you're right, as long as LTO is enabled.

To start with, I added a variant of encoded_size to bincode which used SizeOnlyWriter. I do think it would be good to provide this as part of the library, if only as a convenience, to save everyone re-implementing it. I created a benchmark that computes the serialized size a Vec of 10,000 random integers of various sizes.

Results were initially pretty promising for the EncodedSize trait (all times in nanoseconds):

u8 u16 u32 u64
SizeOnlyWriter 14770.00 29581 29416 37906
EncodedSize 1.45 12686 10511 12582

Yes, for Vec<u8> it's 10,000 times faster with EncodedSize.

That seemed pretty obvious to be an optimization problem, with Vec<u8> likely not being optimized to len(x) as it was with the EncodedSize variant of Vec<u8>.

To test this out, I moved the SizeOnlyWriter implementation to the benchmark crate:

u8 u16 u32 u64
Local SizeOnlyWriter 0.206 3953 5741 7257

That's much faster, and even beats out the dedicated trait. I don't think this is a general-purpose solution: I don't think it's reasonable to copy and paste SizeOnlyWriter to every location in your codebase that you want to compute the size. Also, it's likely that it won't scale well, as it'll mean a new monomorphisation of the same code each time. But it gives a good lower bound.

Then I realised that LTO wasn't enabled, and that it might help. It did:

u8 u16 u32 u64
SizeOnlyWriter + LTO 0.206 3864 5671 7218
EncodedSize + LTO 0.206 3876 5707 7268
Local SizeOnlyWriter + LTO 0.206 3877 5676 7242

So ultimately I think using SizeOnlyWriter is probably fine. I do think it would be nice if this was built into bincode as the recommended way of doing this (with the caveat that you shouldn't bother unless you really need to, of course).

Will leave this PR up for a bit in case anyone disagrees, but I now plan to close this PR.

The benchmark is in this commit if anyone wants to try it out themselves.

@VictorKoenders
Copy link
Contributor

I do think it would be nice if this was built into bincode

I feel like added functionality like this this would be better in a different crate. In #578 (comment) I talked about introducing a crate named bincode-contrib or bincode-utils for common utilities that can be build on top of bincode without big changes.

Unfortunately both Zoey and I have been very busy the last couple of months so we have not had time to set this up yet

@markbt markbt closed this Feb 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants