Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use fixed64 and sfixed64 for the sums and counts #214

Merged
merged 1 commit into from
Aug 29, 2020

Conversation

bogdandrutu
Copy link
Member

Fixes #211

@bogdandrutu bogdandrutu requested review from a team August 26, 2020 23:24
@jmacd
Copy link
Contributor

jmacd commented Aug 26, 2020

I feel like #11 didn't provide justification for this. It's smaller on the wire to and slower to decode, and #211 says that therefore we should make the trade?

@bogdandrutu
Copy link
Member Author

@jmacd I think you wanted to reference #211 not #11. Happy to drop this, but in my opinion it is a small win that we get on the CPU by trading a smaller network traffic.

Not sure what to do, but feels like more natural to encode the numbers this way, this will speedup also functions like "size()".

Copy link
Member

@tigrannajaryan tigrannajaryan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a tiny bit performance improvement. Works either way, happy to LGTM if you prefer this approach.

Signed-off-by: Bogdan Drutu <bogdandrutu@gmail.com>
@anuraaga
Copy link

If I'm not mistaken, pretty large values like 34 billion can be stored in 5 bytes with varint, vs 8 bytes with fixed which is about 30% savings.

https://github.com/protocolbuffers/protobuf/blob/367f3d831ee3186035a14f871cbcbc1b028b6b9a/python/google/protobuf/internal/encoder.py#L103

It seems relatively rare to cross such a huge number, and very common to be even smaller. Won't the I/O time vastly outweigh the varint processing in practical cases?

this will speedup also functions like "size()".

Correct me if I'm wrong, but I don't think there are any proto implementations that read directly from the serialized state. Since they copy into an arraylike, size() shouldn't be affected.

Copy link
Contributor

@MrAlias MrAlias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm mostly ambivalent here. I'm not sure the performance improvements warrant the change, but I don't think it's worth blocking this if there is strong desire for this change. From a usability standpoint it seems neutral to me.

@bogdandrutu
Copy link
Member Author

bogdandrutu commented Aug 29, 2020

@anuraaga not sure, for 100 data points sums/gauges (timeseries) the extra bytes is at most 700, which is less than the 1/2 MTU, so it is 50% chance no more messages, and 50% chance one extra message. Not sure how to prove this, and happy to drop it, because you seem to have a more strong opinion.

Correct me if I'm wrong, but I don't think there are any proto implementations that read directly from the serialized state. Since they copy into an arraylike, size() shouldn't be affected.

size() is to calculate the size of the byte array, so in golang for example you want to pre-allocate the byte array, so want to calculate the size of the serialized message, and in case of fixed primitives return directly 8, in case of varint few if conditions (usually 2-3 on average).

@bogdandrutu bogdandrutu reopened this Aug 29, 2020
Copy link

@anuraaga anuraaga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, thanks for clarifying size is about the serialized size, yeah that does seem like it can get significant. And found that capn proto uses fixed size too so this seems to make sense :) We're in a different world in terms of network / CPU size than when proto was first developed I guess.

https://stackoverflow.com/a/24642169

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Check usage of int64 vs sfixed64 in metrics proto and uint64 vs fixed64
5 participants