Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

raft: disable XXX_NoUnkeyedLiteral, XXX_unrecognized, and XXX_sizecache fields in protos #12790

Merged
merged 2 commits into from
Mar 27, 2021

Conversation

nvanbenschoten
Copy link
Contributor

This commit removes the XXX_NoUnkeyedLiteral, XXX_unrecognized, and XXX_sizecache auto-generated fields from generated protobuf structs in the raft package. This was done for all of the same reasons CockroachDB removed the generation of these fields in cockroachdb/cockroach#38404. They come with very limited advantages but moderate disadvantages.

XXX_NoUnkeyedLiteral and XXX_sizecache were only enabled recently in cc7b4fa, and this appears to have been unintentional. Meanwhile, XXX_unrecognized has been around for longer and has arguably more reason to stay because it can assist with forwards compatibility. However, any real mixed-version upgrade story for this package is mostly untold at this point, and keeping this field seems just as likely to cause unexpected bugs (e.g. a field was propagated but not updated correctly when passed through an old version) as it seems to fix real issues, so it also doesn't warrant its cost.

This reduces the in-memory representation size of all Raft protos. Notably, it reduces the memory size of an Entry proto from 80 bytes to 48 bytes and the memory size of a Message proto from 392 bytes to 264 bytes. Both of these structs are used frequently, and often in slices, where this wasted space really starts to add up.

This was motivated by a regression in microbenchmarks in CockroachDB due to cc7b4fa, which was caught in cockroachdb/cockroach#62212.

@nvanbenschoten
Copy link
Contributor Author

@tbg mind taking a look at this when you get a chance? I'm not able to assign a reviewer to this.

This test checks the in-memory size of each proto struct to detect
unexpected changes to their memory representation.
…he fields in protos

This commit removes the `XXX_NoUnkeyedLiteral`, `XXX_unrecognized`, and
`XXX_sizecache` auto-generated fields from generated protobuf structs in
the raft package. This was done for all of the same reasons CockroachDB
removed the generation of these fields in cockroachdb/cockroach#38404.
They come with very limited advantages but moderate disadvantages.

`XXX_NoUnkeyedLiteral` and `XXX_sizecache` were only enabled recently in
cc7b4fa, and this appears to have been unintentional. Meanwhile,
`XXX_unrecognized` has been around for longer and has arguably more
reason to stay because it can assist with forwards compatibility.
However, any real mixed-version upgrade story for this package is mostly
untold at this point, and keeping this field seems just as likely to
cause unexpected bugs (e.g. a field was propagated but not updated
correctly when passed through an old version) as it seems to fix real
issues, so it also doesn't warrant its cost.

This reduces the in-memory representation size of all Raft protos.
Notably, it reduces the memory size of an `Entry` proto from *80 bytes*
to *48 bytes* and the memory size of a `Message` proto from *392 bytes*
to *264 bytes*. Both of these structs are used frequently, and often in
slices, where this wasted space really starts to add up.

This was motivated by a regression in microbenchmarks in CockroachDB due
to cc7b4fa, which was caught in cockroachdb/cockroach#62212.
@codecov-io
Copy link

Codecov Report

Merging #12790 (e51c697) into master (8469108) will decrease coverage by 4.57%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #12790      +/-   ##
==========================================
- Coverage   52.83%   48.25%   -4.58%     
==========================================
  Files         383      383              
  Lines       31347    31346       -1     
==========================================
- Hits        16561    15127    -1434     
- Misses      12771    14297    +1526     
+ Partials     2015     1922      -93     
Flag Coverage Δ
all 48.25% <ø> (-4.58%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
raft/raftpb/confstate.go 75.00% <ø> (-1.93%) ⬇️
client/v2/cancelreq.go 0.00% <0.00%> (-100.00%) ⬇️
client/v2/keys.go 0.00% <0.00%> (-88.07%) ⬇️
client/v2/members.go 0.00% <0.00%> (-84.91%) ⬇️
server/etcdserver/api/v2v3/watcher.go 0.00% <0.00%> (-82.09%) ⬇️
client/v2/client.go 0.00% <0.00%> (-80.96%) ⬇️
server/etcdserver/api/v2v3/store.go 0.00% <0.00%> (-78.37%) ⬇️
client/v3/utils.go 0.00% <0.00%> (-75.00%) ⬇️
pkg/testutil/leak.go 0.00% <0.00%> (-72.50%) ⬇️
client/v3/internal/resolver/resolver.go 0.00% <0.00%> (-70.00%) ⬇️
... and 67 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8469108...e51c697. Read the comment docs.

@ptabor
Copy link
Contributor

ptabor commented Mar 20, 2021

Really nice regressions finding infrastucture.

  1. It's interesting that its declared as regression, as this file: https://github.com/etcd-io/etcd/blob/release-3.4/raft/raftpb/raft.proto had never the settings you are adding. Maybe gogo-proto added support for generating of this fields in newer versions together with options to disable them.

  2. Taking in consideration the fact that gogo-proto is unmaintained, and blocking cooperation with protobuf-1.4.x interfaces/runtime, I don't think we should depend on any additional gogo-specific features.

@dsnet: Joe, could you, please, asses risk-factor of using this flags for potential gogo -> golang/protobuf generator migration
or 'runtime' compatibility challenges ?

option (gogoproto.goproto_unkeyed_all) = false;
option (gogoproto.goproto_unrecognized_all) = false;   -> this one is already in k8s. 
option (gogoproto.goproto_sizecache_all) = false;

@dsnet
Copy link

dsnet commented Mar 20, 2021

option (gogoproto.goproto_unkeyed_all) = false;

Has no effect on the runtime, but I don't recommend setting it. There is no memory benefit to removing this field, and causes compatibility issues at scale when users declare unkeyed struct literals of your messages and you unexpectedly break them when you add a new message field.

option (gogoproto.goproto_unrecognized_all) = false; -> this one is already in k8s.

As you may be aware, the runtime handles this gracefully. Calling GetUnknown in the protobuf reflection APi will return nothing (as expected). Calling SetUnknown will cause the unknown bytes to be silently discarded.

option (gogoproto.goproto_sizecache_all) = false;

If the type has a generated Marshal method, then this is probably okay to drop, otherwise it will cause the runtime to have quadratic behavior when marshaling. The reason for this is because the protobuf wire format uses a length-prefix for each sub-message, so you need to know the sub-message size before you can start marshaling the message itself. The main protobuf module uses front-to-back serialization (which has other benefits), and so relies on the size cache to make this efficient. I believe the generated Marshal methods from gogo/protobuf uses back-to-front serialization and so avoids the quadratic behavior problem.

@nvanbenschoten
Copy link
Contributor Author

Thank you two for taking a look, especially over the weekend!

It's interesting that its declared as regression, as this file: https://github.com/etcd-io/etcd/blob/release-3.4/raft/raftpb/raft.proto had never the settings you are adding. Maybe gogo-proto added support for generating of this fields in newer versions together with options to disable them.

You raise a good point. It got picked up as a regression in our test suite because XXX_NoUnkeyedLiteral and XXX_sizecache were only recently enabled in cc7b4fa. This appears to be related to the protobuf version upgrade in #12397. This bloated the in-memory struct size, making a memory-limited cache in CRDB less efficient because the change reduced the cache's effective capacity by 11%.

But the testing added in this PR demonstrates that some messages were hit even worse. Those that contained nested messages grew even more. For instance, Message is about 50% larger than it needs to be, with 128 of its 392 bytes allocated to these three fields. As you know, slices of Entry and Message structs are the workhorses of this raft library, so keeping their size down is quite important.

Has no effect on the runtime, but I don't recommend setting it. There is no memory benefit to removing this field, and causes compatibility issues at scale when users declare unkeyed struct literals of your messages and you unexpectedly break them when you add a new message field.

Right, this is true if the field is not at the end of generated protos. But if it is the last field in the generated proto, the empty struct does increase the size of the parent struct due to golang/go#9401. I sympathize with everything you say here and generally like the idea of the field, but if it has a runtime cost as it would here, was only recently introduced a few months ago, and can be replaced by the use of go vet's “composites” check, I don't think it's worth keeping.

I believe the generated Marshal methods from gogo/protobuf uses back-to-front serialization and so avoids the quadratic behavior problem.

This is what we found in cockroachdb/cockroach@1967c25 as well. With gogo/protobuf's generated Marshal method, the field was never used. So it was strictly wasted space.

@ptabor
Copy link
Contributor

ptabor commented Mar 22, 2021

Thank you, Joe & Nathan,

As you know, slices of Entry and Message structs are the workhorses of this raft library, so keeping their size down is quite important.

To be honest, I don't understand why Raft is processing everywhere slices of Entry(s) & Message(s) instead of pointers to them. I would assume there must be a lot of copying going on that could be avoided. Is the 'cache' efficiency reason here ?

@ptabor ptabor merged commit 578ffd5 into etcd-io:master Mar 27, 2021
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Mar 30, 2021
This commit picks up etcd-io/etcd#12790, which
resolves a regression observed in cockroachdb#62212.

Picks up the following four commits:
- etcd-io/etcd@87258ef
- etcd-io/etcd@2c2456b
- etcd-io/etcd@ebc0174
- etcd-io/etcd@e51c697

```
name                  old time/op    new time/op    delta
EntryCache-16            114µs ± 7%      96µs ±12%  -15.29%  (p=0.000 n=19+19)
EntryCacheClearTo-16    37.7µs ± 4%    34.9µs ± 4%   -7.55%  (p=0.000 n=17+18)

name                  old alloc/op   new alloc/op   delta
EntryCacheClearTo-16    1.28kB ± 0%    0.77kB ± 0%  -40.00%  (p=0.000 n=20+20)
EntryCache-16           83.3kB ± 0%    50.0kB ± 0%  -39.95%  (p=0.000 n=19+20)

name                  old allocs/op  new allocs/op  delta
EntryCache-16             3.00 ± 0%      3.00 ± 0%     ~     (all equal)
EntryCacheClearTo-16      1.00 ± 0%      1.00 ± 0%     ~     (all equal)
```
craig bot pushed a commit to cockroachdb/cockroach that referenced this pull request Mar 31, 2021
62819: go.mod: bump etcd/raft to pick up proto size improvements r=tbg a=nvanbenschoten

This commit picks up etcd-io/etcd#12790, which resolves a regression observed in #62212.

Picks up the following four commits, each of which I've reviewed for safety:
- etcd-io/etcd@87258ef
- etcd-io/etcd@2c2456b
- etcd-io/etcd@ebc0174
- etcd-io/etcd@e51c697

```
name                  old time/op    new time/op    delta
EntryCache-16            114µs ± 7%      96µs ±12%  -15.29%  (p=0.000 n=19+19)
EntryCacheClearTo-16    37.7µs ± 4%    34.9µs ± 4%   -7.55%  (p=0.000 n=17+18)

name                  old alloc/op   new alloc/op   delta
EntryCacheClearTo-16    1.28kB ± 0%    0.77kB ± 0%  -40.00%  (p=0.000 n=20+20)
EntryCache-16           83.3kB ± 0%    50.0kB ± 0%  -39.95%  (p=0.000 n=19+20)

name                  old allocs/op  new allocs/op  delta
EntryCache-16             3.00 ± 0%      3.00 ± 0%     ~     (all equal)
EntryCacheClearTo-16      1.00 ± 0%      1.00 ± 0%     ~     (all equal)
```

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Apr 9, 2021
This commit picks up etcd-io/etcd#12790, which
resolves a regression observed in cockroachdb#62212.

Picks up the following four commits:
- etcd-io/etcd@87258ef
- etcd-io/etcd@2c2456b
- etcd-io/etcd@ebc0174
- etcd-io/etcd@e51c697

```
name                  old time/op    new time/op    delta
EntryCache-16            114µs ± 7%      96µs ±12%  -15.29%  (p=0.000 n=19+19)
EntryCacheClearTo-16    37.7µs ± 4%    34.9µs ± 4%   -7.55%  (p=0.000 n=17+18)

name                  old alloc/op   new alloc/op   delta
EntryCacheClearTo-16    1.28kB ± 0%    0.77kB ± 0%  -40.00%  (p=0.000 n=20+20)
EntryCache-16           83.3kB ± 0%    50.0kB ± 0%  -39.95%  (p=0.000 n=19+20)

name                  old allocs/op  new allocs/op  delta
EntryCache-16             3.00 ± 0%      3.00 ± 0%     ~     (all equal)
EntryCacheClearTo-16      1.00 ± 0%      1.00 ± 0%     ~     (all equal)
```
@tbg tbg mentioned this pull request Oct 12, 2022
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants