Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

optimize to reduce a perf regression with go-merkledag #21

Merged
merged 5 commits into from
Mar 30, 2021
Merged

optimize to reduce a perf regression with go-merkledag #21

merged 5 commits into from
Mar 30, 2021

Conversation

mvdan
Copy link
Contributor

@mvdan mvdan commented Mar 28, 2021

We're switching go-merkledag from its ancient protoc-gen-gogo encoder-decoder to this codec.

The good news is that it works well :) The bad news is that the new branch is about twice as slow when encoding and decoding a ProtoNode, what we can call a "roundtrip".

It's fair to say that part of the blame is to be put on go-merkledag itself, as it has to translate between ProtoNode and PBNode. Still, a good chunk of the slowness was coming from the codec itself.

This pull request includes all the optimizations made to the codec itself. The benchmark will live in go-merkledag once we can merge to master in a week or two. There are gaps between the numbers here because those gaps were optimizations in go-merkledag directly.

Once this is merged, I can push the optimizations to go-merkledag's ipld-in-ipfs branch. I need to do that in order, because otherwise I'd have the go-merkledag branch depend on a non-master branch of dagpb.

Please see the commit messages, which contain benchmark numbers and deatils. I care more about those than I do about this PR body message, so please don't squash-merge :)

std's errors has had wrapping for years now.
IPLD's codec helper reader has a relatively high cost, unfortunately. It
was the main contributor to a slowdown in go-merkledag when moving from
the old protobuf gogo-generated decoder to go-codec-dagpb.

Using a []byte also means we can reuse protobuf's well-optimized "wire
encoding" helpers, which gets us extra speed and allows removing some
code.

This should not matter in practice for the time being, as the only
go-codec-dagpb user is go-merkledag and it uses bytes.Buffer everywhere.

In the future it would be nice for go-codec-dagpb to be just as
efficient with a stream decoder, but right now I don't have the extra
week to get into that. Plus, if the core protobuf implementation works
on []byte, one can assume it's reasonable for us to do the same.

Using the new BenchmarkRoundtrip in go-merkledag with go-codec-dagpb, we
get a significant uplift in performance:

	name         old time/op    new time/op    delta
	Roundtrip-8    6.49µs ± 1%    5.34µs ± 1%  -17.74%  (p=0.002 n=6+6)

	name         old alloc/op   new alloc/op   delta
	Roundtrip-8    8.07kB ± 0%    7.50kB ± 0%   -7.04%  (p=0.002 n=6+6)

	name         old allocs/op  new allocs/op  delta
	Roundtrip-8       171 ± 0%       148 ± 0%  -13.45%  (p=0.002 n=6+6)
Like the previous commit, this helps reduce allocations as well as
improve performance thanks to the well-optimized protowire package.

And, as before, we get to remove unnecessary code.

	name         old time/op    new time/op    delta
	Roundtrip-8    4.81µs ± 0%    4.35µs ± 1%  -9.59%  (p=0.004 n=5+6)

	name         old alloc/op   new alloc/op   delta
	Roundtrip-8    7.14kB ± 0%    6.86kB ± 0%  -3.83%  (p=0.000 n=6+5)

	name         old allocs/op  new allocs/op  delta
	Roundtrip-8       119 ± 0%       112 ± 0%  -5.88%  (p=0.002 n=6+6)
	name         old time/op    new time/op    delta
	Roundtrip-8    4.35µs ± 1%    4.27µs ± 0%  -1.92%  (p=0.004 n=6+5)

	name         old alloc/op   new alloc/op   delta
	Roundtrip-8    6.86kB ± 0%    6.86kB ± 0%  +0.01%  (p=0.004 n=5+6)

	name         old allocs/op  new allocs/op  delta
	Roundtrip-8       112 ± 0%       106 ± 0%  -5.36%  (p=0.002 n=6+6)
An io.Writer, by definition, will always copy bytes. That's fine in
general, and can't be worked around without breaking the writer's
contract.

However, the main use of go-codec-dagpb today is go-merkledag, which
simply uses the codec to encode a node into a buffer.

So, we had to create a new bytes.Buffer, write to it, and grab its
bytes. This is one extra allocation (the bytes.Buffer object itself),
plus copying the encoded bytes an extra time, since we must copy from
Encode's internal buffer to the bytes.Buffer.

Add a lower-level append-like AppendEncode that cuts the middle man,
removing both of those extra pieces of work.

For the sake of consistency, we add DecodeBytes to mirror the above on
the decode side. Decode already had a shortcut for Bytes, but this way
it's more evident what we're doing, and we also avoid allocating a
bytes.Buffer just to call Bytes on it.

Using these new APIs in go-merkledag shows nice results:

	name         old time/op    new time/op    delta
	Roundtrip-8    4.27µs ± 0%    4.07µs ± 0%  -4.50%  (p=0.004 n=5+6)

	name         old alloc/op   new alloc/op   delta
	Roundtrip-8    6.86kB ± 0%    6.38kB ± 0%  -6.99%  (p=0.002 n=6+6)

	name         old allocs/op  new allocs/op  delta
	Roundtrip-8       106 ± 0%       103 ± 0%  -2.83%  (p=0.002 n=6+6)

While at it, we formally deprecate Marshal and Unmarshal, since we're
starting to have lots of redundant API surface.
@mvdan
Copy link
Contributor Author

mvdan commented Mar 28, 2021

Also FYI @warpfork, in case you were wondering where the codec's slowness was coming from.

I think having generic codec interfaces be Reader/Writer is fine, but it should also be fine for lower-level shortcuts like these for use cases like go-merkledag.

@mvdan
Copy link
Contributor Author

mvdan commented Mar 28, 2021

Also, I realise this pulls in protowire and the google.golang.org/protobuf module, but I think that should be fine:

  • The protowire package itself has few deps, and is pretty small
  • It would take us at least a week of effort to write protowire-like funcs that are so well optimized and tested
  • It's reasonable for a protobuf codec to... require the protobuf module :)
  • It's likely that any program using dagpb will end up requiring the protobuf module anyway

@mvdan
Copy link
Contributor Author

mvdan commented Mar 28, 2021

Also, to give a high-level view, this is the 2x slow-down between go-merkledag master and our first go-codec-dagpb implementation:

name         old time/op    new time/op    delta
Roundtrip-8    3.24µs ± 3%    6.49µs ± 1%  +100.40%  (p=0.002 n=6+6)

name         old alloc/op   new alloc/op   delta
Roundtrip-8    4.94kB ± 0%    8.07kB ± 0%   +63.27%  (p=0.002 n=6+6)

name         old allocs/op  new allocs/op  delta
Roundtrip-8       109 ± 0%       171 ± 0%   +56.88%  (p=0.002 n=6+6)

With the help of these patches, we're down to a 1.25x slow-down:

name         old time/op    new time/op    delta
Roundtrip-8    3.24µs ± 3%    4.07µs ± 0%  +25.72%  (p=0.002 n=6+6)

name         old alloc/op   new alloc/op   delta
Roundtrip-8    4.94kB ± 0%    6.38kB ± 0%  +29.13%  (p=0.002 n=6+6)

name         old allocs/op  new allocs/op  delta
Roundtrip-8       109 ± 0%       103 ± 0%   -5.50%  (p=0.002 n=6+6)

Copy link

@hannahhoward hannahhoward left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM by me :)

@mvdan mvdan merged commit 8ec6b0f into ipld:master Mar 30, 2021
@mvdan mvdan deleted the feat/merkledag-perf branch April 20, 2021 10:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants