optimize to reduce a perf regression with go-merkledag #21

mvdan · 2021-03-28T16:19:33Z

We're switching go-merkledag from its ancient protoc-gen-gogo encoder-decoder to this codec.

The good news is that it works well :) The bad news is that the new branch is about twice as slow when encoding and decoding a ProtoNode, what we can call a "roundtrip".

It's fair to say that part of the blame is to be put on go-merkledag itself, as it has to translate between ProtoNode and PBNode. Still, a good chunk of the slowness was coming from the codec itself.

This pull request includes all the optimizations made to the codec itself. The benchmark will live in go-merkledag once we can merge to master in a week or two. There are gaps between the numbers here because those gaps were optimizations in go-merkledag directly.

Once this is merged, I can push the optimizations to go-merkledag's ipld-in-ipfs branch. I need to do that in order, because otherwise I'd have the go-merkledag branch depend on a non-master branch of dagpb.

Please see the commit messages, which contain benchmark numbers and deatils. I care more about those than I do about this PR body message, so please don't squash-merge :)

std's errors has had wrapping for years now.

IPLD's codec helper reader has a relatively high cost, unfortunately. It was the main contributor to a slowdown in go-merkledag when moving from the old protobuf gogo-generated decoder to go-codec-dagpb. Using a []byte also means we can reuse protobuf's well-optimized "wire encoding" helpers, which gets us extra speed and allows removing some code. This should not matter in practice for the time being, as the only go-codec-dagpb user is go-merkledag and it uses bytes.Buffer everywhere. In the future it would be nice for go-codec-dagpb to be just as efficient with a stream decoder, but right now I don't have the extra week to get into that. Plus, if the core protobuf implementation works on []byte, one can assume it's reasonable for us to do the same. Using the new BenchmarkRoundtrip in go-merkledag with go-codec-dagpb, we get a significant uplift in performance: name old time/op new time/op delta Roundtrip-8 6.49µs ± 1% 5.34µs ± 1% -17.74% (p=0.002 n=6+6) name old alloc/op new alloc/op delta Roundtrip-8 8.07kB ± 0% 7.50kB ± 0% -7.04% (p=0.002 n=6+6) name old allocs/op new allocs/op delta Roundtrip-8 171 ± 0% 148 ± 0% -13.45% (p=0.002 n=6+6)

Like the previous commit, this helps reduce allocations as well as improve performance thanks to the well-optimized protowire package. And, as before, we get to remove unnecessary code. name old time/op new time/op delta Roundtrip-8 4.81µs ± 0% 4.35µs ± 1% -9.59% (p=0.004 n=5+6) name old alloc/op new alloc/op delta Roundtrip-8 7.14kB ± 0% 6.86kB ± 0% -3.83% (p=0.000 n=6+5) name old allocs/op new allocs/op delta Roundtrip-8 119 ± 0% 112 ± 0% -5.88% (p=0.002 n=6+6)

name old time/op new time/op delta Roundtrip-8 4.35µs ± 1% 4.27µs ± 0% -1.92% (p=0.004 n=6+5) name old alloc/op new alloc/op delta Roundtrip-8 6.86kB ± 0% 6.86kB ± 0% +0.01% (p=0.004 n=5+6) name old allocs/op new allocs/op delta Roundtrip-8 112 ± 0% 106 ± 0% -5.36% (p=0.002 n=6+6)

An io.Writer, by definition, will always copy bytes. That's fine in general, and can't be worked around without breaking the writer's contract. However, the main use of go-codec-dagpb today is go-merkledag, which simply uses the codec to encode a node into a buffer. So, we had to create a new bytes.Buffer, write to it, and grab its bytes. This is one extra allocation (the bytes.Buffer object itself), plus copying the encoded bytes an extra time, since we must copy from Encode's internal buffer to the bytes.Buffer. Add a lower-level append-like AppendEncode that cuts the middle man, removing both of those extra pieces of work. For the sake of consistency, we add DecodeBytes to mirror the above on the decode side. Decode already had a shortcut for Bytes, but this way it's more evident what we're doing, and we also avoid allocating a bytes.Buffer just to call Bytes on it. Using these new APIs in go-merkledag shows nice results: name old time/op new time/op delta Roundtrip-8 4.27µs ± 0% 4.07µs ± 0% -4.50% (p=0.004 n=5+6) name old alloc/op new alloc/op delta Roundtrip-8 6.86kB ± 0% 6.38kB ± 0% -6.99% (p=0.002 n=6+6) name old allocs/op new allocs/op delta Roundtrip-8 106 ± 0% 103 ± 0% -2.83% (p=0.002 n=6+6) While at it, we formally deprecate Marshal and Unmarshal, since we're starting to have lots of redundant API surface.

mvdan · 2021-03-28T16:21:16Z

Also FYI @warpfork, in case you were wondering where the codec's slowness was coming from.

I think having generic codec interfaces be Reader/Writer is fine, but it should also be fine for lower-level shortcuts like these for use cases like go-merkledag.

mvdan · 2021-03-28T16:23:09Z

Also, I realise this pulls in protowire and the google.golang.org/protobuf module, but I think that should be fine:

The protowire package itself has few deps, and is pretty small
It would take us at least a week of effort to write protowire-like funcs that are so well optimized and tested
It's reasonable for a protobuf codec to... require the protobuf module :)
It's likely that any program using dagpb will end up requiring the protobuf module anyway

mvdan · 2021-03-28T16:30:06Z

Also, to give a high-level view, this is the 2x slow-down between go-merkledag master and our first go-codec-dagpb implementation:

name         old time/op    new time/op    delta
Roundtrip-8    3.24µs ± 3%    6.49µs ± 1%  +100.40%  (p=0.002 n=6+6)

name         old alloc/op   new alloc/op   delta
Roundtrip-8    4.94kB ± 0%    8.07kB ± 0%   +63.27%  (p=0.002 n=6+6)

name         old allocs/op  new allocs/op  delta
Roundtrip-8       109 ± 0%       171 ± 0%   +56.88%  (p=0.002 n=6+6)

With the help of these patches, we're down to a 1.25x slow-down:

name         old time/op    new time/op    delta
Roundtrip-8    3.24µs ± 3%    4.07µs ± 0%  +25.72%  (p=0.002 n=6+6)

name         old alloc/op   new alloc/op   delta
Roundtrip-8    4.94kB ± 0%    6.38kB ± 0%  +29.13%  (p=0.002 n=6+6)

name         old allocs/op  new allocs/op  delta
Roundtrip-8       109 ± 0%       103 ± 0%   -5.50%  (p=0.002 n=6+6)

hannahhoward

LGTM by me :)

mvdan added 5 commits March 27, 2021 16:18

remove unnecessary xerrors dep

ee09902

std's errors has had wrapping for years now.

mvdan requested review from rvagg, acruikshank, hannahhoward and willscott March 28, 2021 16:20

mvdan mentioned this pull request Mar 28, 2021

use go-ipld-prime scratch.Reader #1

Closed

hannahhoward approved these changes Mar 29, 2021

View reviewed changes

rvagg approved these changes Mar 30, 2021

View reviewed changes

mvdan merged commit 8ec6b0f into ipld:master Mar 30, 2021

mvdan deleted the feat/merkledag-perf branch April 20, 2021 10:22

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

optimize to reduce a perf regression with go-merkledag #21

optimize to reduce a perf regression with go-merkledag #21

mvdan commented Mar 28, 2021

mvdan commented Mar 28, 2021

mvdan commented Mar 28, 2021

mvdan commented Mar 28, 2021

hannahhoward left a comment

optimize to reduce a perf regression with go-merkledag #21

optimize to reduce a perf regression with go-merkledag #21

Conversation

mvdan commented Mar 28, 2021

mvdan commented Mar 28, 2021

mvdan commented Mar 28, 2021

mvdan commented Mar 28, 2021

hannahhoward left a comment

Choose a reason for hiding this comment