-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
optimize to reduce a perf regression with go-merkledag #21
Conversation
std's errors has had wrapping for years now.
IPLD's codec helper reader has a relatively high cost, unfortunately. It was the main contributor to a slowdown in go-merkledag when moving from the old protobuf gogo-generated decoder to go-codec-dagpb. Using a []byte also means we can reuse protobuf's well-optimized "wire encoding" helpers, which gets us extra speed and allows removing some code. This should not matter in practice for the time being, as the only go-codec-dagpb user is go-merkledag and it uses bytes.Buffer everywhere. In the future it would be nice for go-codec-dagpb to be just as efficient with a stream decoder, but right now I don't have the extra week to get into that. Plus, if the core protobuf implementation works on []byte, one can assume it's reasonable for us to do the same. Using the new BenchmarkRoundtrip in go-merkledag with go-codec-dagpb, we get a significant uplift in performance: name old time/op new time/op delta Roundtrip-8 6.49µs ± 1% 5.34µs ± 1% -17.74% (p=0.002 n=6+6) name old alloc/op new alloc/op delta Roundtrip-8 8.07kB ± 0% 7.50kB ± 0% -7.04% (p=0.002 n=6+6) name old allocs/op new allocs/op delta Roundtrip-8 171 ± 0% 148 ± 0% -13.45% (p=0.002 n=6+6)
Like the previous commit, this helps reduce allocations as well as improve performance thanks to the well-optimized protowire package. And, as before, we get to remove unnecessary code. name old time/op new time/op delta Roundtrip-8 4.81µs ± 0% 4.35µs ± 1% -9.59% (p=0.004 n=5+6) name old alloc/op new alloc/op delta Roundtrip-8 7.14kB ± 0% 6.86kB ± 0% -3.83% (p=0.000 n=6+5) name old allocs/op new allocs/op delta Roundtrip-8 119 ± 0% 112 ± 0% -5.88% (p=0.002 n=6+6)
name old time/op new time/op delta Roundtrip-8 4.35µs ± 1% 4.27µs ± 0% -1.92% (p=0.004 n=6+5) name old alloc/op new alloc/op delta Roundtrip-8 6.86kB ± 0% 6.86kB ± 0% +0.01% (p=0.004 n=5+6) name old allocs/op new allocs/op delta Roundtrip-8 112 ± 0% 106 ± 0% -5.36% (p=0.002 n=6+6)
An io.Writer, by definition, will always copy bytes. That's fine in general, and can't be worked around without breaking the writer's contract. However, the main use of go-codec-dagpb today is go-merkledag, which simply uses the codec to encode a node into a buffer. So, we had to create a new bytes.Buffer, write to it, and grab its bytes. This is one extra allocation (the bytes.Buffer object itself), plus copying the encoded bytes an extra time, since we must copy from Encode's internal buffer to the bytes.Buffer. Add a lower-level append-like AppendEncode that cuts the middle man, removing both of those extra pieces of work. For the sake of consistency, we add DecodeBytes to mirror the above on the decode side. Decode already had a shortcut for Bytes, but this way it's more evident what we're doing, and we also avoid allocating a bytes.Buffer just to call Bytes on it. Using these new APIs in go-merkledag shows nice results: name old time/op new time/op delta Roundtrip-8 4.27µs ± 0% 4.07µs ± 0% -4.50% (p=0.004 n=5+6) name old alloc/op new alloc/op delta Roundtrip-8 6.86kB ± 0% 6.38kB ± 0% -6.99% (p=0.002 n=6+6) name old allocs/op new allocs/op delta Roundtrip-8 106 ± 0% 103 ± 0% -2.83% (p=0.002 n=6+6) While at it, we formally deprecate Marshal and Unmarshal, since we're starting to have lots of redundant API surface.
Also FYI @warpfork, in case you were wondering where the codec's slowness was coming from. I think having generic codec interfaces be Reader/Writer is fine, but it should also be fine for lower-level shortcuts like these for use cases like go-merkledag. |
Also, I realise this pulls in protowire and the google.golang.org/protobuf module, but I think that should be fine:
|
Also, to give a high-level view, this is the 2x slow-down between go-merkledag master and our first go-codec-dagpb implementation:
With the help of these patches, we're down to a 1.25x slow-down:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM by me :)
We're switching go-merkledag from its ancient protoc-gen-gogo encoder-decoder to this codec.
The good news is that it works well :) The bad news is that the new branch is about twice as slow when encoding and decoding a ProtoNode, what we can call a "roundtrip".
It's fair to say that part of the blame is to be put on go-merkledag itself, as it has to translate between ProtoNode and PBNode. Still, a good chunk of the slowness was coming from the codec itself.
This pull request includes all the optimizations made to the codec itself. The benchmark will live in go-merkledag once we can merge to master in a week or two. There are gaps between the numbers here because those gaps were optimizations in go-merkledag directly.
Once this is merged, I can push the optimizations to go-merkledag's ipld-in-ipfs branch. I need to do that in order, because otherwise I'd have the go-merkledag branch depend on a non-master branch of dagpb.
Please see the commit messages, which contain benchmark numbers and deatils. I care more about those than I do about this PR body message, so please don't squash-merge :)