Use non-pointer receiver for Marshal and Size #155
This has been tried before and demonstrated worse performance. See #72. You shouldn't have to address your locals to call a method; in the expression |
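The comment above is cut off, but the Go rule it refers to can be sketched as follows: when a variable is addressable, calling a pointer-receiver method on it does not require writing `&` explicitly. The `Counter` type and `incrLocal` function are hypothetical names used only for illustration; they are not part of msgp.

```go
package main

import "fmt"

// Counter is a hypothetical type used only for illustration.
type Counter struct{ n int }

// Incr has a pointer receiver because it mutates the Counter.
func (c *Counter) Incr() { c.n++ }

// incrLocal calls a pointer-receiver method on a plain local variable.
func incrLocal() int {
	var c Counter // an addressable local
	c.Incr()      // shorthand for (&c).Incr(), applied by the compiler
	(&c).Incr()   // the explicit, equivalent form
	return c.n
}

func main() {
	fmt.Println(incrLocal()) // prints 2
}
```

This shorthand only applies to direct method calls on addressable values. It does not help when the value itself has to satisfy an interface.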
Your point about escape analysis seems right. With pointer receivers, if I pass just my struct, it is not. I have to pass a pointer to my struct, which doesn't fit my needs. About performance, I see @zond said
But running the benchmarks on master, performance does not seem to be affected (I only get some small variance going both ways, which was expected):
$ go version
go version go1.5.3 linux/amd64
# master
$ git rev-parse HEAD
cf4d6d402b01d9b359f52fc88be0f582402177c0
$ go install ./...
$ go generate ./...
======== MessagePack Code Generator =======
>>> Input: "defs_test.go"
>>> Wrote and formatted "defgen_test.go"
$ go test -v -cpu=2 ./... -bench .
# [All tests pass]
PASS
BenchmarkLocate-2 20000000 97.8 ns/op 531.83 MB/s 0 B/op 0 allocs/op
BenchmarkReadWriteFloat32-2 20000000 80.0 ns/op
BenchmarkReadWriteFloat64-2 20000000 82.3 ns/op
BenchmarkUnmarshalAsJSON-2 1000000 1698 ns/op 93.59 MB/s 16 B/op 1 allocs/op
BenchmarkCopyToJSON-2 1000000 1989 ns/op 79.92 MB/s 48 B/op 1 allocs/op
BenchmarkStdlibJSON-2 200000 5783 ns/op 29.40 MB/s 920 B/op 36 allocs/op
BenchmarkReadMapHeaderBytes-2 200000000 8.32 ns/op 360.43 MB/s 0 B/op 0 allocs/op
BenchmarkReadArrayHeaderBytes-2 200000000 7.61 ns/op 394.27 MB/s 0 B/op 0 allocs/op
BenchmarkReadNilByte-2 1000000000 2.72 ns/op 367.11 MB/s 0 B/op 0 allocs/op
BenchmarkReadFloat64Bytes-2 200000000 9.20 ns/op 978.47 MB/s 0 B/op 0 allocs/op
BenchmarkReadFloat32Bytes-2 300000000 5.27 ns/op 948.08 MB/s 0 B/op 0 allocs/op
BenchmarkReadBoolBytes-2 200000000 6.24 ns/op 160.23 MB/s 0 B/op 0 allocs/op
BenchmarkReadTimeBytes-2 100000000 16.3 ns/op 920.73 MB/s 0 B/op 0 allocs/op
BenchmarkSkipBytes-2 10000000 164 ns/op 908.50 MB/s 0 B/op 0 allocs/op
BenchmarkReadMapHeader-2 100000000 20.9 ns/op 95.71 MB/s 0 B/op 0 allocs/op
BenchmarkReadArrayHeader-2 100000000 20.6 ns/op 96.94 MB/s 0 B/op 0 allocs/op
BenchmarkReadNil-2 100000000 16.9 ns/op 59.09 MB/s 0 B/op 0 allocs/op
BenchmarkReadFloat64-2 50000000 25.6 ns/op 351.97 MB/s 0 B/op 0 allocs/op
BenchmarkReadFloat32-2 100000000 22.2 ns/op 225.27 MB/s 0 B/op 0 allocs/op
BenchmarkReadInt64-2 100000000 24.8 ns/op 161.34 MB/s 0 B/op 0 allocs/op
BenchmarkReadUint64-2 50000000 24.8 ns/op 80.63 MB/s 0 B/op 0 allocs/op
BenchmarkRead16Bytes-2 30000000 40.1 ns/op 448.85 MB/s 0 B/op 0 allocs/op
BenchmarkRead256Bytes-2 20000000 100 ns/op 2574.03 MB/s 0 B/op 0 allocs/op
BenchmarkRead2048Bytes-2 3000000 510 ns/op 4014.73 MB/s 0 B/op 0 allocs/op
BenchmarkRead16StringAsBytes-2 30000000 40.2 ns/op 422.87 MB/s 0 B/op 0 allocs/op
BenchmarkRead256StringAsBytes-2 20000000 107 ns/op 2418.58 MB/s 0 B/op 0 allocs/op
BenchmarkRead16String-2 10000000 127 ns/op 133.53 MB/s 16 B/op 1 allocs/op
BenchmarkRead256String-2 5000000 266 ns/op 971.61 MB/s 256 B/op 1 allocs/op
BenchmarkReadComplex64-2 50000000 31.5 ns/op 317.30 MB/s 0 B/op 0 allocs/op
BenchmarkReadComplex128-2 50000000 38.7 ns/op 464.71 MB/s 0 B/op 0 allocs/op
BenchmarkReadTime-2 30000000 38.9 ns/op 385.45 MB/s 0 B/op 0 allocs/op
BenchmarkSkip-2 5000000 413 ns/op 360.76 MB/s 0 B/op 0 allocs/op
BenchmarkAppendMapHeader-2 200000000 8.02 ns/op 0 B/op 0 allocs/op
BenchmarkAppendArrayHeader-2 200000000 7.92 ns/op 0 B/op 0 allocs/op
BenchmarkAppendFloat64-2 100000000 12.2 ns/op 736.77 MB/s 0 B/op 0 allocs/op
BenchmarkAppendFloat32-2 100000000 10.4 ns/op 479.46 MB/s 0 B/op 0 allocs/op
BenchmarkAppendInt64-2 100000000 18.6 ns/op 0 B/op 0 allocs/op
BenchmarkAppendUint64-2 100000000 19.3 ns/op 0 B/op 0 allocs/op
BenchmarkAppend16Bytes-2 100000000 18.2 ns/op 1156.18 MB/s 0 B/op 0 allocs/op
BenchmarkAppend256Bytes-2 50000000 27.2 ns/op 9584.15 MB/s 0 B/op 0 allocs/op
BenchmarkAppend2048Bytes-2 10000000 109 ns/op 18743.55 MB/s 0 B/op 0 allocs/op
BenchmarkAppend16String-2 100000000 16.7 ns/op 1254.41 MB/s 0 B/op 0 allocs/op
BenchmarkAppend256String-2 50000000 28.5 ns/op 9154.87 MB/s 0 B/op 0 allocs/op
BenchmarkAppend2048String-2 20000000 81.2 ns/op 25293.70 MB/s 0 B/op 0 allocs/op
BenchmarkAppendBool-2 300000000 3.60 ns/op 277.61 MB/s 0 B/op 0 allocs/op
BenchmarkAppendTime-2 50000000 24.2 ns/op 620.99 MB/s 0 B/op 0 allocs/op
BenchmarkWriteMapHeader-2 200000000 8.64 ns/op 0 B/op 0 allocs/op
BenchmarkWriteArrayHeader-2 200000000 8.97 ns/op 0 B/op 0 allocs/op
BenchmarkWriteFloat64-2 100000000 14.4 ns/op 624.26 MB/s 0 B/op 0 allocs/op
BenchmarkWriteFloat32-2 100000000 12.2 ns/op 408.34 MB/s 0 B/op 0 allocs/op
BenchmarkWriteInt64-2 100000000 13.8 ns/op 652.07 MB/s 0 B/op 0 allocs/op
BenchmarkWriteUint64-2 100000000 13.9 ns/op 648.52 MB/s 0 B/op 0 allocs/op
BenchmarkWrite16Bytes-2 50000000 23.5 ns/op 0 B/op 0 allocs/op
BenchmarkWrite256Bytes-2 50000000 33.1 ns/op 0 B/op 0 allocs/op
BenchmarkWrite2048Bytes-2 20000000 86.8 ns/op 0 B/op 0 allocs/op
BenchmarkWriteTime-2 50000000 24.6 ns/op 610.03 MB/s 0 B/op 0 allocs/op
BenchmarkWriteReadFile-2 2000000 766 ns/op 105.71 MB/s
ok github.com/tinylib/msgp/msgp 374.252s
# PR 156
$ git checkout avoid-pointers-receivers
Switched to branch 'avoid-pointers-receivers'
Your branch is up-to-date with 'fork/avoid-pointers-receivers'.
$ git rev-parse HEAD
4416ec38a88dcd4b55b36ff34d92950d684edc1f
$ go install ./...
$ go generate ./...
======== MessagePack Code Generator =======
>>> Input: "defs_test.go"
>>> Wrote and formatted "defgen_test.go"
$ go test -v -cpu=2 ./... -bench .
# [All tests pass]
PASS
BenchmarkLocate-2 20000000 97.9 ns/op 531.23 MB/s 0 B/op 0 allocs/op
BenchmarkReadWriteFloat32-2 20000000 85.8 ns/op
BenchmarkReadWriteFloat64-2 20000000 81.5 ns/op
BenchmarkUnmarshalAsJSON-2 1000000 1891 ns/op 84.05 MB/s 16 B/op 1 allocs/op
BenchmarkCopyToJSON-2 1000000 2206 ns/op 72.05 MB/s 48 B/op 1 allocs/op
BenchmarkStdlibJSON-2 200000 7715 ns/op 22.03 MB/s 920 B/op 36 allocs/op
BenchmarkReadMapHeaderBytes-2 200000000 8.41 ns/op 356.77 MB/s 0 B/op 0 allocs/op
BenchmarkReadArrayHeaderBytes-2 200000000 7.77 ns/op 386.24 MB/s 0 B/op 0 allocs/op
BenchmarkReadNilByte-2 500000000 2.95 ns/op 339.48 MB/s 0 B/op 0 allocs/op
BenchmarkReadFloat64Bytes-2 200000000 9.36 ns/op 961.76 MB/s 0 B/op 0 allocs/op
BenchmarkReadFloat32Bytes-2 300000000 5.46 ns/op 916.55 MB/s 0 B/op 0 allocs/op
BenchmarkReadBoolBytes-2 200000000 6.66 ns/op 150.10 MB/s 0 B/op 0 allocs/op
BenchmarkReadTimeBytes-2 100000000 16.1 ns/op 930.48 MB/s 0 B/op 0 allocs/op
BenchmarkSkipBytes-2 10000000 171 ns/op 868.05 MB/s 0 B/op 0 allocs/op
BenchmarkReadMapHeader-2 100000000 20.3 ns/op 98.31 MB/s 0 B/op 0 allocs/op
BenchmarkReadArrayHeader-2 100000000 20.6 ns/op 96.97 MB/s 0 B/op 0 allocs/op
BenchmarkReadNil-2 100000000 16.6 ns/op 60.16 MB/s 0 B/op 0 allocs/op
BenchmarkReadFloat64-2 50000000 27.5 ns/op 327.17 MB/s 0 B/op 0 allocs/op
BenchmarkReadFloat32-2 50000000 24.0 ns/op 208.24 MB/s 0 B/op 0 allocs/op
BenchmarkReadInt64-2 50000000 35.9 ns/op 111.41 MB/s 0 B/op 0 allocs/op
BenchmarkReadUint64-2 50000000 27.7 ns/op 72.26 MB/s 0 B/op 0 allocs/op
BenchmarkRead16Bytes-2 30000000 49.9 ns/op 360.56 MB/s 0 B/op 0 allocs/op
BenchmarkRead256Bytes-2 10000000 137 ns/op 1878.40 MB/s 0 B/op 0 allocs/op
BenchmarkRead2048Bytes-2 2000000 547 ns/op 3743.05 MB/s 0 B/op 0 allocs/op
BenchmarkRead16StringAsBytes-2 30000000 46.2 ns/op 368.24 MB/s 0 B/op 0 allocs/op
BenchmarkRead256StringAsBytes-2 10000000 124 ns/op 2075.45 MB/s 0 B/op 0 allocs/op
BenchmarkRead16String-2 10000000 110 ns/op 154.20 MB/s 16 B/op 1 allocs/op
BenchmarkRead256String-2 5000000 256 ns/op 1010.92 MB/s 256 B/op 1 allocs/op
BenchmarkReadComplex64-2 50000000 37.3 ns/op 267.92 MB/s 0 B/op 0 allocs/op
BenchmarkReadComplex128-2 30000000 49.4 ns/op 364.74 MB/s 0 B/op 0 allocs/op
BenchmarkReadTime-2 50000000 40.7 ns/op 368.44 MB/s 0 B/op 0 allocs/op
BenchmarkSkip-2 3000000 398 ns/op 374.23 MB/s 0 B/op 0 allocs/op
BenchmarkAppendMapHeader-2 200000000 7.87 ns/op 0 B/op 0 allocs/op
BenchmarkAppendArrayHeader-2 200000000 7.82 ns/op 0 B/op 0 allocs/op
BenchmarkAppendFloat64-2 100000000 11.7 ns/op 768.19 MB/s 0 B/op 0 allocs/op
BenchmarkAppendFloat32-2 200000000 9.69 ns/op 515.96 MB/s 0 B/op 0 allocs/op
BenchmarkAppendInt64-2 100000000 20.2 ns/op 0 B/op 0 allocs/op
BenchmarkAppendUint64-2 100000000 18.9 ns/op 0 B/op 0 allocs/op
BenchmarkAppend16Bytes-2 100000000 19.2 ns/op 1093.52 MB/s 0 B/op 0 allocs/op
BenchmarkAppend256Bytes-2 50000000 27.5 ns/op 9494.18 MB/s 0 B/op 0 allocs/op
BenchmarkAppend2048Bytes-2 20000000 109 ns/op 18832.78 MB/s 0 B/op 0 allocs/op
BenchmarkAppend16String-2 100000000 16.5 ns/op 1275.48 MB/s 0 B/op 0 allocs/op
BenchmarkAppend256String-2 50000000 26.6 ns/op 9803.78 MB/s 0 B/op 0 allocs/op
BenchmarkAppend2048String-2 20000000 84.7 ns/op 24227.11 MB/s 0 B/op 0 allocs/op
BenchmarkAppendBool-2 300000000 3.59 ns/op 278.51 MB/s 0 B/op 0 allocs/op
BenchmarkAppendTime-2 50000000 23.9 ns/op 626.37 MB/s 0 B/op 0 allocs/op
BenchmarkWriteMapHeader-2 200000000 8.52 ns/op 0 B/op 0 allocs/op
BenchmarkWriteArrayHeader-2 200000000 8.88 ns/op 0 B/op 0 allocs/op
BenchmarkWriteFloat64-2 100000000 13.9 ns/op 645.32 MB/s 0 B/op 0 allocs/op
BenchmarkWriteFloat32-2 100000000 11.3 ns/op 443.03 MB/s 0 B/op 0 allocs/op
BenchmarkWriteInt64-2 100000000 14.4 ns/op 624.81 MB/s 0 B/op 0 allocs/op
BenchmarkWriteUint64-2 100000000 13.1 ns/op 686.05 MB/s 0 B/op 0 allocs/op
BenchmarkWrite16Bytes-2 100000000 22.0 ns/op 0 B/op 0 allocs/op
BenchmarkWrite256Bytes-2 50000000 31.8 ns/op 0 B/op 0 allocs/op
BenchmarkWrite2048Bytes-2 20000000 84.9 ns/op 0 B/op 0 allocs/op
BenchmarkWriteTime-2 50000000 28.0 ns/op 535.39 MB/s 0 B/op 0 allocs/op
BenchmarkWriteReadFile-2 2000000 730 ns/op 110.88 MB/s
ok github.com/tinylib/msgp/msgp 300.308s |
(To show that the small differences between my two benchmark runs are just variance, I did a second run with #156:)
|
You need to benchmark the code in
|
You'll also take a perf hit when you turn a value into
|
Ok, the benchmarks in _generated/ indeed show a difference. One more allocation for:
I'll see if I can improve that |
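One place an extra allocation can come from with value receivers is storing a struct value in an interface, which generally copies the value to the heap, whereas storing a pointer does not. Whether that is the cause of the difference seen in `_generated/` is an assumption. The sketch below shows one way to measure it with `testing.AllocsPerRun`. The names `Encoder`, `Big`, `valueAllocs` and `pointerAllocs` are hypothetical and not part of msgp.

```go
package main

import (
	"fmt"
	"testing"
)

// Encoder is a hypothetical interface standing in for a generated-method interface.
type Encoder interface{ Len() int }

// Big is a hypothetical struct with more than three fields.
type Big struct{ A, B, C, D int64 }

// Len uses a value receiver, so both Big and *Big satisfy Encoder.
func (b Big) Len() int { return 32 }

var (
	sink   Encoder // package-level, so the stored interface value escapes
	global = Big{1, 2, 3, 4}
)

// valueAllocs measures allocations per call when a Big value is stored in the interface.
func valueAllocs() float64 {
	return testing.AllocsPerRun(1000, func() { sink = global })
}

// pointerAllocs measures allocations per call when a *Big is stored instead.
func pointerAllocs() float64 {
	return testing.AllocsPerRun(1000, func() { sink = &global })
}

func main() {
	fmt.Printf("value: %.0f allocs/op, pointer: %.0f allocs/op\n",
		valueAllocs(), pointerAllocs())
}
```

The exact numbers depend on the Go version and on what escape analysis can prove, so it is worth running this against the toolchain actually in use rather than relying on a fixed expectation.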
To be clear, my benchmark was of my own code, using my fork of msgp. I was unable to compare my code with the fork vs my code with mainline msgp since I was unable to get my code working with mainline. I didn't think to benchmark msgp in the fork on its own vs mainline. |
That's making me wonder if I missed something with mailru/easyjson#15 or if some difference in the implementations makes it efficient with easyjson but not with msgp... Gotta do some digging |
I'm not sure I understand what you mean, but just to make sure I'll clarify even more :) I benchmarked my own code using https://github.com/vmihailenco/msgpack vs my own code using a fork of msgp that just added shims to some more types. This benchmark showed the unreasonable result that the code became slower with msgp. This was unreasonable because msgpack used reflection, while msgp uses generated hard coded coders. This made me give up and forget all about it. TL;DR I don't believe msgp is slower than msgpack, and I don't necessarily think more indirection via pointers or shims in msgp would make things relevantly slower. |
@zond: oh, thanks for the clarification. To clarify too, my last comment was not about your observations but about the results I get from my own benchmark runs. The non-pointer-receiver approach is less efficient for msgp, but it did not seem to be for easyjson (which does something similar to msgp, just for faster JSON marshaling/unmarshaling) |
@hectorj Ah, thanks! Weird, please explain what caused it if you find out :) |
Closing for now, as I haven't been able to produce code with the same features and performance using non-pointer receivers, and I don't have enough time to keep trying. Thanks all for your input. |
I see from this part of the code that the generator prefers pointer receivers for structs with more than 3 fields and for arrays.
This seems unnecessary (these methods do not modify the data they operate on, so they do not require a pointer) and is inconvenient (I'd like to be able to Marshal my structs without taking their address, which possibly increases the generated garbage).
Folks at easyjson accepted my PR after checking benchmarks. Would you accept something similar for the msgp generator?
Or is there some benchmark showing that the use of a pointer receiver actually improves performance?
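To make the inconvenience concrete, here is a minimal sketch of how the receiver kind affects which values satisfy an interface. The `Sizer` interface and the `ByValue` and `ByPointer` types are hypothetical stand-ins, not msgp's actual generated code.

```go
package main

import "fmt"

// Sizer is a hypothetical stand-in for an interface implemented by generated methods.
type Sizer interface{ Size() int }

// ByValue declares Size on the value receiver.
type ByValue struct{ A, B int }

func (v ByValue) Size() int { return 16 }

// ByPointer declares Size on the pointer receiver.
type ByPointer struct{ A, B int }

func (p *ByPointer) Size() int { return 16 }

// implementsSizer reports whether the dynamic type of x satisfies Sizer.
func implementsSizer(x interface{}) bool {
	_, ok := x.(Sizer)
	return ok
}

func main() {
	fmt.Println(implementsSizer(ByValue{}))    // true
	fmt.Println(implementsSizer(&ByValue{}))   // true
	fmt.Println(implementsSizer(ByPointer{}))  // false
	fmt.Println(implementsSizer(&ByPointer{})) // true
}
```

With a value receiver, both the struct and a pointer to it can be passed where the interface is expected. With a pointer receiver, only the pointer can.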