
Add msgpack handler for smarter JSON encoding #7154

Closed
wants to merge 1 commit into from

Conversation

jsternberg
Contributor

The msgpack handler correctly retains information about whether the
number is a float or an integer, unlike JSON. While the format is not
human-readable, it makes a good interchange format for applications that
don't necessarily care about a human readable output.
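As a concrete illustration of the int/float point (a stdlib-only Go sketch, not code from this PR): encoding/json collapses every number to float64 on decode, while msgpack gives integers and floats distinct wire markers (0xd3 for int 64, 0xcb for float 64 per the msgpack spec), so a decoder can tell them apart.

```go
package main

import (
	"encoding/binary"
	"encoding/json"
	"fmt"
	"math"
)

// jsonKinds decodes a JSON array and reports each element's Go type.
// encoding/json turns every JSON number into float64, losing the
// int/float distinction.
func jsonKinds(data string) []string {
	var vals []interface{}
	if err := json.Unmarshal([]byte(data), &vals); err != nil {
		panic(err)
	}
	kinds := make([]string, len(vals))
	for i, v := range vals {
		kinds[i] = fmt.Sprintf("%T", v)
	}
	return kinds
}

// msgpackInt64 and msgpackFloat64 emit the two distinct msgpack wire
// forms, so the type information survives the round trip.
func msgpackInt64(v int64) []byte {
	b := make([]byte, 9)
	b[0] = 0xd3 // msgpack int 64 marker
	binary.BigEndian.PutUint64(b[1:], uint64(v))
	return b
}

func msgpackFloat64(v float64) []byte {
	b := make([]byte, 9)
	b[0] = 0xcb // msgpack float 64 marker
	binary.BigEndian.PutUint64(b[1:], math.Float64bits(v))
	return b
}

func main() {
	fmt.Println(jsonKinds(`[1, 1.5]`)) // both come back as float64
	fmt.Printf("%#x %#x\n", msgpackInt64(1)[0], msgpackFloat64(1.5)[0])
}
```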

@mention-bot

@jsternberg, thanks for your PR! By analyzing the annotation information on this pull request, we identified @e-dard, @gunnaraasen and @joelegasse to be potential reviewers

@jsternberg
Contributor Author

jsternberg commented Aug 15, 2016

Sample decoder in Ruby:

require "msgpack"
require "stringio"
require "uri"
require "excon"

r = Excon.post("http://localhost:8086/query",
               body: URI.encode_www_form(db: "mydb", q: "SELECT * FROM cpu"),
               headers: { "Accept" => "application/x-msgpack",
                          "Content-Type" => "application/x-www-form-urlencoded" }
              )

u = MessagePack::Unpacker.new(StringIO.new(r.body))
# Extension type 0x01 carries the timestamp as nanoseconds since the epoch.
u.register_type(0x01) { |data|
  nsecs = MessagePack.unpack(data)
  Time.at(nsecs / 1_000_000_000, (nsecs % 1_000_000_000) / 1000.0)
}
u.each do |v|
  p v
end

@jwilder
Contributor

jwilder commented Aug 15, 2016

I'm curious how much of a performance improvement we'd get from this as well. According to these benchmarks, it might be fairly significant for larger result sets.

@jsternberg
Contributor Author

I tried it at some point when I originally implemented it and the speed results weren't that great (sometimes worse). But we should have proper benchmarks for all of these, along with proper unit tests for the content types that aren't JSON.

I think the size was a decent amount smaller (maybe a 10% decrease), but I'm going from memory.

@jsternberg
Contributor Author

Here are a few benchmarks of each response writer. The code is included in the PR so someone can check that I got it right; I'm really bad at writing benchmarks.

BenchmarkJSONResponseWriter_10K-8            100          16942822 ns/op         3113681 B/op      40500 allocs/op
BenchmarkJSONResponseWriter_100K-8             5         212850496 ns/op        46085830 B/op     500008 allocs/op
BenchmarkMsgpackResponseWriter_10K-8         100          12753972 ns/op         6011245 B/op      71000 allocs/op
BenchmarkMsgpackResponseWriter_100K-8         10         134860933 ns/op        61120452 B/op     800002 allocs/op
BenchmarkCSVResponseWriter_10K-8             200          11019348 ns/op          502737 B/op      30400 allocs/op
BenchmarkCSVResponseWriter_100K-8             10         115106198 ns/op         9358752 B/op     380000 allocs/op

@jsternberg
Contributor Author

I found that the code to encode the time was very slow. I optimized it and now I get these times:

jsternberg:[~/g/s/g/i/influxdb] $ (js-msgpack-handler *$=) go test -bench=. -run=nan ./services/httpd/
PASS
BenchmarkJSONResponseWriter_10K-8            100          17565719 ns/op         3113681 B/op      40500 allocs/op
BenchmarkJSONResponseWriter_100K-8             5         222098116 ns/op        46085830 B/op     500008 allocs/op
BenchmarkMsgpackResponseWriter_10K-8         200           6562621 ns/op          405625 B/op      20500 allocs/op
BenchmarkMsgpackResponseWriter_100K-8         20          74132411 ns/op         4560250 B/op     250001 allocs/op
BenchmarkCSVResponseWriter_10K-8             100          11748689 ns/op          525475 B/op      30800 allocs/op
BenchmarkCSVResponseWriter_100K-8             10         119437403 ns/op         9358752 B/op     380000 allocs/op

The only problem is that the way it's optimized is not thread-safe. We remake the codec for every response, so this shouldn't be an issue, but I wanted to note it in case there were objections. I was able to get good times, though not quite as good, using a more thread-safe method.

Thread safety should only be a problem if we use this to register a global codec.
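This isn't the PR's actual codec code, but a minimal sketch of the pattern: the encoder carries mutable scratch state, so it is unsafe to share between goroutines, and each response simply builds a fresh one.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// encoder holds per-response scratch state. It is NOT safe for
// concurrent use, which is fine as long as it is never shared.
type encoder struct {
	scratch [9]byte
	w       *bytes.Buffer
}

// writeInt64 reuses the scratch array, avoiding an allocation per value.
func (e *encoder) writeInt64(v int64) {
	e.scratch[0] = 0xd3 // msgpack int 64 marker
	binary.BigEndian.PutUint64(e.scratch[1:], uint64(v))
	e.w.Write(e.scratch[:])
}

// encodeResponse builds a fresh encoder per response, mirroring the
// "remake the codec for every response" approach described above.
func encodeResponse(values []int64) []byte {
	var buf bytes.Buffer
	enc := &encoder{w: &buf}
	for _, v := range values {
		enc.writeInt64(v)
	}
	return buf.Bytes()
}

func main() {
	fmt.Printf("% x\n", encodeResponse([]int64{1, 2}))
}
```

Registering one of these as a global codec is exactly where the shared scratch buffer would become a data race.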

@jsternberg
Contributor Author

Something to add that is confusing the hell out of me: when I isolate the handlers in benchmarks, the msgpack one performs much better, but when I use it within the program itself, it performs worse. I just ran a test where msgpack took 35 seconds to encode a message and JSON took 25 seconds.

@jsternberg
Contributor Author

I think I finally figured it out. The go/codec library doesn't buffer for you, so it was writing directly to the HTTP stream while encoding a large message; the JSON writer is buffered internally by the encoding/json library. Adding buffered I/O between the codec and the ResponseWriter started giving me better performance.

This also explains why I couldn't catch it in benchmarks: I was always either discarding the output or it was being internally buffered by httptest. I wasn't using any sockets, so I never paid the cost of writing directly to one.
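A rough stdlib-only demonstration of why the buffering matters. Here `countingWriter` stands in for the socket, and the one-byte writes for an unbuffered codec streaming a large result:

```go
package main

import (
	"bufio"
	"fmt"
	"io"
)

// countingWriter counts Write calls that reach the underlying stream,
// standing in for the HTTP socket.
type countingWriter struct{ writes int }

func (c *countingWriter) Write(p []byte) (int, error) {
	c.writes++
	return len(p), nil
}

// emit performs many tiny writes, the way an unbuffered codec streams
// a large result set.
func emit(w io.Writer, n int) {
	for i := 0; i < n; i++ {
		w.Write([]byte{'x'})
	}
}

// compare returns how many writes hit the stream directly versus
// through a bufio.Writer (default 4096-byte buffer).
func compare(n int) (direct, buffered int) {
	d := &countingWriter{}
	emit(d, n)

	b := &countingWriter{}
	bw := bufio.NewWriter(b)
	emit(bw, n)
	bw.Flush()
	return d.writes, b.writes
}

func main() {
	d, b := compare(10000)
	fmt.Println(d, b) // thousands of direct writes vs a handful of flushes
}
```

With real sockets, each of those direct writes is a syscall, which is consistent with the slowdown only showing up outside httptest.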

@jsternberg
Contributor Author

Found a few bugs while working on this in #7163. I'm going to wait for that to be merged and then merge it into master before this change should be considered for merging.

@jsternberg
Contributor Author

jsternberg commented Sep 3, 2016

I switched this to use tinylib/msgp and I have new numbers.

BenchmarkMsgpackResponseWriter_1K-8        10000            124162 ns/op               8 B/op          1 allocs/op
BenchmarkMsgpackResponseWriter_100K-8        100          12963048 ns/op               8 B/op          1 allocs/op
BenchmarkMsgpackResponseWriter_1M-8           10         129973937 ns/op               8 B/op          1 allocs/op

When I tried it on a SELECT * FROM cpu query using the default influx_stress, it took about 60 seconds to return the full response; JSON takes about 85 seconds. Non-scientific, but it gives us a general idea. This is a huge optimization. The bottleneck is now likely the storage engine, or the memory allocations from creating a new slice for every results struct sent over the channel. We might be able to improve those allocations with reusable buffers or something similar.
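One possible shape of the reusable-buffer idea (a hypothetical sync.Pool sketch, not code from this PR; `encodeRow` is a stand-in for encoding one results struct):

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool hands out scratch buffers so each encoded result reuses
// memory instead of allocating a fresh slice every time.
var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// encodeRow borrows a buffer, encodes into it (the Write call stands in
// for the real msgpack encoding step), and returns the buffer to the
// pool for the next result.
func encodeRow(payload []byte) []byte {
	buf := bufPool.Get().(*bytes.Buffer)
	defer bufPool.Put(buf)
	buf.Reset()
	buf.Write(payload)
	out := make([]byte, buf.Len())
	copy(out, buf.Bytes())
	return out
}

func main() {
	fmt.Printf("%s\n", encodeRow([]byte("cpu,host=a value=1")))
}
```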

I did talk a bit with @goller yesterday, though, and I think it's relevant, while we're thinking about the new client API, to decide whether the current JSON output format is appropriate for msgpack. We may want to rethink the output format to be friendlier for chunking and consumption. I haven't settled on the best way to do that yet, but since we have no precedent for the msgpack encoding, we can likely change its output format with no ill effects.

The msgpack handler correctly retains information about whether the
number is a float or an integer, unlike JSON. While the format is not
human-readable, it makes a good interchange format for applications that
don't necessarily care about a human readable output.

This uses `github.com/tinylib/msgp` to precompile the marshaling for
`models.Row`. Since we only use this library to marshal the one type,
this is a much more efficient method of encoding than using reflection.
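To illustrate why precompiled marshaling beats reflection (a toy comparison, not tinylib/msgp's actual generated code; the `Row` type here is a stand-in for `models.Row`):

```go
package main

import (
	"fmt"
	"reflect"
)

// Row stands in for models.Row.
type Row struct {
	Name  string
	Value int64
}

// encodeReflect walks the struct generically, the way a reflection-based
// codec must for every value it sees: field lookups and interface boxing
// on every call.
func encodeReflect(v interface{}) []string {
	rv := reflect.ValueOf(v)
	out := make([]string, rv.NumField())
	for i := 0; i < rv.NumField(); i++ {
		out[i] = fmt.Sprint(rv.Field(i).Interface())
	}
	return out
}

// encodeDirect is the shape of generated code: straight field access,
// known types, no reflection.
func encodeDirect(r Row) []string {
	return []string{r.Name, fmt.Sprint(r.Value)}
}

func main() {
	r := Row{Name: "cpu", Value: 42}
	fmt.Println(encodeReflect(r), encodeDirect(r))
}
```

Since the handler only ever marshals this one type, code generation trades generality for the direct path.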
@jsternberg
Contributor Author

This will be reopened as something else.

@jsternberg jsternberg closed this Sep 28, 2017
@jsternberg jsternberg deleted the js-msgpack-handler branch September 28, 2017 23:21