api: Use streaming UTF-8 + JSON decoding for the response #972
Conversation
Looks good! I can reproduce the improvement on my machine, and StreamingJsonBenchmark
is consistently faster. For me, they perform about equally well on emoji_api.json.
```
$ dart compile exe bin/dart_bench.dart && ./bin/dart_bench.exe
/register.json StreamingJsonBenchmark(RunTime): 66810.2 us.
/register.json NonStreamingJsonBenchmark(RunTime): 90924.72727272728 us.
/emoji_api.json StreamingJsonBenchmark(RunTime): 1176.8817647058825 us.
/emoji_api.json NonStreamingJsonBenchmark(RunTime): 931.1215083798883 us.
```
```diff
@@ -147,8 +147,8 @@ class ApiConnection {
     final int httpStatus = response.statusCode;
     Map<String, dynamic>? json;
     try {
-      final bytes = await response.stream.toBytes();
-      json = jsonUtf8Decoder.convert(bytes) as Map<String, dynamic>?;
+      final jsonStream = jsonUtf8Decoder.bind(response.stream);
```
Perhaps it would be helpful to add a comment on why we do this here.
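(To make the pattern in the diff concrete for readers outside the codebase: a minimal, self-contained sketch of what the new line does, assuming `package:http`; the `fetchJson` helper and its signature are illustrative, not the app's actual code.)

```dart
import 'dart:convert';

import 'package:http/http.dart' as http;

// Fuses UTF-8 decoding with JSON parsing into a single
// Converter<List<int>, Object?>, able to consume the body chunk by chunk.
final jsonUtf8Decoder = const Utf8Decoder().fuse(const JsonDecoder());

Future<Map<String, dynamic>?> fetchJson(http.Client client, Uri url) async {
  final response = await client.send(http.Request('GET', url));
  // `bind` applies the converter as a stream transformer, so decoding
  // starts as soon as the first byte chunk arrives, overlapping the
  // download, instead of waiting for the whole body to be buffered.
  final jsonStream = jsonUtf8Decoder.bind(response.stream);
  // A JSON document decodes to exactly one value; `single` awaits it.
  return await jsonStream.single as Map<String, dynamic>?;
}
```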
Force-pushed from db07b2f to d19b5fd.
Thanks for the review @PIG208! Pushed a new revision, PTAL.
LGTM! I guess eventually we might try to parse these in isolates, if we can figure out the issue with data transmission? (And/or, just trim the initial /register payload from the server side.) This change looks like a net improvement.
Linking #695 here because it appears to be related.
Before this change, the JSON response body was fully downloaded before being decoded into a Map using `jsonUtf8Decoder`. For larger response bodies, such as the `/register` endpoint on CZO (~14MB uncompressed), the CPU would often remain idle while waiting for the entire response to download before decoding could start.

With this change, the response byte stream is piped directly into the `jsonUtf8Decoder` stream transformer. This allows decoding to begin as soon as the byte stream starts emitting chunks of data (max chunk length of 65536 bytes (64KiB) observed in my local testing).

Additionally, I ran some local benchmarks, for which the code is available here:
https://gist.github.com/rajveermalviya/7b4d92f84c68f0976ed07f6d797ac164

The response bodies of `/register` and `/static/generated/emoji/emoji_api.xxx.json` were downloaded and served locally using Nginx, configured with the following config (mimicking the Zulip server):
https://gist.github.com/rajveermalviya/2188c9a8d1a3e21c2efea186d61026b2

The results were as follows:

```
$ dart compile exe bin/dart_json_bench.dart && ./bin/dart_json_bench.exe
/register.json StreamingJsonBenchmark(RunTime): 77548.42307692308 us. (~77ms)
/register.json NonStreamingJsonBenchmark(RunTime): 116733.44444444444 us. (~116ms)
/emoji_api.json StreamingJsonBenchmark(RunTime): 1109.8724348308374 us. (~1ms)
/emoji_api.json NonStreamingJsonBenchmark(RunTime): 1138.2514220705348 us. (~1ms)
```

(The durations represent the time taken to make a single request, calculated by averaging the total time over the number of iterations performed within a 2-second period.)

In non-streaming mode for `/register.json`, the UTF-8 + JSON decoder takes around ~55ms, and the remaining ~61ms is spent just downloading the response. This means that in streaming mode, decoding incurs only an extra ~16ms. On a slower network that extra time would be overshadowed by the download even more (possibly approaching zero), because byte chunks would arrive much more slowly than they can be decoded. For smaller responses (e.g. `/emoji_api.json`, ~60KiB) there isn't much difference observed, understandably.
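(For a sense of the shape of these benchmarks: the actual code is in the gist linked above; the sketch below is illustrative, using `package:benchmark_harness`, and the localhost URL, class name, and file layout are assumptions rather than the gist's exact code.)

```dart
import 'dart:convert';

import 'package:benchmark_harness/benchmark_harness.dart';
import 'package:http/http.dart' as http;

// Fused converter: UTF-8 bytes -> JSON value, decodable chunk by chunk.
final jsonUtf8Decoder = const Utf8Decoder().fuse(const JsonDecoder());

/// Illustrative: fetch [path] from the local Nginx and decode the response
/// by piping the byte stream straight through the fused decoder.
class StreamingJsonBenchmark extends AsyncBenchmarkBase {
  StreamingJsonBenchmark(this.path) : super('$path StreamingJsonBenchmark');

  final String path;
  final client = http.Client();

  @override
  Future<void> run() async {
    final request = http.Request('GET', Uri.parse('http://localhost$path'));
    final response = await client.send(request);
    await jsonUtf8Decoder.bind(response.stream).single;
  }
}

Future<void> main() async {
  // report() warms up, then runs the benchmark repeatedly for ~2 seconds
  // and prints the average time per iteration in microseconds.
  await StreamingJsonBenchmark('/register.json').report();
}
```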
This is a bit off the main line of what a reader of this method will mostly want to know about (as the method is mainly about semantics, not performance optimization). So keep the comments fairly short, to avoid distracting the reader's focus from the rest of the method.
Force-pushed from d19b5fd to 061821c.
Neat — thanks @rajveermalviya for the speedup and detailed benchmarks, and @PIG208 for the previous review.
The code and commit message look good, and it sounds like this makes a small but useful speedup: about 40ms in the benchmark (*).
Merging, with an added commit editing the comments:
061821c api [nfc]: Tighten comments about streaming JSON-plus-UTF-8 decoder
(*) If the exact performance behavior were important, we'd want to benchmark on a phone rather than a laptop. But here it seems unlikely that this is making a trade-off where the results could come out the other way on different hardware; the benchmark shows it is successfully pipelining decoding with downloading, and then it should be a speed-up everywhere.
Yeah, it isn't currently possible in Dart to usefully offload JSON parsing to another isolate. A chat thread would be the best place to discuss further; I thought there'd been at least a brief discussion somewhere, but I'm not finding it now.

With the chunked decoding from this PR, though, I think there may not be much further gain to be had even if we could do so. In addition to the pipelining effect (decoding while downloading is still happening), the chunking means that each chunk should be quick to decode, so doing the decoding on the UI thread doesn't cause much risk of dropped frames.

For a back-of-envelope estimate: the benchmark @rajveermalviya reported above on his laptop took 55ms to decode 14MB, which at a chunk size of 64KiB is about 215 chunks. That makes about 0.26ms per chunk, whereas a frame is 8.3ms on the highest-end mobile hardware (and 11.1ms or 16.7ms on other phones).
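(Checking that back-of-envelope arithmetic with the figures quoted above; a quick standalone sketch, not code from the PR.)

```dart
void main() {
  const bodyBytes = 14 * 1000 * 1000; // ~14MB /register payload
  const chunkBytes = 64 * 1024;       // 64KiB max chunk size observed
  const decodeMs = 55;                // total decode time in the laptop benchmark

  final chunks = bodyBytes / chunkBytes; // ≈ 213.6, i.e. about 215 chunks
  final msPerChunk = decodeMs / chunks;  // ≈ 0.26ms of decoding per chunk
  // Frame budgets: 8.3ms at 120Hz, 11.1ms at 90Hz, 16.7ms at 60Hz.
  // Each chunk's decode is a small fraction of even the tightest budget.
  print('${chunks.toStringAsFixed(1)} chunks, '
      '${msPerChunk.toStringAsFixed(2)}ms per chunk');
}
```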
Yeah, that's definitely something that'll be on our agenda post-launch. Previous discussion is in a number of …
(I think this was discussed in one of our weekly video chats.)