
api: Use streaming UTF-8 + JSON decoding for the response #972

Merged
merged 2 commits into zulip:main from pr-streaming-json-decode
Oct 2, 2024

Conversation

rajveermalviya
Collaborator

Before this change, the JSON response body was fully downloaded before decoding it into a Map using jsonUtf8Decoder. For larger response bodies, such as the /register endpoint on CZO (~14MB uncompressed), the CPU would often remain idle while waiting for the entire response to download before the decoding process could start.

With this change, the response byte stream is now piped directly into the jsonUtf8Decoder stream transformer. This allows for decoding to begin as soon as the byte stream starts emitting chunks of data (max length of 65536 bytes (64KiB) observed in my local testing).
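The pattern can be sketched roughly as follows. (This is an illustration, not the actual ApiConnection code: `decodeJsonStreaming` is a hypothetical helper, and `jsonUtf8Decoder` is assumed to be a fused UTF-8 + JSON converter, as `dart:convert` supports.)

```dart
import 'dart:convert';

// Assumption: jsonUtf8Decoder fuses UTF-8 and JSON decoding, so bytes
// are decoded directly into JSON values without an intermediate String.
final jsonUtf8Decoder = const Utf8Decoder().fuse(const JsonDecoder());

// Hypothetical helper illustrating the streaming-decode pattern.
Future<Map<String, dynamic>?> decodeJsonStreaming(
    Stream<List<int>> byteStream) async {
  // Binding the decoder to the byte stream starts decoding as soon as
  // the first chunk arrives, instead of after the whole body downloads.
  final jsonStream = jsonUtf8Decoder.bind(byteStream);
  // A well-formed JSON body decodes to exactly one top-level value.
  return (await jsonStream.single) as Map<String, dynamic>?;
}
```

Because `Converter` implements `StreamTransformer`, `bind` works on any byte stream, including an `http.StreamedResponse.stream`.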

Additionally, I ran some local benchmarks, for which the code is available here: https://gist.github.com/rajveermalviya/7b4d92f84c68f0976ed07f6d797ac164. The response bodies of /register and
/static/generated/emoji/emoji_api.xxx.json were downloaded and served locally using Nginx, configured (mimicking the Zulip server) with the following config: https://gist.github.com/rajveermalviya/2188c9a8d1a3e21c2efea186d61026b2

The results were as follows:

$ dart compile exe bin/dart_json_bench.dart && ./bin/dart_json_bench.exe
/register.json  StreamingJsonBenchmark(RunTime):    77548.42307692308 us. (~77ms)
/register.json  NonStreamingJsonBenchmark(RunTime): 116733.44444444444 us. (~116ms)

/emoji_api.json StreamingJsonBenchmark(RunTime):    1109.8724348308374 us. (~1ms)
/emoji_api.json NonStreamingJsonBenchmark(RunTime): 1138.2514220705348 us. (~1ms)

(The durations represent the time taken to make a single request, calculated by averaging the total time over the number of iterations performed within a 2-second period.)

In non-streaming mode for /register.json, the UTF-8 + JSON decoder takes around ~55ms, and the remaining ~61ms is spent just downloading the response. This means that in streaming mode, decoding incurs only an extra ~16ms. That duration would be overshadowed by download time even more on a slower network (possibly even reaching near zero), because byte chunks would arrive much more slowly than the time it takes to decode them. For smaller responses (e.g. /emoji_api.json, ~60KiB) there isn't much difference observed, understandably.

@rajveermalviya rajveermalviya added the maintainer review PR ready for review by Zulip maintainers label Sep 30, 2024
Member

@PIG208 PIG208 left a comment


Looks good! I can reproduce the improvement on my machine, and StreamingJsonBenchmark is consistently faster. For me, they perform about equally well on emoji_api.json.

$ dart compile exe bin/dart_bench.dart && ./bin/dart_bench.exe
/register.json StreamingJsonBenchmark(RunTime): 66810.2 us.
/register.json NonStreamingJsonBenchmark(RunTime): 90924.72727272728 us.
/emoji_api.json StreamingJsonBenchmark(RunTime): 1176.8817647058825 us.
/emoji_api.json NonStreamingJsonBenchmark(RunTime): 931.1215083798883 us.

Left a comment.

@@ -147,8 +147,8 @@ class ApiConnection {
     final int httpStatus = response.statusCode;
     Map<String, dynamic>? json;
     try {
-      final bytes = await response.stream.toBytes();
-      json = jsonUtf8Decoder.convert(bytes) as Map<String, dynamic>?;
+      final jsonStream = jsonUtf8Decoder.bind(response.stream);
Member


Perhaps it would be helpful to add a comment on why we do this here.

@rajveermalviya rajveermalviya force-pushed the pr-streaming-json-decode branch 2 times, most recently from db07b2f to d19b5fd Compare October 1, 2024 15:58
@rajveermalviya
Collaborator Author

Thanks for the review @PIG208! Pushed a new revision PTAL.

@rajveermalviya rajveermalviya requested a review from PIG208 October 1, 2024 15:58
@PIG208
Member

PIG208 commented Oct 1, 2024

LGTM! I guess eventually we might try to parse these in isolates, if we can figure out the issue with data transmission? (And/or, just trim the initial /register payload from the server side.) This change looks like a net improvement.

@PIG208 PIG208 added integration review Added by maintainers when PR may be ready for integration and removed maintainer review PR ready for review by Zulip maintainers labels Oct 1, 2024
@PIG208
Member

PIG208 commented Oct 1, 2024

Linking #695 here because it appears to be related.

rajveermalviya and others added 2 commits October 1, 2024 22:31
Before this change, the JSON response body was fully downloaded before
decoding it into a Map using `jsonUtf8Decoder`. For larger response
bodies, such as the `/register` endpoint on CZO (~14MB uncompressed),
the CPU would often remain idle while waiting for the entire response
to download before the decoding process could start.

With this change, the response byte stream is now piped directly into
the `jsonUtf8Decoder` stream transformer. This allows for decoding to
begin as soon as the byte stream starts emitting chunks of data (max
length of 65536 bytes (64KiB) observed in my local testing).

Additionally, I ran some local benchmarks for which the code is
available here:
  https://gist.github.com/rajveermalviya/7b4d92f84c68f0976ed07f6d797ac164
where the response bodies of `/register` and
`/static/generated/emoji/emoji_api.xxx.json` were downloaded and
served locally using Nginx configured using the following config
(mimicking the Zulip server):
  https://gist.github.com/rajveermalviya/2188c9a8d1a3e21c2efea186d61026b2
The results were as follows:

  $ dart compile exe bin/dart_json_bench.dart && ./bin/dart_json_bench.exe
  /register.json  StreamingJsonBenchmark(RunTime):    77548.42307692308 us. (~77ms)
  /register.json  NonStreamingJsonBenchmark(RunTime): 116733.44444444444 us. (~116ms)

  /emoji_api.json StreamingJsonBenchmark(RunTime):    1109.8724348308374 us. (~1ms)
  /emoji_api.json NonStreamingJsonBenchmark(RunTime): 1138.2514220705348 us. (~1ms)

(The durations represent the time taken to make a single request,
calculated by averaging the total time over the number of iterations
performed within a 2-second period.)

In non-streaming mode for `/register.json`, the UTF-8 + JSON decoder
takes around ~55ms, and the remaining ~61ms is spent just downloading
the response. This means that in streaming mode, decoding incurs only
an extra ~16ms. That duration would be overshadowed by download time
even more on a slower network (possibly even reaching near zero),
because byte chunks would arrive much more slowly than the time it
takes to decode them. For smaller responses (e.g. `/emoji_api.json`,
~60KiB) there isn't much difference observed, understandably.
This is a bit off the main line of what a reader of this method will
mostly want to know about (as the method is mainly about semantics,
not performance optimization).  So keep the comments fairly short,
to avoid distracting the reader's focus from the rest of the method.
@gnprice gnprice force-pushed the pr-streaming-json-decode branch from d19b5fd to 061821c Compare October 2, 2024 05:33
@gnprice gnprice merged commit 061821c into zulip:main Oct 2, 2024
1 check passed
Member

@gnprice gnprice left a comment


Neat — thanks @rajveermalviya for the speedup and detailed benchmarks, and @PIG208 for the previous review.

The code and commit message look good, and it sounds like this makes a small but useful speedup: about 40ms in the benchmark (*).

Merging, with an added commit editing the comments:
061821c api [nfc]: Tighten comments about streaming JSON-plus-UTF-8 decoder

(*) If the exact performance behavior were important, we'd want to benchmark on a phone rather than a laptop. But here it seems unlikely that this is making a trade-off where the results could come out the other way on different hardware; the benchmark shows it is successfully pipelining decoding with downloading, and then it should be a speed-up everywhere.

@gnprice
Member

gnprice commented Oct 2, 2024

I guess eventually we might try to parse these in isolates, if we can figure out the issue with data transmission?

Yeah, it isn't currently possible in Dart to usefully offload JSON parsing to another isolate. A chat thread would be the best place to discuss further; I thought there'd been at least a brief discussion somewhere, but I'm not finding it now.

With the chunked decoding from this PR, though, I think there may not be much further gain to be had even if we could do so. In addition to the pipelining effect (decoding while downloading is still happening), the chunking means that each chunk should be quick to decode, so doing the decoding on the UI thread doesn't cause much risk of dropped frames.

For a back-of-envelope estimate: the benchmark @rajveermalviya reported above on his laptop took 55ms to decode 14MB, which at a chunk size of 64 kiB is about 215 chunks. That makes about 0.26ms per chunk, whereas a frame is 8.3ms on the highest-end mobile hardware (and 11.1ms or 16.7ms on other phones).
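That estimate can be reproduced in a few lines (the 14MB, 64 KiB, and 55ms figures are taken from the benchmark above; the exact chunk count depends on whether MB means 10^6 or 2^20 bytes, so treat the result as approximate):

```dart
void main() {
  const payloadBytes = 14 * 1000 * 1000; // ~14MB /register body
  const chunkBytes = 64 * 1024;          // 64 KiB chunk size
  const decodeMs = 55;                   // total decode time from the benchmark

  final chunks = payloadBytes / chunkBytes; // ~214 chunks
  final msPerChunk = decodeMs / chunks;     // ~0.26 ms per chunk

  // Even the tightest frame budget (120 Hz -> 8.3 ms) dwarfs ~0.26 ms.
  print('${chunks.round()} chunks, '
      '${msPerChunk.toStringAsFixed(2)} ms per chunk');
}
```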

(And/or, just trim the initial /register payload from the server side.)

Yeah, that's definitely something that'll be on our agenda post-launch. Previous discussion is in a number of #api design threads a couple of months ago, starting here:
https://chat.zulip.org/#narrow/stream/378-api-design/near/1904593
and in this #mobile-team thread:
https://chat.zulip.org/#narrow/stream/243-mobile-team/topic/shrinking.20the.20initial.20load/near/1907255

@rajveermalviya rajveermalviya deleted the pr-streaming-json-decode branch October 2, 2024 13:01
@chrisbobbe
Collaborator

Yeah, it isn't currently possible in Dart to usefully offload JSON parsing to another isolate. A chat thread would be the best place to discuss further; I thought there'd been at least a brief discussion somewhere, but I'm not finding it now.

(I think this was discussed in one of our weekly video chats.)
