
api: Use streaming UTF-8 + JSON decoding for the response #972

Merged
merged 2 commits into zulip:main from pr-streaming-json-decode
Oct 2, 2024

Conversation

rajveermalviya
Collaborator

Before this change, the JSON response body was fully downloaded before decoding it into a Map using jsonUtf8Decoder. For larger response bodies, such as the /register endpoint on CZO (~14MB uncompressed), the CPU would often remain idle while waiting for the entire response to download before the decoding process could start.

With this change, the response byte stream is now piped directly into the jsonUtf8Decoder stream transformer. This allows for decoding to begin as soon as the byte stream starts emitting chunks of data (max length of 65536 bytes (64KiB) observed in my local testing).
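The pattern can be sketched roughly as follows. (This is an illustration, not the actual ApiConnection code: `decodeJsonStreaming` is a hypothetical helper, and `jsonUtf8Decoder` is assumed to be a fused UTF-8 + JSON converter, as `dart:convert` supports.)

```dart
import 'dart:convert';

// Assumption: jsonUtf8Decoder fuses UTF-8 and JSON decoding, so bytes
// are decoded directly into JSON values without an intermediate String.
final jsonUtf8Decoder = const Utf8Decoder().fuse(const JsonDecoder());

// Hypothetical helper illustrating the streaming-decode pattern.
Future<Map<String, dynamic>?> decodeJsonStreaming(
    Stream<List<int>> byteStream) async {
  // Binding the decoder to the byte stream starts decoding as soon as
  // the first chunk arrives, instead of after the whole body downloads.
  final jsonStream = jsonUtf8Decoder.bind(byteStream);
  // A well-formed JSON body decodes to exactly one top-level value.
  return (await jsonStream.single) as Map<String, dynamic>?;
}
```

Because `Converter` implements `StreamTransformer`, `bind` works on any byte stream, including an `http.StreamedResponse.stream`.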

Additionally, I ran some local benchmarks, for which the code is available here: https://gist.github.com/rajveermalviya/7b4d92f84c68f0976ed07f6d797ac164. The response bodies of /register and
/static/generated/emoji/emoji_api.xxx.json were downloaded and served locally using Nginx, configured (mimicking the Zulip server) with the following config: https://gist.github.com/rajveermalviya/2188c9a8d1a3e21c2efea186d61026b2

The results were as follows:

$ dart compile exe bin/dart_json_bench.dart && ./bin/dart_json_bench.exe
/register.json  StreamingJsonBenchmark(RunTime):    77548.42307692308 us. (~77ms)
/register.json  NonStreamingJsonBenchmark(RunTime): 116733.44444444444 us. (~116ms)

/emoji_api.json StreamingJsonBenchmark(RunTime):    1109.8724348308374 us. (~1ms)
/emoji_api.json NonStreamingJsonBenchmark(RunTime): 1138.2514220705348 us. (~1ms)

(The durations represent the time taken to make a single request, calculated by averaging the total time over the number of iterations performed within a 2-second period.)

In non-streaming mode for /register.json, the UTF-8 + JSON decoder takes around ~55ms, and the remaining ~61ms is spent just downloading the response. This means that in streaming mode, decoding incurs only an extra ~16ms. That duration would be overshadowed by download time even more on a slower network (possibly even reaching near zero), because byte chunks would arrive much more slowly than the time it takes to decode them. For smaller responses (e.g. /emoji_api.json, ~60KiB) there isn't much difference observed, understandably.

@rajveermalviya rajveermalviya added the maintainer review PR ready for review by Zulip maintainers label Sep 30, 2024
Member

@PIG208 PIG208 left a comment


Looks good! I can reproduce the improvement on my machine, and StreamingJsonBenchmark is consistently faster. For me, they perform about equally well on emoji_api.json.

$ dart compile exe bin/dart_bench.dart && ./bin/dart_bench.exe
/register.json StreamingJsonBenchmark(RunTime): 66810.2 us.
/register.json NonStreamingJsonBenchmark(RunTime): 90924.72727272728 us.
/emoji_api.json StreamingJsonBenchmark(RunTime): 1176.8817647058825 us.
/emoji_api.json NonStreamingJsonBenchmark(RunTime): 931.1215083798883 us.

Left a comment.

@@ -147,8 +147,8 @@ class ApiConnection {
     final int httpStatus = response.statusCode;
     Map<String, dynamic>? json;
     try {
-      final bytes = await response.stream.toBytes();
-      json = jsonUtf8Decoder.convert(bytes) as Map<String, dynamic>?;
+      final jsonStream = jsonUtf8Decoder.bind(response.stream);
Member


Perhaps it would be helpful to add a comment on why we do this here.

@rajveermalviya rajveermalviya force-pushed the pr-streaming-json-decode branch 2 times, most recently from db07b2f to d19b5fd Compare October 1, 2024 15:58
@rajveermalviya
Collaborator Author

Thanks for the review @PIG208! Pushed a new revision PTAL.

@rajveermalviya rajveermalviya requested a review from PIG208 October 1, 2024 15:58
@PIG208
Member

PIG208 commented Oct 1, 2024

LGTM! I guess eventually we might try to parse these in isolates, if we can figure out the issue with data transmission? (And/or, just trim the initial /register payload from the server side.) This change looks like a net improvement.

@PIG208 PIG208 added integration review Added by maintainers when PR may be ready for integration and removed maintainer review PR ready for review by Zulip maintainers labels Oct 1, 2024
@PIG208
Member

PIG208 commented Oct 1, 2024

Linking #695 here because it appears to be related.

rajveermalviya and others added 2 commits October 1, 2024 22:31
Before this change, the JSON response body was fully downloaded before
decoding it into a Map using `jsonUtf8Decoder`. For larger response
bodies, such as the `/register` endpoint on CZO (~14MB uncompressed),
the CPU would often remain idle while waiting for the entire response
to download before the decoding process could start.

With this change, the response byte stream is now piped directly into
the `jsonUtf8Decoder` stream transformer. This allows for decoding to
begin as soon as the byte stream starts emitting chunks of data (max
length of 65536 bytes (64KiB) observed in my local testing).

Additionally, I ran some local benchmarks for which the code is
available here:
  https://gist.github.com/rajveermalviya/7b4d92f84c68f0976ed07f6d797ac164
where the response bodies of `/register` and
`/static/generated/emoji/emoji_api.xxx.json` were downloaded and
served locally using Nginx configured using the following config
(mimicking the Zulip server):
  https://gist.github.com/rajveermalviya/2188c9a8d1a3e21c2efea186d61026b2
The results were as follows:

  $ dart compile exe bin/dart_json_bench.dart && ./bin/dart_json_bench.exe
  /register.json  StreamingJsonBenchmark(RunTime):    77548.42307692308 us. (~77ms)
  /register.json  NonStreamingJsonBenchmark(RunTime): 116733.44444444444 us. (~116ms)

  /emoji_api.json StreamingJsonBenchmark(RunTime):    1109.8724348308374 us. (~1ms)
  /emoji_api.json NonStreamingJsonBenchmark(RunTime): 1138.2514220705348 us. (~1ms)

(The durations represent the time taken to make a single request,
calculated by averaging the total time over the number of iterations
performed within a 2-second period.)

In non-streaming mode for `/register.json`, the UTF-8 + JSON decoder
takes around ~55ms, and the remaining ~61ms is spent just downloading
the response. This means that in streaming mode, decoding incurs only
an extra ~16ms. That duration would be overshadowed by download time
even more on a slower network (possibly even reaching near zero),
because byte chunks would arrive much more slowly than the time it
takes to decode them. For smaller responses (e.g. `/emoji_api.json`,
~60KiB) there isn't much difference observed, understandably.
This is a bit off the main line of what a reader of this method will
mostly want to know about (as the method is mainly about semantics,
not performance optimization).  So keep the comments fairly short,
to avoid distracting the reader's focus from the rest of the method.
@gnprice gnprice force-pushed the pr-streaming-json-decode branch from d19b5fd to 061821c Compare October 2, 2024 05:33
@gnprice gnprice merged commit 061821c into zulip:main Oct 2, 2024
1 check passed
Member

@gnprice gnprice left a comment


Neat — thanks @rajveermalviya for the speedup and detailed benchmarks, and @PIG208 for the previous review.

The code and commit message look good, and it sounds like this makes a small but useful speedup: about 40ms in the benchmark (*).

Merging, with an added commit editing the comments:
061821c api [nfc]: Tighten comments about streaming JSON-plus-UTF-8 decoder

(*) If the exact performance behavior were important, we'd want to benchmark on a phone rather than a laptop. But here it seems unlikely that this is making a trade-off where the results could come out the other way on different hardware; the benchmark shows it is successfully pipelining decoding with downloading, and then it should be a speed-up everywhere.

@gnprice
Member

gnprice commented Oct 2, 2024

I guess eventually we might try to parse these in isolates, if we can figure out the issue with data transmission?

Yeah, it isn't currently possible in Dart to usefully offload JSON parsing to another isolate. A chat thread would be the best place to discuss further; I thought there'd been at least a brief discussion somewhere, but I'm not finding it now.

With the chunked decoding from this PR, though, I think there may not be much further gain to be had even if we could do so. In addition to the pipelining effect (decoding while downloading is still happening), the chunking means that each chunk should be quick to decode, so doing the decoding on the UI thread doesn't cause much risk of dropped frames.

For a back-of-envelope estimate: the benchmark @rajveermalviya reported above on his laptop took 55ms to decode 14MB, which at a chunk size of 64 kiB is about 215 chunks. That makes about 0.26ms per chunk, whereas a frame is 8.3ms on the highest-end mobile hardware (and 11.1ms or 16.7ms on other phones).
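That estimate can be reproduced in a few lines (the 14MB, 64 KiB, and 55ms figures are taken from the benchmark above; the exact chunk count depends on whether MB means 10^6 or 2^20 bytes, so treat the result as approximate):

```dart
void main() {
  const payloadBytes = 14 * 1000 * 1000; // ~14MB /register body
  const chunkBytes = 64 * 1024;          // 64 KiB chunk size
  const decodeMs = 55;                   // total decode time from the benchmark

  final chunks = payloadBytes / chunkBytes; // ~214 chunks
  final msPerChunk = decodeMs / chunks;     // ~0.26 ms per chunk

  // Even the tightest frame budget (120 Hz -> 8.3 ms) dwarfs ~0.26 ms.
  print('${chunks.round()} chunks, '
      '${msPerChunk.toStringAsFixed(2)} ms per chunk');
}
```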

(And/or, just trim the initial /register payload from the server side.)

Yeah, that's definitely something that'll be on our agenda post-launch. Previous discussion is in a number of #api design threads a couple of months ago, starting here:
https://chat.zulip.org/#narrow/stream/378-api-design/near/1904593
and in this #mobile-team thread:
https://chat.zulip.org/#narrow/stream/243-mobile-team/topic/shrinking.20the.20initial.20load/near/1907255

@rajveermalviya rajveermalviya deleted the pr-streaming-json-decode branch October 2, 2024 13:01
@chrisbobbe
Collaborator

Yeah, it isn't currently possible in Dart to usefully offload JSON parsing to another isolate. A chat thread would be the best place to discuss further; I thought there'd been at least a brief discussion somewhere, but I'm not finding it now.

(I think this was discussed in one of our weekly video chats.)
