
[Search] Benchmark bfetch impact in cloud with http2 #124538

Closed
Dosant opened this issue Feb 3, 2022 · 6 comments
Labels
Feature:Batching/Streaming, Feature:Search, impact:medium, loe:medium, performance

Comments

@Dosant
Contributor

Dosant commented Feb 3, 2022

  • Kibana in cloud uses http2 between the client and the cloud proxy by default. Because of connection reuse, bfetch might be redundant in such a setup.
  • In [Search] Add configuration to not use /bsearch #122244 we added a configuration option to run searches without bfetch.
  • Now we would like to compare Kibana performance with and without bfetch in cloud with http2.
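To make the trade-off concrete, here is a minimal sketch (illustrative only, not the actual Kibana bfetch API) of what the batching layer conceptually does on the wire: with HTTP/1.1 the browser caps concurrent connections per origin, so folding many logical searches into fewer network requests saved round trips; with http2 multiplexing, each search can ride its own stream on one connection, which is why the batch layer may be redundant.

```typescript
interface SearchRequest {
  id: number;
  body: object;
}

// Fold many logical searches into network requests of at most `batchSize` each,
// which is the core idea behind bfetch's /bsearch batching.
function batch(requests: SearchRequest[], batchSize: number): SearchRequest[][] {
  const out: SearchRequest[][] = [];
  for (let i = 0; i < requests.length; i += batchSize) {
    out.push(requests.slice(i, i + batchSize));
  }
  return out;
}

// e.g. 14 logical searches batched by 10 -> 2 network requests instead of 14
const searches: SearchRequest[] = Array.from({ length: 14 }, (_, id) => ({ id, body: {} }));
console.log(batch(searches, 10).length); // 2
```

Over http2, the 14 unbatched requests would multiplex over a single connection anyway, which is the hypothesis this benchmark tests.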

if you're interested in support of http2 in Kibana: #123748

@elasticmachine
Contributor

Pinging @elastic/kibana-app-services (Team:AppServicesSv)

@exalate-issue-sync exalate-issue-sync bot added impact:needs-assessment Product and/or Engineering needs to evaluate the impact of the change. loe:medium Medium Level of Effort impact:medium Addressing this issue will have a medium level of impact on the quality/strength of our product. and removed impact:needs-assessment Product and/or Engineering needs to evaluate the impact of the change. labels Feb 3, 2022
@Dosant
Contributor Author

Dosant commented Feb 9, 2022

Conditions

These are simple measurements I took from my machine:

  • A dashboard with 13 Lens panels and a map, over 1 million records of data. Some Lens panels make a follow-up request (the "other" bucket).
  • Opened two browser windows with a cloud instance in us-west2 side by side: one with bfetch ON, the other OFF. There is an http2 connection to the cloud proxy, so the browser makes all the search requests in parallel.
  • I am in Europe, so there is quite a bit of latency to us-west2.
  • Set a 10-second auto-refresh interval and let both run for 5 minutes.
  • Only measured client-side time for all the searches on the dashboard to complete, from when the first search started until the last one finished. I didn't look at anything server-side.
  • I logged the dashboard's search-session completion time in the event log (search session benchmark on 8.1, #124699) and used the logged events to get the results.
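The logged events reduce to the Avg/Med numbers below roughly as follows (a sketch; the event shape here is hypothetical, standing in for the search-session completion events in the event log):

```typescript
interface SessionEvent {
  startedAt: number;   // ms timestamp of the first search starting
  completedAt: number; // ms timestamp of the last search finishing
}

// Compute average and median session duration from logged events.
function stats(events: SessionEvent[]): { avg: number; med: number } {
  const durations = events
    .map((e) => e.completedAt - e.startedAt)
    .sort((a, b) => a - b);
  const avg = durations.reduce((sum, d) => sum + d, 0) / durations.length;
  const mid = Math.floor(durations.length / 2);
  const med =
    durations.length % 2 === 1
      ? durations[mid]
      : (durations[mid - 1] + durations[mid]) / 2;
  return { avg, med };
}
```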

Results

Across different measurements, bfetch:on is on average slightly faster for my example.
The tiny difference in favor of bfetch:on is a consistent result.

No network and cpu throttle

  • bfetch:on: Avg 2.323s, Med 2.259s
  • bfetch:off: Avg 2.348s, Med 2.278s

Fast 3G network and 4x cpu throttle

  • bfetch:on: Avg 5.474s, Med 5.252s
  • bfetch:off: Avg 5.517s, Med 5.273s

In this example, with an http2 connection to the cloud proxy, my dashboard is 1-2% faster with bfetch than without it.


Worth noting that internally the bsearch route has an optimization: all searches in a batch share the search service instance.

const search = getScoped(request);

This, for example, reduces uiSettings access for each call (#108062), and maybe something else. Could it be interesting to measure without the optimization?
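A minimal sketch of this optimization (names are illustrative, not Kibana's actual internals): the bsearch route creates one scoped search service per incoming request and reuses it for every search in the batch, so per-scope setup such as uiSettings reads happens once instead of N times.

```typescript
// Counter standing in for expensive per-scope setup (e.g. uiSettings access).
let uiSettingsReads = 0;

interface ScopedSearch {
  search(query: string): string;
}

// Hypothetical stand-in for Kibana's getScoped(request): performs the
// per-scope setup once, then returns a service bound to the request.
function getScoped(request: { user: string }): ScopedSearch {
  uiSettingsReads++;
  return { search: (query) => `${request.user}:${query}` };
}

// bsearch-style handler: one scoped service shared by all searches in the batch.
function runBatch(request: { user: string }, queries: string[]): string[] {
  const search = getScoped(request);
  return queries.map((q) => search.search(q));
}
```

With this shape, a batch of 10 queries triggers one uiSettings read, whereas 10 independent search requests would each pay that setup cost.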


Reminder, downsides of bfetch:

  • Custom gzip logic, theoretically less efficient than the built-in one
  • Harder to debug because of the custom gzip logic; harder for devs and support to work with HARs
  • An additional layer that adds complexity and more surface for bugs
  • No cancelation of individual requests

If my preliminary measurements are correct, it doesn't look like bfetch is worth its cost in an http2 environment.
But I think we'd need more testing scenarios to be sure.

@mshustov, @ppisljar, @streamich, @lukasolson do you have any thoughts about these results?
Maybe you'd have an idea what else I should measure or improve in this testing?

@lukasolson
Member

Awesome! Thanks for doing these benchmarks. Since we are moving "cloud first" and the gains with bfetch are not significant, I tend to lean towards removing bfetch for the reasons you listed.

@mshustov
Contributor

Since we are moving "cloud first" and the gains with bfetch are not significant, I tend to lean towards removing bfetch for the reasons you listed.

We don't have to delete it. We can programmatically disable bfetch on Cloud; @Dosant already added an appropriate setting (#123942). That said, I think the setting should be deployment-based and configured by admins, instead of being user-based as implemented.

In this example, with an http2 connection to the cloud proxy, my dashboard is 1-2% faster with bfetch than without it.

It can be interesting to compare server-side metrics as well. Especially CPU load and event-loop delay. I have no idea what overhead bfetch adds, so it can be useful for understanding the whole picture.

@Dosant
Contributor Author

Dosant commented Mar 28, 2022

It seems like Kibana currently has a severe bottleneck with simultaneous network requests (according to @lizozom's exploration).

Using @lizozom's testing script (https://github.com/elastic/kibana-capacity-test) against the same Kibana instance:

No bfetch, each search request is a separate network request:

| requests per minute | searches per minute | avg. response time (ms) |
| --- | --- | --- |
| 10 | 10 | 713 |
| 20 | 20 | 735 |
| 40 | 40 | 751 |
| 80 | 80 | 680 |
| 160 | 160 | 750 |
| 320 | 320 | 979 |
| 640 | 640 | 3870 |
| 1280 | 1280 | 31685 |

With bfetch, where each bfetch request contains 10 search requests, so the total number of searches is the same:

| requests per minute | searches per minute | avg. response time (ms) |
| --- | --- | --- |
| 1 | 10 | 872 |
| 2 | 20 | 912 |
| 4 | 40 | 792 |
| 8 | 80 | 957 |
| 16 | 160 | 890 |
| 32 | 320 | 790 |
| 64 | 640 | 990 |
| 128 | 1280 | 968 |
| 256 | 2560 | 3292 |
| 512 | 5120 | 13096 |
| 1024 | 10240 | 32901 |

So it seems that, because of the impact a large number of separate network requests has on the Kibana server, we shouldn't just ditch bfetch (removing it would be mostly for developer-experience reasons) until we figure out how to mitigate the performance impact of separate network requests.
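One way to read the two tables above is to ask, for each mode, what the highest searches-per-minute rate is before average response time blows past some budget (1000 ms here, an arbitrary threshold for illustration):

```typescript
// [searches per minute, avg response time in ms], taken from the tables above.
type Row = [searchesPerMinute: number, avgResponseMs: number];

const noBfetch: Row[] = [
  [10, 713], [20, 735], [40, 751], [80, 680],
  [160, 750], [320, 979], [640, 3870], [1280, 31685],
];

const withBfetch: Row[] = [
  [10, 872], [20, 912], [40, 792], [80, 957], [160, 890], [320, 790],
  [640, 990], [1280, 968], [2560, 3292], [5120, 13096], [10240, 32901],
];

// Highest measured search rate whose avg response time stays under the budget.
function maxRateUnder(rows: Row[], budgetMs: number): number {
  return rows
    .filter(([, ms]) => ms < budgetMs)
    .reduce((max, [rate]) => Math.max(max, rate), 0);
}

console.log(maxRateUnder(noBfetch, 1000));   // 320 searches/min without bfetch
console.log(maxRateUnder(withBfetch, 1000)); // 1280 searches/min with bfetch
```

Under this (coarse) reading, batching sustains roughly 4x the search throughput before latency degrades, which supports keeping bfetch until the per-request server overhead is addressed.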

@Dosant
Contributor Author

Dosant commented Mar 30, 2022

Pausing for now. An internal doc with a summary has been written.
