
[Search] Benchmark bfetch impact in cloud with http2 #124538

Closed
Dosant opened this issue Feb 3, 2022 · 6 comments
Labels
Feature:Batching/Streaming, Feature:Search, impact:medium, loe:medium, performance

Comments

@Dosant
Contributor

Dosant commented Feb 3, 2022

  • Kibana in cloud uses http2 between the client and the cloud proxy by default. Because of connection reuse, bfetch might be redundant in such a setup.
  • In [Search] Add configuration to not use /bsearch #122244 we added a configuration option to run searches without bfetch.
  • Now we would like to compare Kibana performance with and without bfetch in cloud with http2.
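To make the trade-off concrete, here is a minimal sketch (illustrative only, not the actual Kibana bfetch API) of what the batching layer conceptually does on the wire: with HTTP/1.1 the browser caps concurrent connections per origin, so folding many logical searches into fewer network requests saved round trips; with http2 multiplexing, each search can ride its own stream on one connection, which is why the batch layer may be redundant.

```typescript
interface SearchRequest {
  id: number;
  body: object;
}

// Fold many logical searches into network requests of at most `batchSize` each,
// which is the core idea behind bfetch's /bsearch batching.
function batch(requests: SearchRequest[], batchSize: number): SearchRequest[][] {
  const out: SearchRequest[][] = [];
  for (let i = 0; i < requests.length; i += batchSize) {
    out.push(requests.slice(i, i + batchSize));
  }
  return out;
}

// e.g. 14 logical searches batched by 10 -> 2 network requests instead of 14
const searches: SearchRequest[] = Array.from({ length: 14 }, (_, id) => ({ id, body: {} }));
console.log(batch(searches, 10).length); // 2
```

Over http2, the 14 unbatched requests would multiplex over a single connection anyway, which is the hypothesis this benchmark tests.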

if you're interested in support of http2 in Kibana: #123748

@elasticmachine
Contributor

Pinging @elastic/kibana-app-services (Team:AppServicesSv)

@exalate-issue-sync exalate-issue-sync bot added impact:needs-assessment Product and/or Engineering needs to evaluate the impact of the change. loe:medium Medium Level of Effort impact:medium Addressing this issue will have a medium level of impact on the quality/strength of our product. and removed impact:needs-assessment Product and/or Engineering needs to evaluate the impact of the change. labels Feb 3, 2022
@Dosant
Contributor Author

Dosant commented Feb 9, 2022

Conditions

These are simple measurements I took from my machine:

  • A dashboard with 13 Lens panels and a map, over 1 million records of data. Some Lens panels make a follow-up request (the "other" bucket).
  • Opened two browser windows with a cloud instance in us-west2 side by side: one with bfetch ON, the other OFF. There is an http2 connection to the cloud proxy, so the browser makes all the search requests in parallel.
  • I am in Europe, so there is quite a bit of latency to us-west2.
  • Set a 10-second auto-refresh interval and let both run for 5 minutes.
  • Only measured client-side time for all the searches on the dashboard to complete, from when the first search started until the last one finished. I didn't look at anything server-side.
  • I logged the dashboard's search-session completion time in the event log (search session benchmark on 8.1, #124699) and used the logged events to get the results.
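The logged events reduce to the Avg/Med numbers below roughly as follows (a sketch; the event shape here is hypothetical, standing in for the search-session completion events in the event log):

```typescript
interface SessionEvent {
  startedAt: number;   // ms timestamp of the first search starting
  completedAt: number; // ms timestamp of the last search finishing
}

// Compute average and median session duration from logged events.
function stats(events: SessionEvent[]): { avg: number; med: number } {
  const durations = events
    .map((e) => e.completedAt - e.startedAt)
    .sort((a, b) => a - b);
  const avg = durations.reduce((sum, d) => sum + d, 0) / durations.length;
  const mid = Math.floor(durations.length / 2);
  const med =
    durations.length % 2 === 1
      ? durations[mid]
      : (durations[mid - 1] + durations[mid]) / 2;
  return { avg, med };
}
```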

Results

Across different measurements, bfetch:on is on average slightly faster for my example.
The tiny difference in favor of bfetch:on is a consistent result.

No network and cpu throttle

  • bfetch:on: Avg 2.323s, Med 2.259s
  • bfetch:off: Avg 2.348s, Med 2.278s

Fast 3G network and 4x cpu throttle

  • bfetch:on: Avg 5.474s, Med 5.252s
  • bfetch:off: Avg 5.517s, Med 5.273s

In this example, with an http2 connection to the cloud proxy, my dashboard is 1-2% faster with bfetch than without it.


Worth noting that internally the bsearch route has an optimization: all searches in a batch share the search service instance.

const search = getScoped(request);

This, for example, reduces uiSettings access for each call (#108062), and maybe something else. Could it be interesting to measure without the optimization?
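A minimal sketch of this optimization (names are illustrative, not Kibana's actual internals): the bsearch route creates one scoped search service per incoming request and reuses it for every search in the batch, so per-scope setup such as uiSettings reads happens once instead of N times.

```typescript
// Counter standing in for expensive per-scope setup (e.g. uiSettings access).
let uiSettingsReads = 0;

interface ScopedSearch {
  search(query: string): string;
}

// Hypothetical stand-in for Kibana's getScoped(request): performs the
// per-scope setup once, then returns a service bound to the request.
function getScoped(request: { user: string }): ScopedSearch {
  uiSettingsReads++;
  return { search: (query) => `${request.user}:${query}` };
}

// bsearch-style handler: one scoped service shared by all searches in the batch.
function runBatch(request: { user: string }, queries: string[]): string[] {
  const search = getScoped(request);
  return queries.map((q) => search.search(q));
}
```

With this shape, a batch of 10 queries triggers one uiSettings read, whereas 10 independent search requests would each pay that setup cost.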


Reminder, downsides of bfetch:

  • Custom gzip logic, theoretically less efficient than the built-in one
  • Harder to debug because of the custom gzip logic; harder for devs and support to work with HARs
  • An additional layer that adds complexity and more surface for bugs
  • No cancelation of individual requests

If my preliminary measurements are correct, it doesn't look like bfetch is worth its cost in an http2 environment.
But I think we'd need more testing scenarios to be sure.

@mshustov, @ppisljar, @streamich, @lukasolson do you have any thoughts about these results?
Maybe you'd have an idea what else I should measure or improve in this testing?

@lukasolson
Member

Awesome! Thanks for doing these benchmarks. Since we are moving "cloud first" and the gains with bfetch are not significant, I tend to lean towards removing bfetch for the reasons you listed.

@mshustov
Contributor

Since we are moving "cloud first" and the gains with bfetch are not significant, I tend to lean towards removing bfetch for the reasons you listed.

We don't have to delete it. We can programmatically disable bfetch on Cloud; @Dosant already added an appropriate setting (#123942). That said, I think the setting should be deployment-based and configured by admins, instead of being user-based as implemented.

In this example, with an http2 connection to the cloud proxy, my dashboard is 1-2% faster with bfetch than without it.

It can be interesting to compare server-side metrics as well. Especially CPU load and event-loop delay. I have no idea what overhead bfetch adds, so it can be useful for understanding the whole picture.

@Dosant
Contributor Author

Dosant commented Mar 28, 2022

It seems like Kibana currently has a severe bottleneck with simultaneous network requests (according to @lizozom's exploration).

Using @lizozom's testing script (https://github.com/elastic/kibana-capacity-test) against the same Kibana instance:

No bfetch, each search request is a separate network request:

| requests per minute | searches per minute | avg. response time (ms) |
| --- | --- | --- |
| 10 | 10 | 713 |
| 20 | 20 | 735 |
| 40 | 40 | 751 |
| 80 | 80 | 680 |
| 160 | 160 | 750 |
| 320 | 320 | 979 |
| 640 | 640 | 3870 |
| 1280 | 1280 | 31685 |

With bfetch, where each bfetch request contains 10 search requests, so the total number of searches is the same:

| requests per minute | searches per minute | avg. response time (ms) |
| --- | --- | --- |
| 1 | 10 | 872 |
| 2 | 20 | 912 |
| 4 | 40 | 792 |
| 8 | 80 | 957 |
| 16 | 160 | 890 |
| 32 | 320 | 790 |
| 64 | 640 | 990 |
| 128 | 1280 | 968 |
| 256 | 2560 | 3292 |
| 512 | 5120 | 13096 |
| 1024 | 10240 | 32901 |

So it seems that, because of the impact a large number of separate network requests has on the Kibana server, we shouldn't just ditch bfetch (removing it would be mostly for developer-experience reasons) until we figure out how to mitigate the performance impact of separate network requests.
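One way to read the two tables above is to ask, for each mode, what the highest searches-per-minute rate is before average response time blows past some budget (1000 ms here, an arbitrary threshold for illustration):

```typescript
// [searches per minute, avg response time in ms], taken from the tables above.
type Row = [searchesPerMinute: number, avgResponseMs: number];

const noBfetch: Row[] = [
  [10, 713], [20, 735], [40, 751], [80, 680],
  [160, 750], [320, 979], [640, 3870], [1280, 31685],
];

const withBfetch: Row[] = [
  [10, 872], [20, 912], [40, 792], [80, 957], [160, 890], [320, 790],
  [640, 990], [1280, 968], [2560, 3292], [5120, 13096], [10240, 32901],
];

// Highest measured search rate whose avg response time stays under the budget.
function maxRateUnder(rows: Row[], budgetMs: number): number {
  return rows
    .filter(([, ms]) => ms < budgetMs)
    .reduce((max, [rate]) => Math.max(max, rate), 0);
}

console.log(maxRateUnder(noBfetch, 1000));   // 320 searches/min without bfetch
console.log(maxRateUnder(withBfetch, 1000)); // 1280 searches/min with bfetch
```

Under this (coarse) reading, batching sustains roughly 4x the search throughput before latency degrades, which supports keeping bfetch until the per-request server overhead is addressed.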

@Dosant
Contributor Author

Dosant commented Mar 30, 2022

Pausing for now. An internal doc with a summary has been written.
