
Benchmark JVMs for OpenSearch #1276

Closed
CEHENKLE opened this issue Sep 16, 2021 · 30 comments

@CEHENKLE
Member

The JVM we're using is the one that we inherited, but we haven't validated that it's actually the best one to use.

The request is to benchmark different JVM versions with OpenSearch & make a plan to move to the highest-performing one. Please also consider support & terms when making a recommendation. For example, Oracle JDK has short LTS support for free users and longer LTS support for paid users, Adopt has very short support, AWS Corretto offers longer support for select versions, etc.

Expected output: Benchmark information for OpenSearch on different available JVMs (similar to: https://www.optaplanner.org/blog/2021/09/15/HowMuchFasterIsJava17.html)

Once we've done our homework, we can move the project to the best JVM (if necessary).

@jcgraybill

There's also an issue requesting distributions of OpenSearch that don't include a JDK (opensearch-project/opensearch-build#99). So this benchmarking should be done in a way that's reproducible & automated, so that anybody who wants to use a different JDK can understand objective performance gains and tradeoffs for it.

@minalsha
Contributor

Pre-requisites for this issue:

  1. Running Performance E2E tests with the ability to compare test results across different versions of OpenSearch.
  2. Configuring the JVM version for builds and performance tests, and resolving the issue "Lower CI Java Version to Java 11" (opensearch-build#74).

@saratvemulapalli saratvemulapalli transferred this issue from opensearch-project/OpenSearch Sep 20, 2021
@CEHENKLE
Member Author

Moving to Infra for prereqs.

@CEHENKLE
Member Author

@minalsha @saratvemulapalli Hey -- pondering on this. If we're thinking of this as a one time (or infrequent) activity, do we need to wait for automation? Isn't this something we could do just using rally and different configs?

Thanks!
/C

@jcgraybill

jcgraybill commented Sep 21, 2021

Let's start with an exploration using existing tools, and tie this into the build infrastructure at the end of the project. As long as the benchmarking uses Rally, and everything is scripted, it'll be straightforward enough to automate this when the time comes.

Let's also remove the dependency on opensearch-project/opensearch-build#74 by focusing on the runtime JVM, not the build JVM.

@minalsha
Contributor

minalsha commented Sep 22, 2021

We should start by benchmarking min/core with the Perf test suite and then pick the right JVM version.

@minalsha minalsha transferred this issue from opensearch-project/opensearch-build Sep 22, 2021
@minalsha
Contributor

minalsha commented Oct 1, 2021

Benchmarking needs to be done against jdk8, jdk11, jdk14, jdk16 and jdk17
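Since the comparison is about the runtime JVM rather than the build JVM, one way to keep the runs scripted and reproducible is to enumerate the JDK × architecture matrix and start each OpenSearch node with a different JDK. This is a minimal sketch, assuming the JDKs are unpacked under a hypothetical `/opt/jdks/jdk<version>-<arch>` layout; `build_run_matrix` and `runtime_env` are illustrative helpers, not part of OpenSearch or Rally. It sets `JAVA_HOME`, which the OpenSearch 1.x startup scripts (inherited from Elasticsearch 7.10) read before falling back to the bundled JDK, and also `OPENSEARCH_JAVA_HOME`, which later releases read. Both are assumptions worth confirming against `bin/opensearch-env` for the version under test.

```python
import itertools
import os

# Versions listed above; architectures match the x64/arm64 test hosts.
JDK_VERSIONS = ["8", "11", "14", "16", "17"]
ARCHITECTURES = ["x64", "arm64"]

# Hypothetical install layout: /opt/jdks/jdk<version>-<arch>
JDK_ROOT = "/opt/jdks"


def runtime_env(jdk_home, base=None):
    """Return a copy of the environment that points OpenSearch at jdk_home.

    Only the runtime JVM changes; the OpenSearch build itself is untouched.
    """
    env = dict(os.environ if base is None else base)
    env["JAVA_HOME"] = jdk_home
    # Assumption: newer OpenSearch releases prefer this over JAVA_HOME.
    env["OPENSEARCH_JAVA_HOME"] = jdk_home
    return env


def build_run_matrix(versions=JDK_VERSIONS, archs=ARCHITECTURES):
    """One entry per (JDK version, architecture) combination."""
    runs = []
    for version, arch in itertools.product(versions, archs):
        jdk_home = f"{JDK_ROOT}/jdk{version}-{arch}"
        runs.append({"jdk": version, "arch": arch, "jdk_home": jdk_home})
    return runs


if __name__ == "__main__":
    for run in build_run_matrix():
        env = runtime_env(run["jdk_home"])
        # A real script would start OpenSearch with `env` on a host of the
        # matching architecture, then run the same Rally track against it.
        print(run["jdk"], run["arch"], env["JAVA_HOME"])
```

Keeping the matrix in one place means another distribution can be added as a new entry rather than a new script.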

@ryanbogan
Member

ryanbogan commented Oct 28, 2021

A team account has been created for testing

@ryanbogan
Member

ryanbogan commented Nov 17, 2021

OpenSearch cluster instances have been launched, but there is still an error when running the tests.

@ryanbogan
Member

Tests have begun and the results are being compiled. Metrics being recorded are Latency (ms), Throughput (req/s), and Operation Counts.
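To illustrate what the P50/P90/P99/P100 latency columns mean for a set of raw per-request latencies, here is a minimal sketch using the nearest-rank percentile definition. This is an assumption for illustration: Rally's own percentile computation may interpolate and give slightly different values, especially with few samples. P100 is simply the slowest request, so a single outlier sets it, which may be why some P100 values in the tables (e.g. 31.113 ms for distance_amount_agg on Java 8 x64) sit far above the corresponding P99.

```python
import math


def nearest_rank_percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    ordered = sorted(samples)
    if p == 0:
        return ordered[0]  # P0 is the minimum
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def latency_summary(samples_ms):
    """Latency percentiles in the same columns as the tables in this thread."""
    return {f"P{p}": nearest_rank_percentile(samples_ms, p) for p in (50, 90, 99, 100)}


if __name__ == "__main__":
    # Made-up per-request latencies (ms) with one slow outlier.
    samples = [3.1, 2.9, 3.0, 3.3, 9.7, 3.2, 2.8, 3.0, 3.1, 3.4]
    print(latency_summary(samples))
    # {'P50': 3.1, 'P90': 3.4, 'P99': 9.7, 'P100': 9.7}
```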

@reta
Collaborator

reta commented Dec 6, 2021

Interesting findings [1] for Apache Solr deployments (which could be quite relevant to OpenSearch as well), quoting for completeness:

Just switched to @graalvm in our @ApacheSolr deployment (Just the JIT - so drop in JDK replacement) and observed a 15%ish drop in response times. For basically no effort.

I think benchmarking against GraalVM distributions could be an interesting experiment.

[1] https://twitter.com/karlstoney/status/1456616777325158405

@ryanbogan
Member

ryanbogan commented Dec 10, 2021

Results for the min distribution of OpenSearch 1.2:
Note: arm64 builds of JDK 14 and 15 are no longer available, so I was only able to run the tests on the x64 versions.

@ryanbogan
Member

Java 8: x64

Latency (ms)

| Operation Type | P50 | P90 | P99 | P100 |
|---|---|---|---|---|
| default | 3.303 | 3.525 | 8.175 | 9.658 |
| distance_amount_agg | 2.102 | 2.344 | 2.673 | 31.113 |
| index | 1,764.9 | 2,407 | 3,243.2 | 5,989 |
| range | 253.2 | 262.5 | 272 | 286.7 |
| autohisto_agg | 266.1 | 275.3 | 295.4 | 303.9 |
| date_histogram_agg | 245.7 | 254.7 | 260.4 | 265.2 |

Throughput (req/s)

| Operation Type | P0 | P50 | P100 |
|---|---|---|---|
| default | 3.02 | 3.03 | 3.058 |
| distance_amount_agg | 2.012 | 2.02 | 2.04 |
| index | 42,819.6 | 44,560.7 | 51,879.1 |
| range | 0.704 | 0.706 | 0.711 |
| autohisto_agg | 1.506 | 1.509 | 1.518 |
| date_histogram_agg | 1.506 | 1.509 | 1.519 |

@ryanbogan
Member

Java 8: arm64

Latency (ms)

| Operation Type | P50 | P90 | P99 | P100 |
|---|---|---|---|---|
| default | 2.867 | 3.06 | 7.386 | 8.075 |
| distance_amount_agg | 2.13 | 2.3 | 3.188 | 10.845 |
| index | 1,554.5 | 2,072.3 | 2,670.5 | 4,602.2 |
| range | 218 | 224.2 | 230.3 | 235.5 |
| autohisto_agg | 177.8 | 182.5 | 187.5 | 202.4 |
| date_histogram_agg | 172.9 | 177.3 | 182 | 184.3 |

Throughput (req/s)

| Operation Type | P0 | P50 | P100 |
|---|---|---|---|
| default | 3.02 | 3.03 | 3.058 |
| distance_amount_agg | 2.013 | 2.02 | 2.04 |
| index | 49,191.9 | 50,967.1 | 57,144.2 |
| range | 0.704 | 0.706 | 0.712 |
| autohisto_agg | 1.507 | 1.511 | 1.522 |
| date_histogram_agg | 1.507 | 1.511 | 1.522 |

@ryanbogan
Member

Java 14: x64

Latency (ms)

| Operation Type | P50 | P90 | P99 | P100 |
|---|---|---|---|---|
| default | 3.629 | 6.877 | 7.668 | 36.033 |
| distance_amount_agg | 2.173 | 2.393 | 2.705 | 2.953 |
| index | 2,182.9 | 2,844.9 | 4,093.7 | 9,802.5 |
| range | 293.8 | 310.5 | 328.1 | 356.7 |
| autohisto_agg | 317 | 331.5 | 343.9 | 361.3 |
| date_histogram_agg | 300.9 | 317.1 | 326.1 | 330.9 |

Throughput (req/s)

| Operation Type | P0 | P50 | P100 |
|---|---|---|---|
| default | 3.02 | 3.03 | 3.058 |
| distance_amount_agg | 2.013 | 2.02 | 2.04 |
| index | 35,204.8 | 36,919.6 | 43,315.3 |
| range | 0.704 | 0.706 | 0.711 |
| autohisto_agg | 1.505 | 1.508 | 1.516 |
| date_histogram_agg | 1.505 | 1.508 | 1.517 |

@ryanbogan
Member

Java 15: x64

Latency (ms)

| Operation Type | P50 | P90 | P99 | P100 |
|---|---|---|---|---|
| default | 5.678 | 5.97 | 6.196 | 6.517 |
| distance_amount_agg | 2.084 | 2.28 | 2.58 | 2.803 |
| index | 2,094.3 | 2,795.4 | 3,908.7 | 6,981 |
| range | 292.9 | 310.2 | 328.6 | 341.5 |
| autohisto_agg | 315.7 | 329.6 | 345.7 | 362.6 |
| date_histogram_agg | 289.5 | 308.3 | 314.8 | 316.7 |

Throughput (req/s)

| Operation Type | P0 | P50 | P100 |
|---|---|---|---|
| default | 3.02 | 3.03 | 3.058 |
| distance_amount_agg | 2.013 | 2.02 | 2.04 |
| index | 36,476.6 | 38,123.5 | 45,032.4 |
| range | 0.704 | 0.706 | 0.711 |
| autohisto_agg | 1.505 | 1.508 | 1.518 |
| date_histogram_agg | 1.505 | 1.508 | 1.517 |

@ryanbogan
Member

Java 17: x64

Latency (ms)

| Operation Type | P50 | P90 | P99 | P100 |
|---|---|---|---|---|
| default | 3.235 | 3.422 | 3.915 | 4.53 |
| distance_amount_agg | 2.141 | 2.284 | 2.52 | 2.834 |
| index | 1,968.5 | 2,564.2 | 3,554.2 | 6,373.4 |
| range | 246.5 | 257.1 | 294.4 | 316 |
| autohisto_agg | 248.5 | 257.3 | 264.3 | 314.7 |
| date_histogram_agg | 240 | 249.8 | 259.3 | 299.5 |

Throughput (req/s)

| Operation Type | P0 | P50 | P100 |
|---|---|---|---|
| default | 3.02 | 3.03 | 3.058 |
| distance_amount_agg | 2.013 | 2.02 | 2.04 |
| index | 38,915.1 | 40,663.8 | 46,821.8 |
| range | 0.704 | 0.706 | 0.712 |
| autohisto_agg | 1.506 | 1.509 | 1.519 |
| date_histogram_agg | 1.506 | 1.51 | 1.519 |

@ryanbogan
Member

Java 17: arm64

Latency (ms)

| Operation Type | P50 | P90 | P99 | P100 |
|---|---|---|---|---|
| default | 2.865 | 3.07 | 7.315 | 9.295 |
| distance_amount_agg | 2.012 | 2.165 | 2.498 | 2.92 |
| index | 1,526.8 | 2,067 | 2,741.9 | 6,001.7 |
| range | 354.6 | 366.6 | 380.7 | 393.8 |
| autohisto_agg | 208.1 | 217.4 | 221.4 | 247.4 |
| date_histogram_agg | 204.7 | 212.9 | 224.5 | 246.4 |

Throughput (req/s)

| Operation Type | P0 | P50 | P100 |
|---|---|---|---|
| default | 3.02 | 3.03 | 3.058 |
| distance_amount_agg | 2.013 | 2.02 | 2.04 |
| index | 49,203.2 | 51,354.2 | 58,656.7 |
| range | 0.703 | 0.705 | 0.711 |
| autohisto_agg | 1.507 | 1.51 | 1.521 |
| date_histogram_agg | 1.507 | 1.51 | 1.521 |

@ryanbogan
Member

Java 11: x64

Latency (ms)

| Operation Type | P50 | P90 | P99 | P100 |
|---|---|---|---|---|
| default | 5.3 | 5.709 | 6.087 | 12.386 |
| distance_amount_agg | 2.103 | 2.258 | 2.402 | 2.871 |
| index | 2,107.2 | 2,781.3 | 3,958.1 | 7,339 |
| range | 281.3 | 315.8 | 326.8 | 329.8 |
| date_histogram_agg | 289.7 | 324.2 | 335 | 369.4 |
| autohisto_agg | 334 | 356.3 | 368.4 | 372.9 |

Throughput (req/s)

| Operation Type | P0 | P50 | P100 |
|---|---|---|---|
| default | 3.02 | 3.03 | 3.058 |
| distance_amount_agg | 2.013 | 2.02 | 2.04 |
| index | 36,067.8 | 37,763.2 | 43,562 |
| range | 0.704 | 0.706 | 0.711 |
| date_histogram_agg | 1.505 | 1.508 | 1.517 |
| autohisto_agg | 1.505 | 1.507 | 1.515 |

@ryanbogan
Member

Java 11: arm64

Latency (ms)

| Operation Type | P50 | P90 | P99 | P100 |
|---|---|---|---|---|
| default | 2.934 | 3.111 | 5.687 | 9.47 |
| distance_amount_agg | 2.298 | 2.502 | 2.684 | 2.961 |
| index | 1,377.3 | 1,889 | 2,534.4 | 4,650.5 |
| range | 231.6 | 237.4 | 241.4 | 242.5 |
| autohisto_agg | 207.2 | 212.5 | 219.7 | 222.1 |
| date_histogram_agg | 205.7 | 211.5 | 219.8 | 224.4 |

Throughput (req/s)

| Operation Type | P0 | P50 | P100 |
|---|---|---|---|
| default | 3.02 | 3.03 | 3.058 |
| distance_amount_agg | 2.013 | 2.02 | 2.04 |
| index | 54,602 | 57,333.2 | 64,667.6 |
| range | 0.704 | 0.706 | 0.712 |
| autohisto_agg | 1.507 | 1.51 | 1.521 |
| date_histogram_agg | 1.507 | 1.51 | 1.52 |
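To compare JDKs on a single architecture, the x64 P50 latencies posted above can be collected in one place and expressed relative to a baseline. This is a minimal sketch using Java 11 (the LTS release before 17) as the baseline; the numbers are copied from the x64 tables in this thread, and `relative_change` and `fastest_jdk_per_operation` are hypothetical helpers.

```python
# P50 latency (ms) on x64, copied from the Java 8/11/14/15/17 x64 tables above.
P50_LATENCY_X64 = {
    "8":  {"default": 3.303, "range": 253.2, "autohisto_agg": 266.1, "date_histogram_agg": 245.7},
    "11": {"default": 5.3,   "range": 281.3, "autohisto_agg": 334.0, "date_histogram_agg": 289.7},
    "14": {"default": 3.629, "range": 293.8, "autohisto_agg": 317.0, "date_histogram_agg": 300.9},
    "15": {"default": 5.678, "range": 292.9, "autohisto_agg": 315.7, "date_histogram_agg": 289.5},
    "17": {"default": 3.235, "range": 246.5, "autohisto_agg": 248.5, "date_histogram_agg": 240.0},
}


def relative_change(results, baseline, candidate):
    """Percent change of candidate vs. baseline per operation (negative = lower latency)."""
    base, cand = results[baseline], results[candidate]
    return {op: 100.0 * (cand[op] - base[op]) / base[op] for op in base}


def fastest_jdk_per_operation(results):
    """JDK version with the lowest P50 latency for each operation."""
    ops = next(iter(results.values())).keys()
    return {op: min(results, key=lambda jdk: results[jdk][op]) for op in ops}


if __name__ == "__main__":
    for op, pct in relative_change(P50_LATENCY_X64, "11", "17").items():
        print(f"{op:20s} {pct:+6.1f}%")
    print(fastest_jdk_per_operation(P50_LATENCY_X64))
```

With these numbers, Java 17 has the lowest P50 latency of the listed JDKs for all four operations, roughly 12% to 39% below Java 11. The thread does not say how many runs each table represents, so the deltas are best treated as indicative.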

@nknize
Collaborator

nknize commented Dec 15, 2021

What's the hardware configuration for this test (OS, processor, RAM)? Can you run the benchmarks across different Operating Systems that may have variations in default glibc installations?

@andrross
Member

andrross commented Dec 17, 2021

Just want to clarify: by "JVM versions" in the task description, do you mean finding the best distribution (i.e. Oracle vs. Adopt vs. Corretto, etc.)? The test results posted above are for Java versions, and unless I missed it, I don't see the distribution specified. As for Java versions, I think the obsolete non-LTS versions (14, 15) would be non-starters for a runtime JVM due to the lack of security updates. The same reasoning might apply to Java 8 as well, since it will be approaching end-of-life relatively soon and there are now two more recent LTS versions.

As for the automated benchmarks, which I think are ultimately the goal: in #1647 we found a change in performance of the fetch phase that was not detected by the existing nyc_taxis benchmarks, so we should ensure that the workloads we develop here have some coverage of the fetch phase to close that gap.
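As a sketch of what fetch-phase coverage could look like, the snippet below builds a Rally-style `search` operation whose query returns a sizable page of hits, so each request pays for loading and returning documents rather than only matching them (a `size: 0` aggregation query skips fetching entirely). The operation name, index, field, and page size are hypothetical placeholders, and the exact track schema should be checked against the Rally track reference for the Rally version in use.

```python
import json

# Hypothetical index and field names; adjust to the workload under test.
INDEX = "nyc_taxis"
FETCH_SIZE = 500  # hits returned per request, so the fetch phase does real work


def fetch_heavy_operation(name="fetch_range_hits", size=FETCH_SIZE):
    """Return a Rally-style search operation that returns `size` documents."""
    if size <= 0:
        raise ValueError("size must be positive to exercise the fetch phase")
    return {
        "name": name,
        "operation-type": "search",
        "index": INDEX,
        "body": {
            "size": size,
            "query": {"range": {"total_amount": {"gte": 5, "lt": 15}}},
        },
    }


if __name__ == "__main__":
    # JSON fragment that could be placed in a track's "operations" list.
    print(json.dumps(fetch_heavy_operation(), indent=2))
```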

@ryanbogan
Member

> What's the hardware configuration for this test (OS, processor, RAM)? Can you run the benchmarks across different Operating Systems that may have variations in default glibc installations?

The OS is Linux. An m5.xlarge instance is used for the x64 versions and an m6g.xlarge for the arm64 versions.
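Because results may vary with the OS, CPU, RAM and default glibc, it may help to record the host environment next to every result set. This is a minimal sketch using Python's standard `platform` and `os` modules; `describe_host` and `total_memory_bytes` are hypothetical helpers, and the instance type and the JDK vendor/version string would still need to be recorded separately.

```python
import os
import platform


def total_memory_bytes():
    """Physical RAM in bytes via sysconf (POSIX only); None where unavailable."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        return None


def describe_host():
    """Collect host details (OS, CPU, RAM, libc) that can affect comparability."""
    libc_name, libc_version = platform.libc_ver()  # empty strings if undetectable
    return {
        "os": platform.system(),        # e.g. "Linux"
        "kernel": platform.release(),
        "arch": platform.machine(),     # e.g. "x86_64" on m5, "aarch64" on m6g
        "cpus": os.cpu_count(),
        "memory_bytes": total_memory_bytes(),
        "libc": f"{libc_name} {libc_version}".strip(),
    }


if __name__ == "__main__":
    for key, value in describe_host().items():
        print(f"{key}: {value}")
```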

@jcgraybill

> Just want to clarify, by "JVM versions" mentioned in the task description does it refer to finding the best distribution (i.e. Oracle vs Adopt vs Corretto, etc)? The test results posted above are for Java versions, and unless I missed, I don't see the distribution specified. As for Java versions, I think the obsolete non-LTS versions (14, 15) would be non-starters for a runtime JVM due to the lack of security updates. The same reasoning might apply for Java 8 as well since it will be approaching end-of-life relatively soon and there are two more recent LTS versions now.

Bingo. The output is to choose a JDK distribution based on available data. Performance is one of the data points, and the maintenance/patching policy is going to be another big one. I think @CEHENKLE is going to continue that conversation in a new issue in January.

@Cai-Chen

Cai-Chen commented Feb 21, 2022

> The OS is Linux. An m5.xlarge instance is used for x64 versions and m6g.xlarge is used for arm64 versions.

Hey @ryanbogan, your benchmark results indicate that arm64 performs better than x64. Do you use any specific JVM options for arm64? I am benchmarking OpenSearch 1.2.4 with Corretto JDK 17 on arm64 (r6g.2xlarge) vs. x64 (r5.2xlarge) using the pmc and http_logs tracks, but arm64 performed worse than x64 in my runs, so maybe I am missing some config for arm64.

@ryanbogan
Member

@Cai-Chen I just used the base version found on the Oracle website for each JVM. The x64 version performs better than arm64 with JDK 17 based on my testing (4.53 vs. 9.295 millisecond latency), so I do not believe that you are missing any special configuration options.
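The 4.53 vs. 9.295 ms figures are the P100 (maximum) latencies of the `default` operation in the Java 17 x64 and arm64 tables, i.e. the single slowest request of each run. A small sketch that lines up every percentile from those same two tables shows where each architecture is ahead; the numbers are copied from the tables above, and `ratio_by_percentile` is a hypothetical helper.

```python
# Java 17 latency (ms) for the `default` operation, copied from the tables above.
DEFAULT_LATENCY_JDK17 = {
    "x64":   {"P50": 3.235, "P90": 3.422, "P99": 3.915, "P100": 4.53},
    "arm64": {"P50": 2.865, "P90": 3.07,  "P99": 7.315, "P100": 9.295},
}


def ratio_by_percentile(results, numerator="arm64", denominator="x64"):
    """numerator / denominator latency per percentile (below 1.0: numerator was faster)."""
    num, den = results[numerator], results[denominator]
    return {p: num[p] / den[p] for p in den}


if __name__ == "__main__":
    for pct, ratio in ratio_by_percentile(DEFAULT_LATENCY_JDK17).items():
        lower = "arm64" if ratio < 1 else "x64"
        print(f"{pct:>4}: arm64/x64 = {ratio:.2f} ({lower} lower)")
```

With the posted Java 17 numbers, arm64 is lower at P50 and P90 for this operation while x64 is lower at P99 and P100, so which architecture comes out ahead depends on the percentile (and the operation) being compared.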

@Cai-Chen

Cai-Chen commented Mar 2, 2022

> @Cai-Chen I just used the base version found on the Oracle website for each JVM. The x64 version performs better than arm64 with JDK 17 based on my testing (4.53 vs. 9.295 millisecond latency), so I do not believe that you are missing any special configuration options.

Thanks @ryanbogan, that's bad news for me 😢
BTW, what instance size are you using?

@ryanbogan
Member

> Thanks @ryanbogan, that's bad news for me 😢 BTW, what instance size are you using?

I used m5.xlarge for x64 and m6g.xlarge for arm64 @Cai-Chen

@minalsha
Contributor

@ryanbogan what tasks are still pending before we can close this issue?

@ryanbogan
Member

@minalsha I don't think there are any tasks left for this issue
