[BUG] Indexing Performance Degraded in OpenSearch 1.3.+ #2916
Comments
ES 7.10.2 shipped with JDK 15, which would have been using G1GC by default. The switch in OpenSearch to JDK 11 changes that default garbage collector, which is a pretty significant change IMO. I am testing G1GC on JDK 11 now. |
cc @nknize |
@mattweber thanks a lot for digging in. I agree with you, it is definitely worth reevaluating the GC recommendations. We have a long-standing issue [1] to run the benchmarks on different JVMs, but it has not been finalized yet ([2] is also a very helpful source). AFAIK we ran into issues with the security plugin on JDK-17 (please see [3]), which is one of the reasons the previous LTS (JDK-11) was set as the bundled one for 1.3.x. [1] #1276 |
@nknize @dblock a bit of context on the issue, the 1.2.4 was shipped with JDK-15:
Whereas 1.3.0 uses LTS (JDK-11):
Using latest patched LTS JDK is surely a good idea, but our default configuration uses CMS up to JDK-13 and G1 after JDK-14.
We probably have 3 options (at least):
I think going with the 1st option would make sense, any opinions guys? [1] #1276 |
hrm.. this is quite the mess. 1.2.4 bundled jdk was inherited at JDK 15. It looks like 1.x was direct pushed a downgrade to JDK 11 when problems were discovered w/ the jdk 17 upgrade? We should've simply reverted back to 15 for the inherited bundled jdk. Now we have 1.3 on an older bundled jdk than 1.2? Probably not the best idea. What's the justification for going back to 11 as the bundled jdk when we started w/ 15? I think we should bundle 15 (especially given all the issues w/ different 11 versions) and can still limit min runtime as 11 (required by lucene 9). |
In core, there were no issues with JDK-17 (afaik)
Yes, but JDK-15 has not received any security patches for ages (technically, version is higher but not safer). It was supposed to be replaced by JDK-17 in first place.
The opensearch-project/security#1653 I think was a reason
That's an option for 1.x, for 2.x we are on JDK-17. |
I am personally running my clusters with option 1 as of right now but would be fine with running JDK15+ as well. I 100% think the default should be G1GC, the difference between using CMS and G1GC was massive. |
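For anyone who wants to confirm which collector their nodes actually ended up on, here is a minimal sketch. It assumes the response of `GET _nodes/jvm` exposes a `gc_collectors` list under each node's `jvm` section, and it maps the usual HotSpot collector bean names to a family. The function names and the sample response are hypothetical and only illustrate the shape; the sample is not output from a real cluster.

```python
# Sketch: classify the garbage collector reported per node.
# Assumption: `nodes_info` is the parsed JSON body of GET _nodes/jvm, shaped as
# {"nodes": {"<node-id>": {"name": ..., "jvm": {"gc_collectors": [...]}}}}.

def collector_family(gc_collectors):
    """Map HotSpot GC bean names to a collector family name."""
    names = set(gc_collectors)
    if any(name.startswith("G1") for name in names):
        return "G1GC"
    if "ConcurrentMarkSweep" in names:
        return "CMS"
    if "PS Scavenge" in names or "PS MarkSweep" in names:
        return "Parallel"
    if "Copy" in names or "MarkSweepCompact" in names:
        return "Serial"
    return "unknown"


def collectors_by_node(nodes_info):
    """Return {node name: collector family} for every node in the response."""
    result = {}
    for node_id, node in nodes_info.get("nodes", {}).items():
        collectors = node.get("jvm", {}).get("gc_collectors", [])
        result[node.get("name", node_id)] = collector_family(collectors)
    return result


# Hypothetical sample response, for illustration only.
sample_nodes_info = {
    "nodes": {
        "abc123": {
            "name": "node-1",
            "jvm": {"gc_collectors": ["ParNew", "ConcurrentMarkSweep"]},
        },
        "def456": {
            "name": "node-2",
            "jvm": {"gc_collectors": ["G1 Young Generation", "G1 Old Generation"]},
        },
    }
}

print(collectors_by_node(sample_nodes_info))
```

In a mixed or partially restarted cluster this makes it easy to spot nodes that are still running the old collector.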
Yes... I should've been clearer that I was talking about 1.3.x bug fix. jdk17 for 2.0+ is definitely where we want to stay.
+1 Let's do this as a PR on main and backport all the way to 1.3 for a 1.3 bugfix release? Separately I'm fine keeping 1.3 on 11 (especially since we're moving forward w/ 2.0+ on 17). 1.3 is set to 11.0.14.1+1 anyway which isn't suffering from the same OOM issues described in #2791. |
@anasalkouz @kotwanikunal @bbarani Ugh. I'd really like to understand how we missed this in performance testing, and how we can be confident we won't miss something like this in the future. |
Looks like all the backports succeeded and were merged. Thanks for your help @mattweber. Closing. |
@CEHENKLE I have opened an issue here - #2985 |
@reta I think we encounter the same issue with OpenSearch 1.3.0. |
@HugoKuo good question, Helm charts have [1] https://github.com/opensearch-project/helm-charts/blob/1495a00017ca54b43857173dd901553eba32f16f/charts/opensearch/README.md#configuration |
Updated the yml but failed with log output.
The change is the yaml
|
@HugoKuo probably the second option with overriding |
The change is made and I am seeing a huge difference.
OpenSearch 1.3 on k8s by Helm Chart
Applying the G1GC collector is a bit tricky with the Helm chart. Adding the config map may or may not work. We updated the Helm chart to provide the config/jvm.options file. I'm not a Java person, so it took me a while to apply the change. One key point is the number at the front of each line: it specifies which JDK versions the line applies to. By default, the G1GC lines start at JDK 14. OpenSearch 1.3 uses JDK 11, so you need to change the version range on the G1GC lines; otherwise it will fall back to SerialGC, which has very poor performance.
Updated the same on https://discuss.opendistrocommunity.dev/t/slow-indexing-performance/9366/6?u=hugok Hope this helps. Thanks @reta |
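To make the version-prefix rule in jvm.options concrete, here is a minimal sketch of how such prefixes can be interpreted. It is not the actual OpenSearch parser. It assumes three prefix forms (`N:` for exactly JDK N, `N-:` for JDK N and later, `N-M:` for JDK N through M) and treats unprefixed lines as applying to every JDK. The `DEFAULT_LIKE` and `PATCHED` snippets are trimmed, illustrative samples based on what the thread describes (CMS up to JDK 13, G1 from JDK 14), not verbatim copies of any shipped file.

```python
import re

# Optional JDK version prefix ("N:", "N-:", or "N-M:") followed by the JVM flag.
_LINE = re.compile(r"^(?:(\d+)(-)?(\d+)?:)?(-.*)$")


def flags_for_jdk(options_text, jdk_major):
    """Return the JVM flags from `options_text` that apply to `jdk_major`."""
    flags = []
    for raw in options_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = _LINE.match(line)
        if match is None:
            continue  # not a recognizable option line
        low, dash, high, flag = match.groups()
        if low is None:
            flags.append(flag)  # no prefix: applies to every JDK
            continue
        lower = int(low)
        if dash is None:
            upper = lower  # "N:" means exactly JDK N
        elif high is None:
            upper = None  # "N-:" means JDK N and later
        else:
            upper = int(high)  # "N-M:" means JDK N through M
        if jdk_major >= lower and (upper is None or jdk_major <= upper):
            flags.append(flag)
    return flags


# Illustrative sample: CMS up to JDK 13, G1 from JDK 14 (as described in the thread).
DEFAULT_LIKE = """\
-Xms1g
8-13:-XX:+UseConcMarkSweepGC
14-:-XX:+UseG1GC
"""

# Hypothetical adjusted ranges so that JDK 11 selects G1.
PATCHED = """\
-Xms1g
8-10:-XX:+UseConcMarkSweepGC
11-:-XX:+UseG1GC
"""

print(flags_for_jdk(DEFAULT_LIKE, 11))
print(flags_for_jdk(PATCHED, 11))
```

With the sample ranges, JDK 11 only matches the CMS line, so lowering the G1 range (and narrowing the CMS one so the two do not overlap) is what switches the collector.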
Describe the bug
I recently migrated a large project from Elasticsearch 7.16.2 to OpenSearch 1.3.0 and have noticed a large drop in indexing performance. Both clusters have the same size and specs, the same index mapping, and the same set of docs. Other than checking the Lucene versions (both are on 8.10.1), I have not dug in much yet due to other priorities related to the migration.
Other users are reporting similar performance issues on upgrade from OpenSearch 1.2.4 to 1.3.1 and have potentially narrowed it down to JDK 11.
I will attempt to debug this further, but I wanted to open this issue in case anyone else has any ideas as to what might be the issue.
To Reproduce
Compare the indexing performance of Elasticsearch 7.16.2 or OpenSearch 1.2.4 with that of OpenSearch 1.3.+.
Expected behavior
Indexing performance should be roughly the same.
Plugins
I do use a custom tokenizer which is not open source. The same tokenizer is used in Elasticsearch, so it is unlikely to be the cause.
Screenshots
n/a
Host/Environment (please complete the following information):
Running the multiarch Docker image on aarch64 (AWS r6gd.4xlarge, 30g heaps)
Additional context
Unfortunately this is part of a very large migration from Elasticsearch to OpenSearch, and there are a lot of variables to consider.