[BUG] InitialCodeCacheSize=4096 causing ARM docker image to crashloop #255

Open
ggee opened this issue Dec 8, 2022 · 5 comments

@ggee

ggee commented Dec 8, 2022

For the opensearch container image running on ARM, the container gets into a crash loop in Kubernetes and also fails when run locally:

$ kubectl logs -f opensearch-cluster-master-0
Disabling execution of install_demo_configuration.sh for OpenSearch Security Plugin
Enabling OpenSearch Security Plugin
Killing opensearch process 10
OpenSearch exited with code 143
Performance analyzer exited with code 1

Further investigation found this in performance-analyzer.log:

uintx InitialCodeCacheSize=4096 is outside the allowed range [ 65536 ... 18446744073709551615 ]
Improperly specified VM option 'InitialCodeCacheSize=4096'
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.

After I hacked the container and changed the value in opensearch-performance-analyzer/performance-analyzer-agent-cli to the minimum, 65536, the opensearch container was able to start. When using Helm, I tested with:

config:
  opensearch.yml: |
    cluster.name: opensearch-cluster
    network.host: 0.0.0.0
    plugins.security.disabled: true

I also tested with a basic docker-compose setup to run locally and saw the same crash.

I could not find a way to override this using docker env settings or helm charts. PA_AGENT_JAVA_OPTS is not exposed and cannot be overridden.

Both a proper fix and a way to override would be useful.
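
A minimal sketch of the workaround described above, assuming the launcher script passes the flag as `-XX:InitialCodeCacheSize=<n>`. The sample line and the function name `raise_initial_code_cache` are hypothetical and are not taken from the actual performance-analyzer-agent-cli script; only the 65536 lower bound comes from the JVM error quoted in this issue.

```python
import re

# Lower bound reported by the JVM in the error message quoted in this issue.
MIN_INITIAL_CODE_CACHE = 65536

# The JVM accepts optional k/m/g size suffixes on this kind of flag.
_UNITS = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3}
_FLAG = re.compile(r"-XX:InitialCodeCacheSize=(\d+)([kKmMgG]?)")


def raise_initial_code_cache(text, minimum=MIN_INITIAL_CODE_CACHE):
    """Rewrite any InitialCodeCacheSize flag below `minimum` to `minimum`."""

    def fix(match):
        size = int(match.group(1)) * _UNITS[match.group(2).lower()]
        if size >= minimum:
            return match.group(0)  # already within range, leave untouched
        return f"-XX:InitialCodeCacheSize={minimum}"

    return _FLAG.sub(fix, text)


# Hypothetical launcher line; the real script contents may differ.
sample = 'JAVA_OPTS="-Xms64M -XX:InitialCodeCacheSize=4096 -XX:+UseSerialGC"'
print(raise_initial_code_cache(sample))
```

Applied to the text of the launcher script inside a derived image, this would only touch values below the bound, so an image whose JVM already accepts its setting is left unchanged.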

@ggee ggee changed the title InitialCodeCacheSize=4096 causing ARM docker image to crashloop [BUG] InitialCodeCacheSize=4096 causing ARM docker image to crashloop Dec 8, 2022
@kkhatua
Member

kkhatua commented Dec 20, 2022

@ggee why is this a bug? You're setting the cache to just 4KB, while the minimum is 64KB, which is what the error message says. Are you saying the current default setting in the source code itself is 4KB?
I see this line:
https://github.com/opensearch-project/performance-analyzer-rca/blame/a4e7e9a145b7ac7c8910320586837c29e23c1931/build.gradle#L34

This might be a JDK-specific limit. Can you provide the JDK details?

@ggee
Author

ggee commented Jan 14, 2023

@kkhatua I am not compiling or running locally. I am just pulling the container image opensearchproject/opensearch:2.3.0 and running it. The only JDK detail I can give is that it is ARM.

@ggee
Author

ggee commented Jan 30, 2023

Are more details required? I can try to dissect the container image that was pushed to Docker Hub.

@kkhatua
Member

kkhatua commented Feb 6, 2023

@ggee

Sorry for the delay. Since this is an issue you are facing repeatedly, I'd recommend you check whether bumping the value to the 64KB minimum is the only way around it. A jump from 4KB to 64KB is a lot in relative terms, although 64KB by itself is not much. I don't expect increasing this to break anything, but there might be a perf impact. Hence, I'd recommend increasing it to a level that just fixes the issue and creating a PR to merge that as the fix.

Will mark you as the assignee for now. Let us know if you cannot, and we'll make the change once you confirm the minimum code cache size that resolves this.

@kkhatua kkhatua added the good first issue Good for newcomers label Feb 6, 2023
@ggee
Author

ggee commented Mar 1, 2023

I cannot work on this at the moment. As for the minimum code cache, the original error message printed by opensearch gave the allowed values.

uintx InitialCodeCacheSize=4096 is outside the allowed range [ 65536 ... 18446744073709551615 ]

As mentioned, this is from using the opensearch container image that is downloaded from Docker Hub. I did not build or package it myself. I do not even know what JVM was installed. This seems specific to the ARM JVM that your team chose.
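
To make the point about the error message concrete, here is a rough sketch that pulls the allowed range out of the line quoted above. The function name `parse_range_error` is hypothetical, and the regular expression is only matched against the exact wording seen in this issue; other JVM builds may format the message differently.

```python
import re

# Pattern modelled on the single error line quoted in this issue.
_RANGE = re.compile(
    r"(?P<type>\w+)\s+(?P<name>\w+)=(?P<value>\d+) is outside the allowed range "
    r"\[\s*(?P<low>\d+)\s*\.\.\.\s*(?P<high>\d+)\s*\]"
)


def parse_range_error(line):
    """Return the option name, rejected value, and allowed bounds, or None."""
    m = _RANGE.search(line)
    if m is None:
        return None
    return {
        "name": m["name"],
        "value": int(m["value"]),
        "low": int(m["low"]),
        "high": int(m["high"]),
    }


msg = (
    "uintx InitialCodeCacheSize=4096 is outside the allowed range "
    "[ 65536 ... 18446744073709551615 ]"
)
info = parse_range_error(msg)
print(info["name"], "must be at least", info["low"])
```

The lower bound parsed here, 65536, is the smallest value this particular JVM will accept, so there is no intermediate setting between 4096 and 65536 to test.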
