[BUG] LocalTestCluster fails during create and destroy #429

Closed
setiah opened this issue Sep 9, 2021 · 8 comments · Fixed by #431 or #516
Assignees
Labels
beta Issues specific to the OpenSearch Beta bug Something isn't working

Comments

@setiah
Contributor

setiah commented Sep 9, 2021

Describe the bug
LocalTestCluster does not work with recent OpenSearch tarballs: it fails to launch the OpenSearch process. There are three sub-issues:

  1. The OpenSearch process is not set up properly.
  2. The PID reported for the OpenSearch process appears to be incorrect. It is the PID of the shell subprocess that starts OpenSearch, not of the OpenSearch process itself.
  3. Logs needed to debug OpenSearch setup failures are missing. The output of the opensearch-tar-install.sh script is not written to stdout, so failures cannot be diagnosed.
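
To illustrate sub-issue 2, here is a minimal sketch of how a PID mismatch can arise when a command is started through a shell. It assumes a POSIX system; the `sleep` command is only a stand-in for the install script, and none of this is taken from the LocalTestCluster code.

```python
import os
import signal
import subprocess

# Hypothetical stand-in for the install script: the shell starts a long-running
# child in the background, prints the child's PID, and then waits for it.
command = "sleep 30 & echo $!; wait"

proc = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE, text=True)
shell_pid = proc.pid                      # PID of the /bin/sh wrapper
child_pid = int(proc.stdout.readline())   # PID the shell reported for `sleep`

print(f"Popen.pid (shell): {shell_pid}")
print(f"child process:     {child_pid}")

# Clean up both processes explicitly; signalling only the shell may leave the child running.
os.kill(child_pid, signal.SIGTERM)
proc.terminate()
proc.wait()
proc.stdout.close()
```

Whether the shell stays around as a separate parent depends on the shell and on the command it is given, which may be one reason the behavior differs between hosts.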

To Reproduce
Steps to reproduce the behavior:

  1. Stub the download() method and provide a pre-downloaded, uncompressed tarball. Then call the LocalTestCluster create() method to start the cluster. The problem occurs in create().
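
The stubbing step could look roughly like the sketch below. `FakeLocalCluster`, its method signatures, and the `install_dir` attribute are hypothetical stand-ins used only for illustration; the real LocalTestCluster may be shaped differently.

```python
import tempfile
from unittest import mock


class FakeLocalCluster:
    """Hypothetical stand-in for LocalTestCluster; not the real class."""

    def download(self) -> str:
        raise RuntimeError("network access is not expected in this test")

    def create(self) -> str:
        # The real create() would install and launch OpenSearch from this directory.
        self.install_dir = self.download()
        return self.install_dir


with tempfile.TemporaryDirectory() as extracted_dir:
    cluster = FakeLocalCluster()
    # Replace download() so that create() works on the pre-extracted tarball directory.
    with mock.patch.object(cluster, "download", return_value=extracted_dir) as stub:
        used_dir = cluster.create()
    stub.assert_called_once_with()
    print(f"create() used {used_dir}")
```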

Expected behavior
The OpenSearch cluster should be set up correctly and respond to queries on port 9200. The OpenSearch logs should be written to stdout for debugging.
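
One possible way to surface the script output is sketched below. The helper name `run_and_log` and the sample command are assumptions made for illustration, not part of the existing code. It merges stderr into stdout and forwards each line to the logger as it arrives.

```python
import logging
import subprocess
import sys
from typing import List, Tuple

logging.basicConfig(stream=sys.stdout, level=logging.INFO, format="%(asctime)s %(levelname)-8s %(message)s")


def run_and_log(command: str) -> Tuple[int, List[str]]:
    """Run a shell command and log every output line (stdout and stderr) as it is produced."""
    lines = []
    with subprocess.Popen(
        command, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    ) as proc:
        for line in proc.stdout:
            line = line.rstrip("\n")
            logging.info(line)
            lines.append(line)
    # Leaving the `with` block waits for the process, so returncode is set here.
    return proc.returncode, lines


# Hypothetical command standing in for a failing install script.
returncode, output = run_and_log("echo starting; echo failed >&2; exit 3")
```

For a long-running server process the read loop blocks until the process exits, so in practice it would likely need to run in a background thread.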

Plugins
Full bundle - ran test with security.

Screenshots
If applicable, add screenshots to help explain your problem.

Host/Environment (please complete the following information):

  • OS: Linux
  • Version = 1.1.0
@setiah setiah added bug Something isn't working untriaged Issues that have not yet been triaged beta Issues specific to the OpenSearch Beta labels Sep 9, 2021
@gaiksaya
Member

gaiksaya commented Sep 9, 2021

Regarding point 2 (the PID of the OpenSearch process seems to be reported incorrectly; it is the PID of the shell subprocess that starts the OpenSearch process):
This issue was resolved in #237.
If you kill the main process, the subprocess is killed automatically.

@setiah
Contributor Author

setiah commented Sep 9, 2021

@gaiksaya The issue is not about zombie processes but about the PID that is logged for the OpenSearch process. The reported PID belongs to the parent shell process, not to the OpenSearch process, and this needs to be fixed in the logs.

@setiah
Contributor Author

setiah commented Sep 14, 2021

For point 2: destroy() fails to clean up because the self.process PID belongs to the parent shell process, not to the OpenSearch process. This leads to improper cleanup and failures in subsequent tests. We need to report the correct OpenSearch PID.

@setiah setiah reopened this Sep 14, 2021
@gaiksaya
Member

Hi @setiah, I tested it locally and it works properly. Are you sure you are using the right bundle? It kills the parent process (PPID) and its child processes as well.

@setiah setiah changed the title [BUG] LocalTestCluster fails during create [BUG] LocalTestCluster fails during create and destroy Sep 14, 2021
@setiah
Contributor Author

setiah commented Sep 14, 2021

> Hi @setiah, I tested it locally and it works properly. Are you sure you are using the right bundle? It kills the parent process (PPID) and its child processes as well.

Yes. I am testing this on a Linux machine. By local, do you mean macOS? @gaiksaya

@gaiksaya
Member

Yes, on macOS.

@peternied peternied removed the untriaged Issues that have not yet been triaged label Sep 14, 2021
@gaiksaya
Member

gaiksaya commented Sep 15, 2021

The PID is captured correctly on Linux (tested on Amazon Linux 2):

2021-09-15 01:41:21 INFO     Started OpenSearch with parent PID 15841
2021-09-15 01:41:21 INFO     Waiting for service to become available
2021-09-15 01:41:21 INFO     Pinging https://localhost:9200/_cluster/health attempt 0
% ps -ef | grep opensearch
gaiksaya 15841     1 22 01:41 pts/1    00:04:16 /tmp/tmplb65ybog/local-test-cluster/opensearch-1.1.0/jdk/bin/java -Xshare:auto -Dopensearch.networkaddress.cache.ttl=60 -Dopensearch.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -XX:-OmitStackTraceInFastThrow -XX:+ShowCodeDetailsInExceptionMessages -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0 -Dio.netty.allocator.numDirectArenas=0 -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -Djava.locale.providers=SPI,COMPAT -Xms1g -Xmx1g -XX:+UseG1GC -XX:G1ReservePercent=25 -XX:InitiatingHeapOccupancyPercent=30 -Djava.io.tmpdir=/tmp/opensearch-2089336390765732826 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=data -XX:ErrorFile=logs/hs_err_pid%p.log -Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log:utctime,pid,tags:filecount=32,filesize=64m -Dclk.tck=100 -Djdk.attach.allowAttachSelf=true -Djava.security.policy=/tmp/tmplb65ybog/local-test-cluster/opensearch-1.1.0/plugins/opensearch-performance-analyzer/pa_config/opensearch_security.policy -XX:MaxDirectMemorySize=536870912 -Dopensearch.path.home=/tmp/tmplb65ybog/local-test-cluster/opensearch-1.1.0 -Dopensearch.path.conf=/tmp/tmplb65ybog/local-test-cluster/opensearch-1.1.0/config -Dopensearch.distribution.type=tar -Dopensearch.bundled_jdk=true -cp /tmp/tmplb65ybog/local-test-cluster/opensearch-1.1.0/lib/* org.opensearch.bootstrap.OpenSearch

Investigating more

@setiah
Contributor Author

setiah commented Sep 15, 2021

I can reproduce the issue on Ubuntu 20.04.

The run_integ_test.py logs report the PID of the parent shell, which differs from the PID of the child OpenSearch process. Terminating the parent PID does not terminate the child (OpenSearch) process.

2021-09-15 19:55:10 INFO     Started OpenSearch with parent PID 545914
2021-09-15 19:55:10 INFO     Waiting for service to become available
ubuntu    545425  539650  0 19:54 pts/1    00:00:00 /bin/sh -c ./opensearch-tar-install.sh
ubuntu    545426  545425 99 19:54 pts/1    00:00:28 /usr/lib/jvm/java-14-openjdk-amd64//bin/java -Xshare:auto -Dopensearch.networkaddress.cache.ttl=60 -Dopensearch.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -XX:-OmitStackTraceInFastThrow -XX:+ShowCodeDetailsInExceptionMessages -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0 -Dio.netty.allocator.numDirectArenas=0 -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -Djava.locale.providers=SPI,COMPAT -Xms1g -Xmx1g -XX:+UseG1GC -XX:G1ReservePercent=25 -XX:InitiatingHeapOccupancyPercent=30 -Djava.io.tmpdir=/tmp/opensearch-12322125043903400773 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=data -XX:ErrorFile=logs/hs_err_pid%p.log -Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log:utctime,pid,tags:filecount=32,filesize=64m -Dclk.tck=100 -Djdk.attach.allowAttachSelf=true -Djava.security.policy=/tmp/tmpqvl7ws9h/local-test-cluster/opensearch-1.1.0/plugins/opensearch-performance-analyzer/pa_config/opensearch_security.policy -XX:MaxDirectMemorySize=536870912 -Dopensearch.path.home=/tmp/tmpqvl7ws9h/local-test-cluster/opensearch-1.1.0 -Dopensearch.path.conf=/tmp/tmpqvl7ws9h/local-test-cluster/opensearch-1.1.0/config -Dopensearch.distribution.type=tar -Dopensearch.bundled_jdk=true -cp /tmp/tmpqvl7ws9h/local-test-cluster/opensearch-1.1.0/lib/* org.opensearch.bootstrap.OpenSearch
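
One way to make cleanup independent of which PID was recorded is to start the shell in its own session and signal the whole process group. The sketch below assumes a POSIX system; the helper names and the `sleep` stand-in are hypothetical, and this is not necessarily the approach taken in the linked pull requests.

```python
import os
import signal
import subprocess
import time


def start_in_new_session(command: str) -> subprocess.Popen:
    """Start a shell command as the leader of a new session and process group."""
    return subprocess.Popen(
        command, shell=True, stdout=subprocess.PIPE, text=True, start_new_session=True
    )


def terminate_group(proc: subprocess.Popen, timeout: float = 10.0) -> None:
    """Send SIGTERM to every process in the group led by proc, then reap the shell."""
    try:
        # With start_new_session=True the shell's PID is also its process group ID.
        os.killpg(proc.pid, signal.SIGTERM)
    except ProcessLookupError:
        pass  # the whole group is already gone
    proc.wait(timeout=timeout)


# Hypothetical stand-in for the install script that leaves a child running.
proc = start_in_new_session("sleep 30 & echo $!; wait")
child_pid = int(proc.stdout.readline())

started = time.monotonic()
terminate_group(proc)
proc.stdout.read()   # returns at EOF, once every process holding the pipe has exited
proc.stdout.close()
elapsed = time.monotonic() - started

print(f"shell {proc.pid} and child {child_pid} stopped after {elapsed:.2f}s")
```

A trade-off of this design is that the new session is detached from the controlling terminal, so a Ctrl-C in the test runner no longer reaches the group and cleanup has to be called explicitly.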

@gaiksaya gaiksaya mentioned this issue Sep 17, 2021
1 task
@gaiksaya gaiksaya linked a pull request Sep 17, 2021 that will close this issue
1 task