Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mac TestIBMJlmRemoteMemoryAuth_SE80_0 Process LT1 has ended unexpectedly #7397

Closed
pshipton opened this issue Oct 8, 2019 · 16 comments
Closed

Comments

@pshipton
Copy link
Member

pshipton commented Oct 8, 2019

https://ci.eclipse.org/openj9/job/Test_openjdk8_j9_sanity.system_x86-64_mac_Nightly/150
TestIBMJlmRemoteMemoryAuth_SE80_0
TestIBMJlmRemoteMemoryNoAuth_SE80_0

STF 23:59:29.669 - +------ Step 5 - Wait for the processes to complete
STF 23:59:29.669 - | Wait for processes to meet expectations
STF 23:59:29.669 - |   Processes: [LT1, CL1]
STF 23:59:29.669 - |
STF 23:59:29.669 - Monitoring processes: CL1 LT1
CL1 j> 2019/09/21 23:59:29.745 ServerURL=service:jmx:rmi:///jndi/rmi://localhost:1234/jmxrmi
CL1 j> 2019/09/21 23:59:34.849 Monitored VM not ready at Sep 21, 2019 11:59:34 PM (attempt 0).
CL1 j> 2019/09/21 23:59:34.850 Wait 5 secs and trying again...
CL1 j> 2019/09/21 23:59:35.988 Connection established!
CL1 j> 2019/09/21 23:59:36.610 Starting to write data
STF 00:04:29.087 - Heartbeat: Process CL1 is still running
STF 00:04:29.595 - **FAILED** Process LT1 has ended unexpectedly

1.LT1.stdout shows the following, and stderr is empty.

00:08:47.800 - Load test completed
00:08:47.801 -   Ran     : 300000
00:08:47.801 -   Passed  : 300000
00:08:47.801 -   Failed  : 0
00:08:47.801 -   Result  : PASSED

There are no useful diagnostics to explain what happened.

@pshipton
Copy link
Member Author

pshipton commented Oct 8, 2019

@pshipton pshipton changed the title TestIBMJlmRemoteMemoryAuth_SE80_0 Process LT1 has ended unexpectedly mac TestIBMJlmRemoteMemoryAuth_SE80_0 Process LT1 has ended unexpectedly Oct 8, 2019
@pshipton
Copy link
Member Author

pshipton commented Oct 8, 2019

@Mesbah-Alam

@pshipton
Copy link
Member Author

@pshipton
Copy link
Member Author

Closed #7169 as a dup of this.

@JasonFengJ9
Copy link
Member

JasonFengJ9 commented Nov 3, 2019

https://ci.eclipse.org/openj9/job/Test_openjdk8_j9_sanity.system_x86-64_windows_Nightly/184/tapResults/

STF 02:00:00.246 - **FAILED** Process CL1 has timed out
STF 02:00:00.246 - Collecting dumps for: CL1
STF 02:00:01.603 - Using procdump.exe to generate .DMP files

STF 02:01:33.795 - Monitoring Report Summary:
STF 02:01:33.795 -   o Process CL1 has timed out
STF 02:01:33.795 -   o Process LT1 is still running as expected
STF 02:01:33.796 - Killing processes: CL1 LT1
STF 02:01:34.670 - Using powershell to kill CL1

STF 02:01:39.483 - **FAILED** execute script failed. Expected return value=0 Actual=1
STF 02:01:39.483 - 

STF 02:01:39.655 - =====================   R E S U L T S   =====================
STF 02:01:39.655 - Stage results:
STF 02:01:39.655 -   setUp:     pass
STF 02:01:39.655 -   execute:  *fail*
STF 02:01:39.655 -   teardown:  pass
STF 02:01:39.655 - 
STF 02:01:39.655 - Overall result: **FAILED**

TestIBMJlmRemoteMemoryNoAuth_SE80_0_FAILED

@pshipton update:
This one is different, the stdout shows the LT1 process just stopped before completion (likely killed), there is nothing in the stderr. There are 2.CL1.jvmdump.stdout and kill_powershell.stderr files that indicate a process doesn't exist.
No process matching the specified PID can be found.
Stop-Process : Cannot find a process with the process identifier 260.

I'm inclined to ignore this failure as some kind of machine problem, i.e. maybe something else was running and the test didn't get enough CPU and timed out. There is no indication of a crash, and no JVM diagnostic files to look at.

02:01:02.309 - Completed 14.0%. Number of tests started=42132 (+5628)
02:01:22.279 - Completed 16.2%. Number of tests started=48532 (+6400)

@pshipton
Copy link
Member Author

pshipton commented Nov 7, 2019

https://ci.eclipse.org/openj9/job/Test_openjdk8_j9_sanity.system_x86-64_mac_Nightly/194

@Mesbah-Alam did you ever get a chance to look at this problem?

@JasonFengJ9
Copy link
Member

https://ci.eclipse.org/openj9/job/Test_openjdk13_j9_sanity.system_x86-64_mac_Nightly/143/tapResults/

STF 04:20:56.211 - **FAILED** Process CL1 has timed out
STF 04:20:56.211 - Collecting dumps for: CL1
STF 04:20:56.211 - Sending SIG 3 to the java process to generate a javacore
STF 04:20:56.212 - Running command: kill -3 6680

Similar with #7397 (comment)

@pshipton
Copy link
Member Author

@JasonFengJ9 the previous failure is #7276 (comment) you need to look at the artifacts.

@JasonFengJ9
Copy link
Member

@lumpfish
Copy link
Contributor

This is fixed by adoptium/openj9-systemtest#118

@JasonFengJ9
Copy link
Member

Another occurrence at an internal build Test_openjdk11_j9_sanity.system_s390x_linux_xl/5

STF 22:28:47.826 - +------ Step 5 - Wait for the processes to complete
STF 22:28:47.826 - | Wait for processes to meet expectations
STF 22:28:47.826 - |   Processes: [LT1, CL1]
STF 22:28:47.826 - |
STF 22:28:47.826 - Monitoring processes: CL1 LT1
CL1 j> 2021/01/20 22:28:48.015 ServerURL=service:jmx:rmi:///jndi/rmi://localhost:1234/jmxrmi
CL1 j> 2021/01/20 22:28:48.099 Attempting to connect
CL1 j> 2021/01/20 22:28:48.423 Monitored VM not ready at Jan 20, 2021, 10:28:48 PM (attempt 1, elapsed 301ms).
CL1 j> 2021/01/20 22:28:48.423 Waiting 5 secs and trying again...
CL1 j> 2021/01/20 22:28:53.423 Attempting to connect
CL1 j> 2021/01/20 22:28:56.762 Connection established!
CL1 j> 2021/01/20 22:28:57.492 Starting to write data
STF 22:31:27.322 - **FAILED** Process LT1 has ended unexpectedly
STF 22:31:27.326 - Monitoring Report Summary:
STF 22:31:27.330 -   o Process CL1 is still running
STF 22:31:27.330 -   o Process LT1 has ended unexpectedly
STF 22:31:27.399 - Killing processes: CL1 LT1
STF 22:31:27.401 -   o Process CL1 pid 5444 stop()
STF 22:31:28.409 -   o Process CL1 pid 5444 killed
CL1 ..............j> 2021/01/20 22:31:27.468 Exiting as JVM we are connected to has finished
CL1 stderr Exception in thread "main" java.lang.AssertionError
CL1 stderr 	at org.junit.Assert.fail(Assert.java:86)
CL1 stderr 	at org.junit.Assert.fail(Assert.java:95)
CL1 stderr 	at net.openj9.test.jlm.remote.MemoryProfiler.getStatsViaProxy(MemoryProfiler.java:217)
CL1 stderr 	at net.openj9.test.jlm.remote.MemoryProfiler.main(MemoryProfiler.java:119)
STF 22:31:28.411 -   o Process LT1 pid 5443 is not running
**FAILED** at step 5 (Wait for the processes to complete). Expected return value=0 Actual=1 at /home/jenkins/workspace/Test_openjdk11_j9_sanity.system_s390x_linux_xl/openjdk-tests/TKG/../TKG/test_output_16112051315889/TestIBMJlmRemoteMemoryAuth_0/20210120-222845-TestIBMJlmRemoteMemoryAuth/execute.pl line 178.
STF 22:31:28.953 - **FAILED** execute script failed. Expected return value=0 Actual=1
STF 22:31:28.969 - 
STF 22:31:28.978 - ====================   T E A R D O W N   ====================
STF 22:31:28.980 - Running teardown: perl /home/jenkins/workspace/Test_openjdk11_j9_sanity.system_s390x_linux_xl/openjdk-tests/TKG/../TKG/test_output_16112051315889/TestIBMJlmRemoteMemoryAuth_0/20210120-222845-TestIBMJlmRemoteMemoryAuth/tearDown.pl
STF 22:31:29.259 - TEARDOWN stage completed
STF 22:31:29.269 - 
STF 22:31:29.269 - =====================   R E S U L T S   =====================
STF 22:31:29.271 - Stage results:
STF 22:31:29.277 -   setUp:     pass
STF 22:31:29.277 -   execute:  *fail*
STF 22:31:29.278 -   teardown:  pass
STF 22:31:29.278 - 
STF 22:31:29.279 - Overall result: **FAILED**

TestIBMJlmRemoteMemoryAuth_0_FAILED

@pshipton
Copy link
Member Author

3.LT1.stdout shows the following, no explanation why it stopped. No JVM diagnostic files.

22:30:29.098 - Completed 5.6%. Number of tests started=184628 (+39053)
22:30:49.102 - Completed 6.7%. Number of tests started=217262 (+32634)
22:31:09.057 - Completed 7.8%. Number of tests started=252425 (+35163)

@lumpfish
Copy link
Contributor

Nothing in LT1.stderr either?

@pshipton
Copy link
Member Author

Nothing in LT1.stderr either?

It's empty.

system_test_output (1).tar.gz

@lumpfish
Copy link
Contributor

lumpfish commented Feb 3, 2021

Raised adoptium/aqa-systemtest#402 to get more diagnostics

@pshipton
Copy link
Member Author

Not seen for some time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants