Test-sanity.functional-JDK8-linux_390-64_cmprssptrs Floating point error at Compiled_method=sun/reflect/generics/parser/SignatureParser.current()C #4462

Closed
JasonFengJ9 opened this issue Jan 25, 2019 · 12 comments

Comments

@JasonFengJ9
Member

JasonFengJ9 commented Jan 25, 2019

OMR acceptance build
https://ci.eclipse.org/openj9/job/Test-sanity.functional-JDK8-linux_390-64_cmprssptrs/653/tapResults/

===============================================
Running test UnsafeTests_SE80_1 ...
===============================================
UnsafeTests_SE80_1 Start Time: Fri Jan 25 14:36:51 2019 Epoch Time (ms): 1548445011767
variation: -DScenario=Compiled
JVM_OPTIONS:  -Xcompressedrefs -DScenario=Compiled 
Unhandled exception
Type=Floating point error vmState=0x00000000
J9Generic_Signal_Number=00000020 Signal_Number=00000008 Error_Value=00000000 Signal_Code=00000000
Handler1=000003FFB5FC89A0 Handler2=000003FFB5BA7928
gpr0=0000000000000012 gpr1=000000002C4ED890 gpr2=000000000000000E gpr3=0000000000000024
gpr4=000003FF9F6096FA gpr5=00000000000D26E8 gpr6=000000002C4ED880 gpr7=000003FFB5FEC958
gpr8=000003FFB69FD690 gpr9=000000000018FF58 gpr10=000003FFB00AA150 gpr11=0000000000014F00
gpr12=000003FFB5E84D16 gpr13=0000000000014F00 gpr14=000003FFB516A930 gpr15=000003FFB69FD4E8
psw=000003FF9F609748 mask=0705000180000000 fpc=0008ff00 bea=000003FFB600C0C8
fpr0 000003ffb5e84d16 (f: 3051900160.000000, d: 2.172310e-311)
fpr1 402f29f85474a547 (f: 1416930688.000000, d: 1.558197e+01)
fpr2 000003ffb01bf000 (f: 2954620928.000000, d: 2.172261e-311)
fpr3 3fd54ba2b1ae625e (f: 2980995584.000000, d: 3.327414e-01)
fpr4 0000000000000000 (f: 0.000000, d: 0.000000e+00)
fpr5 3d7546bc34d744c4 (f: 886523072.000000, d: 1.209418e-12)
fpr6 402f29f85474a547 (f: 1416930688.000000, d: 1.558197e+01)
fpr7 bc4f800000000000 (f: 0.000000, d: -3.415237e-18)
fpr8 00000000000d2d88 (f: 863624.000000, d: 4.266869e-318)
fpr9 00000000000d2d88 (f: 863624.000000, d: 4.266869e-318)
fpr10 000003ffc927ee3d (f: 3374837248.000000, d: 2.172469e-311)
fpr11 0000000161fa9038 (f: 1643810816.000000, d: 2.934146e-314)
fpr12 0000000161fa9160 (f: 1643811200.000000, d: 2.934146e-314)
fpr13 000003ffcf77cc28 (f: 3480734720.000000, d: 2.172521e-311)
fpr14 00000000000d2d80 (f: 863616.000000, d: 4.266830e-318)
fpr15 000003fff637edb8 (f: 4130860544.000000, d: 2.172843e-311)

Compiled_method=sun/reflect/generics/parser/SignatureParser.current()C
Target=2_90_20190125_1153 (Linux 4.4.0-128-generic)
CPU=s390x (4 logical CPUs) (0x1e9a1c000 RAM)
----------- Stack Backtrace -----------
(0x000003FF9F609748 [<unknown>+0x0])
---------------------------------------

===============================================
Running test stringConcatOptTest_0 ...
===============================================
stringConcatOptTest_0 Start Time: Fri Jan 25 14:36:57 2019 Epoch Time (ms): 1548445017085
variation: NoOptions
JVM_OPTIONS:  -Xcompressedrefs 
Unhandled exception
Type=Floating point error vmState=0x00000000
J9Generic_Signal_Number=00000020 Signal_Number=00000008 Error_Value=00000000 Signal_Code=00000000
Handler1=000003FFB89C89A0 Handler2=000003FFB85A7928
gpr0=0000000000000012 gpr1=000000002C6C8310 gpr2=000000000000000E gpr3=0000000000000024
gpr4=000003FFA20C4432 gpr5=00000000000CA578 gpr6=000000002C6C8300 gpr7=000000000C82EF68
gpr8=000000002C6C82E0 gpr9=000000002C6C82E0 gpr10=000003FFB40AA560 gpr11=0000000000014F00
gpr12=000000000C82AB38 gpr13=0000000000014F00 gpr14=000003FFB3AEA930 gpr15=000003FFB93FD4E8
psw=000003FFA20C4480 mask=0705000180000000 fpc=0008ff00 bea=000003FFB8A0C0C8
fpr0 0000000000000000 (f: 0.000000, d: 0.000000e+00)
fpr1 402f29f689990f9e (f: 2308509696.000000, d: 1.558196e+01)
fpr2 0000000000000000 (f: 0.000000, d: 0.000000e+00)
fpr3 3fd54b62b48e414a (f: 3029221632.000000, d: 3.327262e-01)
fpr4 0000000000000000 (f: 0.000000, d: 0.000000e+00)
fpr5 3d7549c61738dda3 (f: 389602720.000000, d: 1.210093e-12)
fpr6 402f29f689990f9e (f: 2308509696.000000, d: 1.558196e+01)
fpr7 3c57c00000000000 (f: 0.000000, d: 5.149960e-18)
fpr8 00000000000cad28 (f: 830760.000000, d: 4.104500e-318)
fpr9 00000000000cad28 (f: 830760.000000, d: 4.104500e-318)
fpr10 000003ffd5bfee33 (f: 3586125312.000000, d: 2.172573e-311)
fpr11 000000015f236038 (f: 1596153856.000000, d: 2.910601e-314)
fpr12 000000015f236160 (f: 1596154240.000000, d: 2.910601e-314)
fpr13 000003fffe37d0a8 (f: 4265070848.000000, d: 2.172909e-311)
fpr14 00000000000cad20 (f: 830752.000000, d: 4.104460e-318)
fpr15 000003fff637edb8 (f: 4130860544.000000, d: 2.172843e-311)

Compiled_method=sun/reflect/generics/parser/SignatureParser.current()C
Target=2_90_20190125_1153 (Linux 4.4.0-128-generic)
CPU=s390x (4 logical CPUs) (0x1e9a1c000 RAM)
----------- Stack Backtrace -----------
(0x000003FFA20C4480 [<unknown>+0x0])
---------------------------------------

===============================================
Running test jit_jitt_0 ...
===============================================
jit_jitt_0 Start Time: Fri Jan 25 14:37:16 2019 Epoch Time (ms): 1548445036847
variation: -Xjit:noJitUntilMain,count=0,assumeStrictFP,optlevel=warm,gcOnResolve,rtResolve -verbose:stackwalk=0 -Xdump
JVM_OPTIONS:  -Xcompressedrefs -Xjit:noJitUntilMain,count=0,assumeStrictFP,optlevel=warm,gcOnResolve,rtResolve -verbose:stackwalk=0 -Xdump 
Unhandled exception
Type=Floating point error vmState=0x00000000
J9Generic_Signal_Number=00000020 Signal_Number=00000008 Error_Value=00000000 Signal_Code=00000000
Handler1=000003FF821489A0 Handler2=000003FF81D27928
gpr0=0000000000000012 gpr1=000000002C700270 gpr2=00000000000CFF30 gpr3=0000000000000024
gpr4=000003FF6B813C3C gpr5=00000000000CFF28 gpr6=000000002C700260 gpr7=000000002C700260
gpr8=000000002C700338 gpr9=000000002C700240 gpr10=000000000000DE02 gpr11=000003FF818B52A8
gpr12=000003FF82B7DE10 gpr13=0000000000014F00 gpr14=000003FF6B8C0558 gpr15=000003FF82B7D958
psw=000003FF6B813C9E mask=0705200180000000 fpc=0008ff00 bea=000003FF6B813C86
fpr0 0000000000000000 (f: 0.000000, d: 0.000000e+00)
fpr1 402f29f689990f9e (f: 2308509696.000000, d: 1.558196e+01)
fpr2 0000000000000000 (f: 0.000000, d: 0.000000e+00)
fpr3 3fd54b62b48e414a (f: 3029221632.000000, d: 3.327262e-01)
fpr4 0000000000000000 (f: 0.000000, d: 0.000000e+00)
fpr5 3d7549c61738dda3 (f: 389602720.000000, d: 1.210093e-12)
fpr6 402f29f689990f9e (f: 2308509696.000000, d: 1.558196e+01)
fpr7 3c57c00000000000 (f: 0.000000, d: 5.149960e-18)
fpr8 00000000000d0858 (f: 854104.000000, d: 4.219834e-318)
fpr9 00000000000d0858 (f: 854104.000000, d: 4.219834e-318)
fpr10 000003ffe8f7ed63 (f: 3908562176.000000, d: 2.172733e-311)
fpr11 000000016655f048 (f: 1716908160.000000, d: 2.970261e-314)
fpr12 000000016655f180 (f: 1716908416.000000, d: 2.970261e-314)
fpr13 000003ffe6afcbe8 (f: 3870280704.000000, d: 2.172714e-311)
fpr14 00000000000d0850 (f: 854096.000000, d: 4.219795e-318)
fpr15 000003fff637edb8 (f: 4130860544.000000, d: 2.172843e-311)

Compiled_method=sun/reflect/generics/parser/SignatureParser.current()C
Target=2_90_20190125_1153 (Linux 4.4.0-128-generic)
CPU=s390x (4 logical CPUs) (0x1e9a1c000 RAM)
----------- Stack Backtrace -----------
(0x000003FF6B813C9E [<unknown>+0x0])

There are a few more in the job output. Could the JIT team investigate?

@pshipton
Member

@fjeremic

@pshipton
Member

pshipton commented Jan 28, 2019

Note the blocker status indicates this issue is blocking OMR acceptance.

@fjeremic
Contributor

Looking at [1], the SHA difference between the last passing build (1119) and the first failing build (1120) is [2]. The diff shows only a single change in OMR, and the corresponding PR, eclipse-omr/omr#3509, is clearly an S390-only change that modified the behavior of OMR functions.

Looking at the latest failure in [3] and the errors from the description, they are all floating point errors. Peeking inside the core file from [3], here are the register dump and the disassembly from the second failure:

===============================================
Running test StringPeepholeTest_0 ...
===============================================
StringPeepholeTest_0 Start Time: Fri Jan 25 14:42:16 2019 Epoch Time (ms): 1548445336982
variation: -XX:-EnableHCR -Xjit:count=0,optLevel=hot
JVM_OPTIONS:  -Xcompressedrefs -XX:-EnableHCR -Xjit:count=0,optLevel=hot 
Unhandled exception
Type=Floating point error vmState=0x00000000
J9Generic_Signal_Number=00000020 Signal_Number=00000008 Error_Value=00000000 Signal_Code=00000000
Handler1=000003FF884489A0 Handler2=000003FF83FA7928
gpr0=0000000000000012 gpr1=000000002C6A9088 gpr2=0000000000000024 gpr3=000003FF71CED50C
gpr4=000003FF71CEE8FC gpr5=0000000000097D10 gpr6=0000000000000000 gpr7=000000002C6A9078
gpr8=000000002C6A9078 gpr9=0004390000000000 gpr10=000000002C6A9300 gpr11=000000002C6A9320
gpr12=000003FF88E7DE10 gpr13=0000000000014F00 gpr14=000003FF71CED50C gpr15=000003FF88E7D958
psw=000003FF71CEE9A2 mask=0705000180000000 fpc=0008ff00 bea=000003FF8848C0C8


0x3ff71cee974 {sun/.../SignatureParser.parseSuperInterfaces} +120          A7690000     lghi      r6,0 <<< ^+5060
0x3ff71cee978 {sun/.../SignatureParser.parseSuperInterfaces} +124          E3A050B80024 stg       r10,184(,r5)
0x3ff71cee97e {sun/.../SignatureParser.parseSuperInterfaces} +130          1799         xr        r9,r9
0x3ff71cee980 {sun/.../SignatureParser.parseSuperInterfaces} +132          E54C507C0000 mvhi      124(r5),0x0
0x3ff71cee986 {sun/.../SignatureParser.parseSuperInterfaces} +138          B9040087     lgr       r8,r7
0x3ff71cee98a {sun/.../SignatureParser.parseSuperInterfaces} +142   1:34   E3107008009D llgfat    r1,8(,r7) <<< ^+1948
0x3ff71cee990 {sun/.../SignatureParser.parseSuperInterfaces} +148          E300700C0014 lgf       r0,12(,r7)
0x3ff71cee996 {sun/.../SignatureParser.parseSuperInterfaces} +154          EC201FBE0159 risbgn    r2,r0,0x1F,0xBE,0x1
0x3ff71cee99c {sun/.../SignatureParser.parseSuperInterfaces} +160   1:34   EB0A10040023 clt       r0,B'1010',4(r1)
0x3ff71cee9a2 {sun/.../SignatureParser.parseSuperInterfaces} +166          E30210080095 llh       r0,8(r2,r1)

It looks like we failed on the trap instruction (the CLT) with signal 8, which is a fixed-point exception. Taking a look at the data it's comparing:

(kca) (2C6A9088h+4)/X
%1 = 0x000000002c6a908c: 00000012

It matches register GPR0, and the mask has bit 0 set, which tests for equality, so the trap instruction should have trapped. It is therefore likely that something goes wrong in the signal handling. Either way, it is almost certainly related to the FPC change in OMR.

Subscribing @keithc-ca and @sehirst for further investigation. Perhaps we ought to revert the change?

[1] https://ci.eclipse.org/openj9/job/Pipeline-OMR-Acceptance/
[2] https://github.com/eclipse/omr/compare/be896fe..b8a14eb
[3] https://ci.eclipse.org/openj9/job/Test-sanity.functional-JDK8-linux_390-64_cmprssptrs/653/tapResults/
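
The CLT check described in the comment above can be modeled in a few lines of C. This is a minimal sketch, assuming the usual reading of the mask operand B'1010': the leftmost bit selects "equal", the next "first operand low", and the next "first operand high". The names `clt_would_trap`, `TRAP_IF_*` and `check_dump_values` are hypothetical and only model the comparison, not the instruction itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Condition bits of the 4-bit mask, numbered from the left as in the
 * disassembly operand B'1010' (bit 0 is the leftmost bit). */
enum {
    TRAP_IF_EQUAL = 0x8, /* bit 0: first operand == second operand */
    TRAP_IF_LOW   = 0x4, /* bit 1: first operand <  second operand */
    TRAP_IF_HIGH  = 0x2  /* bit 2: first operand >  second operand */
};

/* Models an unsigned ("logical") compare-and-trap: returns true when the
 * comparison result is selected by the mask, i.e. when the trap fires. */
static bool clt_would_trap(uint32_t first, uint32_t second, unsigned mask)
{
    if (first == second) {
        return (mask & TRAP_IF_EQUAL) != 0;
    }
    if (first < second) {
        return (mask & TRAP_IF_LOW) != 0;
    }
    return (mask & TRAP_IF_HIGH) != 0;
}

/* Values from the dump: gpr0 = 0x12, the word at 4(r1) = 0x12, mask B'1010'. */
void check_dump_values(void)
{
    assert(clt_would_trap(0x12u, 0x12u, 0xAu));
}
```

With the values from the dump the operands are equal and the "equal" bit is set, so the model agrees that the trap fired as designed and that the problem lies in how the resulting signal was handled.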

@keithc-ca
Contributor

@fjeremic Please explain what you believe is wrong with the change in eclipse-omr/omr#3509 to support your suggestion to revert it.

@sehirst
Contributor

sehirst commented Jan 28, 2019

@keithc-ca It definitely needs to be reverted. I should have been more careful in my review. I thought we were only changing the RAS output. Changing anything to do with the VM functionality of the FPC register is highly risky as these issues show.

Is it possible to just change the RAS output, i.e. the output of the FPC register when the JVM crashes?
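
For reading the `fpc=` field in the dumps in this issue, here is a minimal sketch that splits the 32-bit value into its four bytes. The layout (masks, flags, data exception code, rounding modes) and the meaning of a data exception code of 0xFF as "compare-and-trap" are my understanding of the z/Architecture definition and should be treated as assumptions. The names `fpc_fields`, `decode_fpc` and `fpc_indicates_trap` are hypothetical and are not part of OMR or OpenJ9.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The four bytes of the FPC value, most significant byte first.
 * Assumed layout: masks, flags, data exception code (DXC), rounding modes. */
typedef struct {
    uint8_t masks;
    uint8_t flags;
    uint8_t dxc;
    uint8_t rounding;
} fpc_fields;

/* Assumption: a DXC of 0xFF identifies a compare-and-trap data exception. */
#define DXC_COMPARE_AND_TRAP 0xFFu

static fpc_fields decode_fpc(uint32_t fpc)
{
    fpc_fields fields;
    fields.masks    = (uint8_t)((fpc >> 24) & 0xFFu);
    fields.flags    = (uint8_t)((fpc >> 16) & 0xFFu);
    fields.dxc      = (uint8_t)((fpc >> 8) & 0xFFu);
    fields.rounding = (uint8_t)(fpc & 0xFFu);
    return fields;
}

static bool fpc_indicates_trap(uint32_t fpc)
{
    return decode_fpc(fpc).dxc == DXC_COMPARE_AND_TRAP;
}

/* Value printed in every dump in this issue: fpc=0008ff00. */
void check_dump_fpc(void)
{
    assert(fpc_indicates_trap(0x0008ff00u));
}
```

Under these assumptions every dump in this issue decodes to a data exception code of 0xFF, which is consistent with the failures coming from trap instructions rather than from floating point arithmetic.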

@keithc-ca
Contributor

Where does the JIT use the OMR API that was modified? I'd like to understand the data flow.

@fjeremic
Contributor

> Where does the JIT use the OMR API that was modified? I'd like to understand the data flow.

Yeah trying to understand that as well.

@keithc-ca
Contributor

I think the problem is in openj9/runtime/compiler/runtime/SignalHandler.c which expects FPC to be J9PORT_SIG_VALUE_ADDRESS:

  infoType = j9sig_info(sigInfo, J9PORT_SIG_CONTROL, J9PORT_SIG_CONTROL_S390_FPC, &infoName, &infoValue);
  if (infoType != J9PORT_SIG_VALUE_ADDRESS)
     return J9PORT_SIG_EXCEPTION_CONTINUE_SEARCH;

I'm working on a fix.

@fjeremic
Contributor

> Where does the JIT use the OMR API that was modified? I'd like to understand the data flow.

Got it. So here is what happens. OpenJ9 calls the OMR API omrsig_info via a table:

https://github.com/eclipse/openj9/blob/b0028d77d5073571cf2e843c392ead5169fb6bfa/runtime/oti/j9port_generated.h#L790

Then the JIT calls it here in the S390 signal handler:

https://github.com/eclipse/openj9/blob/b0028d77d5073571cf2e843c392ead5169fb6bfa/runtime/compiler/runtime/SignalHandler.c#L937-L939

That handler explicitly checks for SIG_VALUE_ADDRESS before it handles traps. Because the type was changed to SIG_VALUE_32, we simply return and do not handle trap instructions on Linux on Z. The signal propagates up, nothing else can handle it, and we crash with a SIGFPE.
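
The data flow described above can be sketched as a small self-contained model. Everything here is a hypothetical stand-in: the enum constants, `struct port_library`, `sig_info_old`, `sig_info_new` and both handlers only imitate the shape of the real code. The tolerant variant is one possible shape for a fix and not necessarily what #4477 does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the port-library constants named in the thread. */
enum value_type { SIG_VALUE_UNDEFINED, SIG_VALUE_32, SIG_VALUE_ADDRESS };
enum handler_result { EXCEPTION_CONTINUE_SEARCH, EXCEPTION_CONTINUE_EXECUTION };

struct signal_info { uint32_t fpc; };

typedef enum value_type (*sig_info_fn)(const struct signal_info *info,
                                       const void **value);

/* Callers reach sig_info only through this table. */
struct port_library { sig_info_fn sig_info; };

/* Before the OMR change: FPC is reported with the "address" type tag. */
static enum value_type sig_info_old(const struct signal_info *info, const void **value)
{
    *value = &info->fpc;
    return SIG_VALUE_ADDRESS;
}

/* After the OMR change: same data, but reported as a 32-bit value. */
static enum value_type sig_info_new(const struct signal_info *info, const void **value)
{
    *value = &info->fpc;
    return SIG_VALUE_32;
}

/* Mirrors the strict check quoted in the thread. */
static enum handler_result strict_handler(const struct port_library *port,
                                          const struct signal_info *info)
{
    const void *value = NULL;
    if (port->sig_info(info, &value) != SIG_VALUE_ADDRESS) {
        return EXCEPTION_CONTINUE_SEARCH; /* the trap is left unhandled */
    }
    return EXCEPTION_CONTINUE_EXECUTION;
}

/* Hypothetical tolerant variant that accepts either type tag. */
static enum handler_result tolerant_handler(const struct port_library *port,
                                            const struct signal_info *info)
{
    const void *value = NULL;
    enum value_type type = port->sig_info(info, &value);
    if ((type != SIG_VALUE_ADDRESS) && (type != SIG_VALUE_32)) {
        return EXCEPTION_CONTINUE_SEARCH;
    }
    return EXCEPTION_CONTINUE_EXECUTION;
}

/* The strict handler stops handling traps once the table points at the new provider. */
void check_regression(void)
{
    struct signal_info info = { 0x0008ff00u };
    struct port_library changed = { sig_info_new };
    assert(strict_handler(&changed, &info) == EXCEPTION_CONTINUE_SEARCH);
}
```

In this model, swapping the provider behind the table changes the strict handler's result from "continue execution" to "continue search" without any change to the handler itself, which matches the failure mode described in this comment.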

@fjeremic
Contributor

On another note, I'm not sure why we use the j9 prefixes for these OMR library calls. If the API called everywhere were omrsig_info rather than j9sig_info, someone would surely have noticed the call from the JIT signal handler and updated it appropriately.
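
To illustrate the naming point, here is a minimal sketch of how a j9-prefixed macro over a function table can hide a call site from a text search for the omr-prefixed name. The table layout, the macro body and the stub function are hypothetical simplifications, not the actual definitions in j9port_generated.h.

```c
#include <assert.h>

/* Hypothetical, simplified port-library table with a single slot. */
struct port_library {
    int (*sig_info)(struct port_library *lib, unsigned category);
};

/* Stub standing in for the omr-prefixed implementation. It returns a
 * dummy value so the dispatch is observable. */
static int omrsig_info_stub(struct port_library *lib, unsigned category)
{
    (void)lib;
    return (int)category * 2;
}

/* Callers use a j9-prefixed macro that goes through the table. */
#define j9sig_info(category) \
    (privatePortLibrary->sig_info(privatePortLibrary, (category)))

/* This caller never spells the omr-prefixed name, so searching the
 * source for that name does not find it. */
static int jit_signal_handler(struct port_library *privatePortLibrary)
{
    return j9sig_info(21u);
}

/* The macro call lands in the omr-prefixed stub via the table. */
void check_alias_dispatch(void)
{
    struct port_library lib = { omrsig_info_stub };
    assert(jit_signal_handler(&lib) == 42);
}
```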

@keithc-ca
Contributor

keithc-ca commented Jan 28, 2019

Perhaps this will fix it? #4477

@keithc-ca
Contributor

Closing: fixed via #4477 and #4500.
