Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test-sanity.functional-JDK11-win_x86-64_cmprssptrs StringPeepholeTest_0 NPE #2955

Closed
pshipton opened this issue Sep 20, 2018 · 12 comments
Closed

Comments

@pshipton
Copy link
Member

https://ci.eclipse.org/openj9/job/Test-sanity.functional-JDK11-win_x86-64_cmprssptrs/19/tapResults/

This isn't the first occurrence of the failure.

===============================================
Running test StringPeepholeTest_0 ...
===============================================
StringPeepholeTest_0 Start Time: Thu Sep 20 01:05:16 2018 Epoch Time (ms): 1537423516714
test with -XX:-EnableHCR -Xjit:count=0,optLevel=hot
Exception in thread "main" java/lang/NullPointerException
	at java/lang/ThreadLocal.set (java.base@9/ThreadLocal.java:222)
	at java/lang/StringCoding.set (java.base@9/StringCoding.java:83)
	at java/lang/StringCoding.encode (java.base@9/StringCoding.java:500)
	at java/lang/System.afterClinitInitialization (java.base@9/System.java:157)
	at java/lang/Thread.initialize (java.base@9/Thread.java:371)
	at java/lang/Thread.<init> (java.base@9/Thread.java:153)

StringPeepholeTest_0_FAILED
@pshipton
Copy link
Member Author

@JasonFengJ9 for triage please.

@pshipton
Copy link
Member Author

Given the test options, its likely jit
@irinarada

@JasonFengJ9
Copy link
Member

It is JIT.

        if (map != null) {
            map.set(this, value);  --> NPE here
        } else {
            createMap(t, value);
        }

@irinarada
Copy link

@cathyzhyi - Can we look at this with urgency, it's holding the 0.10.0 delivery. Would you please assess the mustfix nature... Thanks!

@andrewcraik
Copy link
Contributor

actually I think @liqunl is the best person to have a quick look here

@liqunl
Copy link
Contributor

liqunl commented Sep 20, 2018

I got a crash in compiled method java/lang/StringCoding.encode(B[B)[B running the test on my laptop. Looking at the crash.

@irinarada
Copy link

Thanks @andrewcraik and @liqunl

@liqunl
Copy link
Contributor

liqunl commented Sep 20, 2018

The crash is not reproducible on java9 and jdk10, it only happens on jdk11.

The crash is an Illegal instruction

Unhandled exception
Type=Illegal instruction vmState=0x00000000
Windows_ExceptionCode=c000001d J9Generic_Signal=00000010 ExceptionAddress=000007
FFEB9219E2 ContextFlags=0010005f
Handler1=000007FEDE655D70 Handler2=000007FEE0C1ECA0
RDI=000000001268A540 RSI=00000007FFDF9EC8 RAX=000000000000000E RBX=00000007FFDFE
060
RCX=000007FFEB9219F8 RDX=0000000002A49A00 R8=00000007FFDFDF58 R9=00000007FFDFDF5
8
R10=0000000002AFB900 R11=0000000002A49A00 R12=0000000002AFA600 R13=0000000002936
FF0
R14=00000007FFDFDE48 R15=000007FEDD1B020C
RIP=000007FFEB9219E2 RSP=0000000002A49718 RBP=0000000002A49A00 GS=002B
FS=0053 ES=002B DS=002B
XMM0 0000000002a49710 (f: 44341008.000000, d: 2.190737e-316)
XMM1 00000000021feab0 (f: 35646128.000000, d: 1.761153e-316)
XMM2 0000000000000000 (f: 0.000000, d: 0.000000e+00)
XMM3 0000000000000000 (f: 0.000000, d: 0.000000e+00)
XMM4 0000000000000000 (f: 0.000000, d: 0.000000e+00)
XMM5 0000000000000000 (f: 0.000000, d: 0.000000e+00)
XMM6 0000000000000000 (f: 0.000000, d: 0.000000e+00)
XMM7 0000000000000000 (f: 0.000000, d: 0.000000e+00)
XMM8 0000000000000000 (f: 0.000000, d: 0.000000e+00)
XMM9 0000000000000000 (f: 0.000000, d: 0.000000e+00)
XMM10 0000000000000000 (f: 0.000000, d: 0.000000e+00)
XMM11 0000000000000000 (f: 0.000000, d: 0.000000e+00)
XMM12 0000000000000000 (f: 0.000000, d: 0.000000e+00)
XMM13 0000000000000000 (f: 0.000000, d: 0.000000e+00)
XMM14 0000000000000000 (f: 0.000000, d: 0.000000e+00)
XMM15 0000000000000000 (f: 0.000000, d: 0.000000e+00)
Module=
Module_base_address=000007FFEB920000 Offset_in_DLL=00000000000019e2

Compiled_method=java/lang/StringCoding.encode(B[B)[B
Target=2_90_20180919_23 (Windows 7 6.1 build 7601 Sevice Pack 1)
CPU=amd64 (8 logical CPUs) (0x3e893a000 RAM)
----------- Stack Backtrace -----------
(0x000007FFEB9219E2)
(0x00000007FFDF9EC8)
(0x00000007FFDFDE98)
(0x0000000002A4974A)
J9VMDllMain+0x13d1c3 (0x000007FEDD2CE1D3 [j9gc29+0x13e1d3])
(0x00000007FFDF9E28)
---------------------------------------

*** Invalid JIT return address 00000007FFDF9E28 in 00000000021F9490

In trace log

0x000007FFEB9219D8 000019a4                                                 Snippet Label L1635:                ; VPic Data
0x000007FFEB9219D8 000019a4                      e8 e3 e0 1f 00 00 00 00    call        populateVPicVTableDispatch              ; 0x00000000121A9A30    patched with vtable call
0x000007FFEB9219E0 000019ac                      e9 5b f2 ff ff             jmp Label L1633

In the core dump

0x000007FFEB9219D8:  **cc cc** e8 00 00 00 00 00 e9 5b f2 ff ff 48 89 44 24 10 48 89 74 24 08 48 bf

The first 4 bytes have been patched to cc cc, which is odd. But we couldn't have come from 0x000007FFEB9219D8 otherwise we would have crashed there. Apparently, we came straight to 0x000007FFEB9219E2 which is not a valid address of an instruction.

The snippet is generated for

 n2191n   (  0)  ResolveCHK [#30] (sharedMemory )                                                     [0x000007FFEAA89C90] bci=[17,17,222] rc=0 vc=6100 vn=- li=137 udi=- nc=1 flg=0x8
 n2190n   (  0)    calli  java/lang/ThreadLocal$ThreadLocalMap.set(Ljava/lang/ThreadLocal;Ljava/lang/Object;)V[#831  unresolved virtual Method] [flags 0x400 0x0 ] ()  [0x000007FFEAA89C40] bci=[17,17,222] rc=0 vc=6100 vn=- li=137 udi=- nc=4 flg=0x28
 n2189n   (  0)      aloadi  <vft-symbol>[#535  Shadow] [flags 0x18607 0x0 ] (in GPR_3825) (X!=0 sharedMemory )  [0x000007FFEAA89BF0] bci=[17,17,222] rc=0 vc=6100 vn=- li=137 udi=40544 nc=1 flg=0x4
 n6558n   (  0)        ==>aRegLoad (in &GPR_3808) (X!=0 X>=0 SeenRealReference sharedMemory )
 n6558n   (  0)      ==>aRegLoad (in &GPR_3808) (X!=0 X>=0 SeenRealReference sharedMemory )
 n6560n   (  0)      ==>aRegLoad (in &GPR_3810) (X!=0 SeenRealReference sharedMemory )
 n6559n   (  0)      ==>aRegLoad (in &GPR_3809) (X!=0 SeenRealReference sharedMemory )

I suspect the call to populateVPicVTableDispatch changed the return address to 0x000007FFEB9219E2.
Maybe that's related to what @jdmpapin has done for nestmates recently?

@jdmpapin
Copy link
Contributor

Yes, this is caused by an unresolved private call. My code is supposed to set the return address so that we return to the jump shown at 0x7ffeb9219e0. But I thought that there would be 7 bytes of space for the call, not 8, so PicBuilder "should" have set the return address to 0x7ffeb9219df instead. From Liqun's comment, the bytes there are 00 e9 5b f2 ff, which disassemble like this:

0x7ffeb9219df:   00 e9                   add    cl,ch
0x7ffeb9219e1:   5b                      pop    rbx
0x7ffeb9219e2:   f2 ff                   repnz (bad)

Returning there will result in an illegal instruction at 0x7ffeb9219e2, which is exactly what Liqun found.

It turns out that the length of the call instruction we would use for normal virtual dispatch is either 7 or 8, depending on the value of its ModR/M byte. PicBuilder will need to inspect the ModR/M byte in order to determine the correct return address, and also when patching over the call. On 32-bit, PicBuilder already knows the correct length of 6, which thankfully is fixed.

@irinarada
Copy link

@liqunl , @jdmpapin @andrewcraik - do we have an ETA for this?

@jdmpapin
Copy link
Contributor

jdmpapin commented Sep 24, 2018

I have what I believe should be a fix, and I'm trying to reproduce the 8-byte call so that I can verify the behaviour in that case. With a bit of luck I hope to be done that today

jdmpapin added a commit to jdmpapin/openj9 that referenced this issue Sep 24, 2018
When x86 PicBuilder is carrying out a direct call to a private method of
a nestmate, it sets the return address so that the call will return into
the code in the VPIC snippet, which in the case of a virtual dispatch
would be something like the following:

    call qword ptr [$vftReg+$vftOffset]
    jmp mainline

In particular, the callee should return to the jmp instruction, whose
address has been calculated as 7 + the address of the call instruction.
But the call is sometimes one byte longer. The variability is now dealt
with as follows:

- When overwriting the call, PicBuilder now writes a value that works
  for both 7- and 8-byte offsets to the jmp instruction.

- When doing direct dispatch for nestmates, PicBuilder now detects the
  offset to jmp, in order to compute the correct return address.

Fixes eclipse-openj9#2955
Signed-off-by: Devin Papineau <devinmp@ca.ibm.com>
@jdmpapin
Copy link
Contributor

I've opened #3005 with the fix

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants