Eclipse IDE crashes with -Xverify:none #4904
Do you have a reference for this? We can work with the Eclipse team to remove this recommendation. OpenJ9 JVM makes certain assumptions on the basis that the bytecode verifier ran. This is why at a high level Someone more familiar on the VM side of things can give more details. |
@dcendents if you remove I wouldn't expect to see a crash unless there was unverifiable classfiles being loaded - in which case, all bets are off. |
@fjeremic It is not a recommendation by Eclipse, but something the community has recommended many times over the years. A simple Google search on how to speed up Eclipse returns posts such as: https://stackoverflow.com/questions/316265/how-can-you-speed-up-eclipse And since this has never caused me any problem with HotSpot, I have always added it to my eclipse.ini file as a matter of course (until now). @DanHeidinga I've never had a java.lang.VerifyError when the option is not set, either with HotSpot or OpenJ9, and neither has anyone else I work with. |
@dcendents , was there any dump file (.dmp/.trc ,e.g) generated from the crash with |
@ChengJin01 I have uploaded the dump files on my google drive and sent you an email with the link. |
Many thanks @dcendents , already received the e-mail and will check the dumps. |
So the problem should have something to do with class loading & JIT. @dcendents , could you help to check whether it still crashes with -Xint specified ? |
Judging from the fact that the issue happens on Eclipse and the java_stack.txt shows "jit_rbx", I'll assume this is x86 related. In which case, @andrewcraik, FYI the above. Edit: Adding some tags to keep this on the radar. If determined failing with |
Interesting - I'll wait for the result with -Xint to see if it seems to be JIT related. The loadClass method above seems to have a try catch which could end up circular if the wrong calls were made or some kind of circularity made its way into the load strategy (was looking at https://github.com/sonatype/plexus-classworlds/blob/e869ac904400f0a966e64af7e28709ef2dd5dd3f/src/main/java/org/codehaus/plexus/classworlds/realm/ClassRealm.java#L260). Another experiment would be to use |
I've just tested again with the I've also tested with I did not try with both flags at the same time. I cannot upload everything in my google drive at the same time. Let me know if you need the dump files and I'll send them to you on Monday, one set at a time. Have a good weekend. |
Note the dump files should compress nicely, if you aren't already doing that. |
@pshipton You are right they compress nicely, I should have thought about it! I've sent a link to @ChengJin01 @andrewcraik and @pshipton Let me know if you want me to try other options or some newer builds. |
When I try to download I get |
@pshipton while there might be a bug in the JIT as well, the crash with Xint might be easier to diagnose so could you let me know what you find from the Xint run? We can then consider if there might be a similar/related issue in the JIT that we need to fix too. |
@pshipton it works for me (using the link in the email, as a Google Drive user who is not signed in, in Firefox). There was an error saying the file is too big to be scanned, but it is still possible to download the file. Is it possible your corporate proxy is blocking the Google Drive site? |
Thanks, I managed to download it. |
It seems to be crashing in a call to memcmp(). |
The java core with -Xint shows exactly the same result as previously, which looks like there is a loop with class loading in this case:
|
I double-checked the corresponding system core dumps as follows:
Against the relationships of the requested classes:
It turns out it was the same classloader 0x00000007FAC238A0 that was loading all parent classes from JarArchiver, which seems logically correct for class loading. If so, it looks weird that it crashed when loading AbstractArchiver(), as a number of classloaders (org/codehaus/plexus/classworlds/realm/ClassRealm) had already loaded AbstractArchiver successfully according to the java core.
|
Considering the mess in the native stack as follows:
the only calling path that might correlate
If so, then the problem should be related to the bogus cookie (which was re-allocated by GC in defineClassCommon()), which might need the GC team to get involved in the investigation. @dcendents, does the crash (with -Xint specified) occur every time or just intermittently? |
Every time. I have a lot of projects in Eclipse. I would normally open only the projects I work on, but when I switched to OpenJ9 I thought I'd stress it a bit to see how it performed, so I opened all the projects and rebuilt them all. Every single time it crashed before completing the workspace build. I only tried once with -Xint, though. If I recall correctly it took longer before it crashed, but I'm not sure it actually built more of the projects (it just took longer to get there). |
It looks like a bug in our code, but the first step is to figure out where the memcmp() in the crash came from, as the dump was not accurate enough for us to diagnose. I will compile a build (Java 8 on Windows) with an assertion on the first argument (which might be an invalid pointer or NULL) before every memcmp() call (without JIT) to determine which memcmp() is reached before the crash. @dcendents, is it possible to share your project plus the whole Eclipse workspace with us, so we can reproduce the problem locally with a compiled build? (This might involve your private data, which would need to be shared privately, e.g. via Google Drive, instead of in the open.) |
@ChengJin01 no sorry this won't be possible. This is our customer's property and we signed an NDA. If it can help I don't mind testing your build and sharing the dump files with you, but this is as far as I can go to help you guys figure out what the problem is. Sorry I can't do more. |
@dcendents, I already sent to you a link (via Google Drive) of another compiled build with a piece of code to check whether the value of a suspicious pointer is valid or not before calling memcmp() in a function, which helps to confirm the location of crash. The build should crash on the added code rather than in memcmp(); otherwise, there might be other possibilities to trigger the crash. |
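The pointer check described in the comment above could look roughly like the following. This is a minimal sketch, not the code from the diagnostic build: `checked_memcmp` is a hypothetical name, and an assertion like this can only catch a NULL pointer, not a dangling or otherwise invalid one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper (not OpenJ9 code): stop on a NULL argument
 * before the buffers are handed to memcmp(), so that the failure is
 * reported at the call site instead of inside the C runtime. */
int checked_memcmp(const void *lhs, const void *rhs, size_t len)
{
    assert(lhs != NULL && "first memcmp argument is NULL");
    assert(rhs != NULL && "second memcmp argument is NULL");
    return memcmp(lhs, rhs, len);
}
```

If the process then stops on the assertion rather than inside memcmp(), the guarded call site is confirmed as the location of the crash.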
@ChengJin01 the files are on my google drive |
@dcendents, it seems there is nothing different in the dump files. So I compiled another build (already sent to you via Google Drive) that replaces the memcmp() (now commented out) with a FOR loop that checks the whole array in the suspicious romClassLoadFromCookie(). If it still crashes in memcmp() on this build, the memcmp() comes from somewhere other than romClassLoadFromCookie(). Another thing I want to confirm: is it possible to increase the value of -Xmx in eclipse.ini (e.g. over 6000m) to see whether the build still crashes? We noticed in the verbose log that memory was quite limited in the failing thread, which triggered GC. |
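The idea of swapping memcmp() for an explicit loop can be illustrated as below. This is only a sketch under assumptions: `bytes_differ` is a hypothetical name, and the real comparison inside romClassLoadFromCookie() is not shown in this thread.

```c
#include <stddef.h>

/* Hypothetical stand-in for memcmp() (not OpenJ9 code): returns 0 when
 * the first len bytes of both buffers match and 1 otherwise. Because
 * the comparison is done inline, a bad pointer would fault in this
 * function and not in memcmp(). */
int bytes_differ(const unsigned char *lhs, const unsigned char *rhs, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (lhs[i] != rhs[i]) {
            return 1;
        }
    }
    return 0;
}
```

With the library call gone from that one function, a crash that is still reported in memcmp() must originate from a different call site.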
I did use the new build and increased the memory to 6GB. It still crashed, the files are on my google drive ( Also bear in mind that the previous setting I had ( |
We already decoded the real native stack via Windbg as follows:
Will check what happened to the corresponding code in there. |
The core dump shows the crash occurred because the method signature was NULL: the romMethod pointer itself was non-NULL, but the romMethod it pointed to held no value:
To figure out what happened to this romMethod, I need to compile another build with new tracepoints to double-check the corresponding romClass and the location of the failing romMethod in the romClass. |
@dcendents , I already sent to you the link of the compiled build (via Google Drive) with new tracepoints to keep track of the rom classes. Hopefully it should show us something related to the failing method in this rom class. The option in Eclipse.ini remains the same as before to trigger the crash. |
@ChengJin01 the dump files are on my drive ( |
The snap trace indicates the crash occurred when handling the method
but the trace is still incomplete as compared to the results with windbg. I need to add more tracepoints to a couple of functions around the crash. |
@dcendents, I just sent to you another compiled build with all tracepoints added. Theoretically it will show us what happened to those values before crash in memcmp. |
@ChengJin01 eclipse will not start with that build I've uploaded the dump files if that can be helpful (
|
@dcendents , the problem was caused by one of the new tracepoints which was incorrect in format (because we can't verify all of them locally). I will double-check and compile another one for you. |
@dcendents , I already fixed the problem with tracepoint and compiled another build (jdk8_openj9_build_v5.zip shared via Google Drive) for you to check. |
@ChengJin01 the dump files are on my drive ( |
According to the snaptrace generated from eclipse-2019-03-25-2.7z:
it means the right_key (passed via exemplar ) was fine against the code below:
I need to split the tracepoint in |
I'm actually surprised that we're registering classloading constraints if The hashtable could be messed up if class unloading occurred as the cleanup code only runs with the verifier enabled: We may be missing some checks to only add constraints if the verifier is enabled, |
I double-checked our code as to the check of classloading constraints (only via j9bcv_checkClassLoadingConstraintsForSignature) only when It seems such checks were already added except two places as follows (including the crash in this issue):
So the fix should be:
I will compile a build with the fix above to see whether it works to avoid the crash as expected. |
@dcendents , just sent to you the link of two builds (via Google Drive): |
@ChengJin01 I tested with both builds. It did crash with v6 (although it seemed to build for much longer than usual before it crashed). The dump files are on my drive ( As expected the fix_v1 worked, the build completed successfully. 😄 |
@dcendents, many thanks for confirming the expected results. I already checked the dump files, and they indicate the crash occurred when printing the class name of the left_key (from the hash table) in the added tracepoint (it was added to trigger the crash on the left_key if it was messed up), which means the hash table was corrupt at that time. As mentioned above, the hash table for class loading constraints should never be used when -Xverify:none is specified. I will create a PR for the fix soon, as it resolved the problem. |
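The shape of that fix, skipping the constraint table entirely when the verifier is off, might be sketched as follows. Every name here (`vm_state`, `lookup_constraint`, `check_loading_constraints`) is hypothetical and chosen for illustration; this is not the actual OpenJ9 change.

```c
#include <stdbool.h>
#include <stddef.h>

/* All names are hypothetical; they are not OpenJ9 identifiers. */
typedef struct {
    bool verifier_enabled;      /* false models -Xverify:none */
    size_t constraint_lookups;  /* counts accesses to the constraint table */
} vm_state;

/* Stand-in for a lookup in the class loading constraint hash table. */
bool lookup_constraint(vm_state *vm, const char *signature)
{
    (void)signature;            /* a real lookup would hash the signature */
    vm->constraint_lookups++;
    return true;                /* pretend the constraint is satisfied */
}

/* Returns true when class loading may proceed. With the verifier
 * disabled the table is never consulted, so stale entries left behind
 * after class unloading cannot be dereferenced. */
bool check_loading_constraints(vm_state *vm, const char *signature)
{
    if (!vm->verifier_enabled) {
        return true;
    }
    return lookup_constraint(vm, signature);
}
```

The guard sits in front of the lookup instead of inside the table code, which mirrors the observation earlier in the thread that most call sites already performed this check and only two were missing it.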
The change is to avoid checking the class loading contraints when the verifier is disabled by -Xverify:none. Close: eclipse-openj9#4904 Signed-off-by: Cheng Jin <jincheng@ca.ibm.com>
The change is to avoid checking the class loading constraints when the verifier is disabled by -Xverify:none. Fix: eclipse-openj9#4904 Signed-off-by: Cheng Jin <jincheng@ca.ibm.com>
Hi, I just want to confirm I tested again using the official 0.14.0 build (OpenJDK8U-jdk_x64_windows_openj9_8u212b03_openj9-0.14.0) and as expected I did not have any problem. Thanks again |
I recently learned about OpenJ9 and I wanted to start using it. However I experienced systematic crashes when doing a full clean/build of my projects in eclipse.
I eventually narrowed it down to the -Xverify:none switch, which I can see is not supported by OpenJ9: https://www.eclipse.org/openj9/docs/xverify/
Should OpenJ9 ignore it if this is harmful?
I'm sure a lot of people running eclipse with hotspot have that switch on, it's been documented for years as an option to set to speed up eclipse.
As a reference, this is my custom eclipse switches that I had with hotspot:
This is what I now use with OpenJ9 and seems to be stable: