changing the way core files are stored in SmartOS VMs so that we can actually use them #492
Comments
How about configuring the core dump directory to /home/iojs/dump or something similar, together with a cron job (find -mtime -delete)? |
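The cron-driven cleanup suggested here could be sketched as follows. This is only an illustration, not what the playbooks actually do: the function name `prune_old_cores` is hypothetical, the seven-day default is an assumption borrowed from the one-week retention mentioned in the issue description, and `/home/iojs/dump` is just the example path from this comment.

```python
import time
from pathlib import Path


def prune_old_cores(dump_dir, max_age_days=7, now=None):
    """Delete regular files in dump_dir last modified more than max_age_days ago.

    Hypothetical helper: the name, the default retention and the directory
    layout are assumptions made for this sketch.
    Returns the sorted list of removed paths.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # age threshold in seconds
    removed = []
    for entry in Path(dump_dir).iterdir():
        # Only touch regular files; leave subdirectories and symlinks alone.
        if entry.is_symlink() or not entry.is_file():
            continue
        if entry.stat().st_mtime < cutoff:
            entry.unlink()
            removed.append(entry)
    return sorted(removed)
```

Called from a daily cron job as `prune_old_cores("/home/iojs/dump")`, this behaves roughly like a `find` over the same directory restricted to regular files, using `-mtime` and `-delete`. The exact age rounding differs, since `find` counts whole days.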
I'm not very well versed on |
That's exactly what I'm suggesting, with the nit that I would choose a different name for this directory:
Running We should also make sure that |
Cool. I'm actually in the process of setting up smartos15 and 16 hosts. I'll look at incorporating this in the playbook. |
Sorry for the delay here -- I'm a bit swamped at the moment. If anyone wants to chip in with improving the playbooks for smartos14..16, that would be appreciated. I can spin up test machines if required. Also, protip from @chorrell -- we can disable SmartLogin (solving shared key access boundaries) with something along the lines of:
|
So, I spent some time on smartos today. Looks like we're running into issues with your openjdk8:
# /opt/local/java/openjdk8/bin/java -Xmx128m -jar slave.jar -jnlpUrl https://ci.nodejs.org/computer/test-joyent-smartos15-x64-1/slave-agent.jnlp -secret foo
Exception in thread "main" java.lang.Error: Error during hash calculation
at sun.security.ssl.HandshakeHash.getFinishedHash(HandshakeHash.java:249)
at sun.security.ssl.HandshakeMessage$Finished.getFinished(HandshakeMessage.java:1952)
at sun.security.ssl.HandshakeMessage$Finished.<init>(HandshakeMessage.java:1899)
at sun.security.ssl.ClientHandshaker.sendChangeCipherAndFinish(ClientHandshaker.java:1214)
at sun.security.ssl.ClientHandshaker.serverHelloDone(ClientHandshaker.java:1134)
at sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:348)
at sun.security.ssl.Handshaker.processLoop(Handshaker.java:979)
at sun.security.ssl.Handshaker.process_record(Handshaker.java:914)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1062)
at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1375)
at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1403)
at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1387)
at sun.net.www.protocol.https.HttpsClient.afterConnect(HttpsClient.java:559)
at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:185)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.connect(HttpsURLConnectionImpl.java:153)
at hudson.remoting.Launcher.parseJnlpArguments(Launcher.java:269)
at hudson.remoting.Launcher.run(Launcher.java:219)
at hudson.remoting.Launcher.main(Launcher.java:192)
Caused by: java.lang.RuntimeException: Could not clone digest
at sun.security.ssl.HandshakeHash.cloneDigest(HandshakeHash.java:194)
at sun.security.ssl.HandshakeHash.getFinishedHash(HandshakeHash.java:247)
... 17 more
Caused by: java.lang.CloneNotSupportedException: SHA-384
at sun.security.pkcs11.P11Digest.clone(P11Digest.java:316)
at java.security.MessageDigest$Delegate.clone(MessageDigest.java:560)
at sun.security.ssl.HandshakeHash.cloneDigest(HandshakeHash.java:191)
... 18 more
Caused by: sun.security.pkcs11.wrapper.PKCS11Exception: CKR_STATE_UNSAVEABLE
at sun.security.pkcs11.wrapper.PKCS11.C_GetOperationState(Native Method)
at sun.security.pkcs11.P11Digest.clone(P11Digest.java:311)
... 20 more
I don't have more time to look into that, but an up to date playbook is available in my repo. |
..same happens for smartos16. |
That's odd. I use a base-64-lts 15.4.0 with jenkins for Joyent image builds and openjdk8 works fine on that node. Is there more than one version of Java installed, like a |
This might be relevant: https://www.illumos.org/issues/7227 |
So maybe:
|
@chorrell: |
@chorrell sorry, it does work -- I just messed up ordering. |
First run with 15,16 here: https://ci.nodejs.org/job/node-test-commit-smartos/4584/ |
This has been implemented on all hosts and is available in the playbooks in my refactor. |
@jbergstroem Thank you very much for your work! Should this issue be closed? |
I guess we could, but seeing how we still need to land my PR it would be slightly misleading? |
What PR are you referring to? |
I assume this one: #606 |
@gibfahn Thanks for the context! Let's not close this issue until that PR is merged then. |
#606 was merged a while ago, so closing. Thank you very much @jbergstroem! |
This is related to nodejs/node#7649, where a test running on SmartOS made a node process abort, and thus made the system generate a core file.
That core file could have helped us root-cause the problem, but it was no longer available: by default, core files are stored in the global zone of the server on which the VM runs, and they are deleted after one week.
Another problem with the default setup for core file storage with SmartOS test VMs is that only Triton cloud's operators can access core files stored in the server's global zone.
What we could do is set the configuration of every SmartOS test VM so that:
This way, when a test fails because a node process aborted on a SmartOS test VM, the person who ran the CI test job can ask a member of the build WG to retrieve the core file and, e.g., upload it to manta so that it can be inspected with mdb_v8.
Regardless of whether the file needs to be inspected, it would be deleted after a week, so it would not fill up space on test VMs.
Does that sound like a useful thing to do? If so I can help set it up, just let me know.