
CI Failures: jarHell failed to run on Windows #47953

Closed
nknize opened this issue Oct 11, 2019 · 9 comments · Fixed by #48240

nknize (Contributor) commented Oct 11, 2019

This jarHell failure has cropped up a few times in the last couple of days on the master and 7.4 branches.

https://groups.google.com/a/elastic.co/forum/#!searchin/build-elasticsearch/jarHell%7Csort:date

All with the same failure:

```text
14:41:02 > Task :x-pack:plugin:ml:jarHell FAILED
14:41:02 Exec output and error:
14:41:02 | Output for C:\Users\jenkins\.java\openjdk12\bin\java.exe:
14:41:02 
14:41:02 > Task :x-pack:plugin:security:testingConventions
14:41:02 
14:41:02 FAILURE: Build failed with an exception.
14:41:02 
14:41:02 * What went wrong:
14:41:02 Execution failed for task ':x-pack:plugin:ml:jarHell'.
14:41:02 > A problem occurred starting process 'command 'C:\Users\jenkins\.java\openjdk12\bin\java.exe
```
@nknize nknize added >test-failure Triaged test failures from CI :ml Machine learning v8.0.0 v7.4.1 labels Oct 11, 2019
elasticmachine (Collaborator) commented:

Pinging @elastic/ml-core (:ml)

droberts195 (Contributor) commented Oct 11, 2019

The cause here is `A problem occurred starting process 'command 'C:\Users\jenkins\.java\openjdk12\bin\java.exe`. The fact that it was running a check on some ML source at the time is a fluke.

The worker had 32 CPUs. Does that mean it was doing 32 checks in parallel, each in a separate JVM? Certainly there are lots of very similar timestamps on the check tasks that are listed in the console log.

Maybe the parallelism needs scaling back a little on Windows, or the Windows workers need more memory?
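
For reference, Gradle's parallelism can be capped with a standard property in gradle.properties; the value below is only an illustrative guess at what "scaling back a little" might mean, not a tested recommendation:

```properties
# gradle.properties — cap the number of parallel workers (illustrative value)
org.gradle.workers.max=8
```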

@droberts195 droberts195 added :Delivery/Build Build or test infrastructure and removed :ml Machine learning labels Oct 11, 2019
elasticmachine (Collaborator) commented:

Pinging @elastic/es-core-infra (:Core/Infra/Build)

@droberts195 droberts195 changed the title CI Failures: jarHell in ML CI Failures: jarHell failed to run on Windows Oct 11, 2019
alpar-t (Contributor) commented Oct 15, 2019

Here's a scan: https://gradle-enterprise.elastic.co/s/4dnt4bscjqldo
The root cause:

Caused by: java.io.IOException: CreateProcess error=206, The filename or extension is too long

droberts195 (Contributor) commented:

The associated stack trace might also be useful because it shows which component doesn't like long path names, namely net.rubygrapefruit.platform.internal.DefaultProcessLauncher:

```text
at net.rubygrapefruit.platform.internal.DefaultProcessLauncher.start(DefaultProcessLauncher.java:25)
at net.rubygrapefruit.platform.internal.WindowsProcessLauncher.start(WindowsProcessLauncher.java:22)
at net.rubygrapefruit.platform.internal.WrapperProcessLauncher.start(WrapperProcessLauncher.java:36)
at org.gradle.process.internal.ExecHandleRunner.startProcess(ExecHandleRunner.java:98)
at org.gradle.process.internal.ExecHandleRunner.run(ExecHandleRunner.java:71)
at org.gradle.internal.operations.CurrentBuildOperationPreservingRunnable.run(CurrentBuildOperationPreservingRunnable.java:42)
at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:64)
at org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:48)
at org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:56)
```

droberts195 (Contributor) commented:

A similar problem was worked around in the Elasticsearch code by getting the short (8.3) path before passing it to the ProcessBuilder; see #25344.
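
For illustration, a minimal sketch of the short-path idea, assuming JNA's Kernel32 binding (which exposes GetShortPathName); this is not necessarily how #25344 implemented it, and the java.exe path is just the one from the log above:

```java
// Illustrative only: convert a long Windows path to its short (8.3) form
// before handing it to ProcessBuilder, sidestepping long-path problems.
import com.sun.jna.platform.win32.Kernel32;

public final class ShortPaths {
    static String toShortPath(String longPath) {
        char[] buffer = new char[260]; // MAX_PATH
        int len = Kernel32.INSTANCE.GetShortPathName(longPath, buffer, buffer.length);
        if (len == 0 || len > buffer.length) {
            return longPath; // conversion failed; fall back to the long form
        }
        return new String(buffer, 0, len);
    }

    public static void main(String[] args) throws Exception {
        String javaExe = toShortPath("C:\\Users\\jenkins\\.java\\openjdk12\\bin\\java.exe");
        new ProcessBuilder(javaExe, "-version").inheritIO().start().waitFor();
    }
}
```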

alpar-t (Contributor) commented Oct 15, 2019

I think the error is a bit misleading: the path itself is rather short here (C:\Users\jenkins\.java\openjdk12\bin\java.exe).
I think it's the command-line length that gets exceeded, because the classpath is too long.
We could pass it via the CLASSPATH environment variable instead, but looking at some issues, Gradle should already be doing that...
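
A minimal sketch of that suggestion (the main class and jar paths are placeholders): the java launcher falls back to the CLASSPATH environment variable when neither -cp nor -jar is given, so the command line itself stays short:

```java
// Sketch: pass a huge classpath via the environment instead of -cp.
import java.util.List;

public final class EnvClasspathLaunch {
    public static void main(String[] args) throws Exception {
        // Stand-in for a classpath long enough to blow the command-line limit.
        String longClasspath = String.join(";", List.of(
            "C:\\work\\lib\\a.jar", "C:\\work\\lib\\b.jar"));

        ProcessBuilder pb = new ProcessBuilder(
            "C:\\Users\\jenkins\\.java\\openjdk12\\bin\\java.exe", "com.example.Main");
        pb.environment().put("CLASSPATH", longClasspath); // read by the java launcher
        pb.inheritIO().start().waitFor();
    }
}
```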

mark-vieira (Contributor) commented:

> I think it's the command-line length that gets exceeded, because the classpath is too long.

I believe you are correct:

Cannot run program "C:\Users\jenkins\.java\openjdk12\bin\java.exe" (in directory "C:\Users\jenkins\workspace\elastic+elasticsearch+master+multijob-windows-compatibility\os\windows-2012-r2\x-pack\plugin\ml"): CreateProcess error=206, The filename or extension is too long

> We could pass it via the CLASSPATH environment variable instead, but looking at some issues, Gradle should already be doing that...

I don't think using CLASSPATH is a long-term solution:

> All environment variables must live together in a single environment block, which itself has a limit of 32767 characters.

That said, this might be good enough for now: I believe the current command-line limit is 8K characters, so perhaps that will solve the problem temporarily until Gradle 6.0, which has a permanent fix for this.
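
For context, one well-known way around both the command-line and environment-block limits is a "pathing jar": a manifest-only jar whose Class-Path attribute carries the real classpath, so the command line only has to reference that single jar. A sketch with placeholder entries (no claim that this is Gradle 6.0's exact mechanism):

```java
// Build a manifest-only "pathing jar"; the child JVM is then launched with
//   java -cp <pathing jar> com.example.Main
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.Attributes;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

public final class PathingJar {
    public static void main(String[] args) throws Exception {
        Manifest manifest = new Manifest();
        manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        // Class-Path entries are space-separated URLs relative to this jar.
        manifest.getMainAttributes().put(Attributes.Name.CLASS_PATH, "lib/a.jar lib/b.jar");

        Path jar = Files.createTempFile("classpath", ".jar");
        try (OutputStream os = Files.newOutputStream(jar);
             JarOutputStream out = new JarOutputStream(os, manifest)) {
            // no entries needed: the manifest alone carries the classpath
        }
        System.out.println("pathing jar written to " + jar);
    }
}
```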

mark-vieira (Contributor) commented Oct 15, 2019

> Gradle should already be doing that ...

For clarification, Gradle does not already do this. The only scenario in which Gradle circumvents OS command-line length limits is for worker processes (like test workers, or the worker API using process isolation), where the classpath is passed via stdin. Things like project.javaexec() just use -cp, exactly as you would on the command line.
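
To make the stdin approach concrete, here is a simplified child-side bootstrap (not Gradle's actual worker protocol; the convention that args[0] names the real main class is invented for the sketch). The parent writes one classpath entry per line to the child's stdin, and the bootstrap builds a classloader from them:

```java
// Simplified sketch of a worker bootstrap that takes its classpath on stdin.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public final class StdinBootstrap {
    public static void main(String[] args) throws Exception {
        List<URL> urls = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(System.in))) {
            String entry;
            while ((entry = in.readLine()) != null && !entry.isEmpty()) {
                urls.add(Paths.get(entry).toUri().toURL()); // one classpath entry per line
            }
        }
        try (URLClassLoader loader = new URLClassLoader(urls.toArray(new URL[0]))) {
            loader.loadClass(args[0]) // hypothetical: real main class named in args[0]
                  .getMethod("main", String[].class)
                  .invoke(null, (Object) new String[0]);
        }
    }
}
```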
