Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WindowsFileSystem#determineUnixRoot has a hardcoded timeout of 3 seconds, which is problematic for our unit tests #2983

Closed
haxorz opened this issue May 10, 2017 · 16 comments
Assignees
Labels
P3 We're not considering working on this, but happy to review a PR. (No assignee) platform: windows type: bug

Comments

@haxorz
Copy link
Contributor

haxorz commented May 10, 2017

// Wait 3 seconds max, that should be enough to run this command.
process.waitFor(3, TimeUnit.SECONDS);

note that this is a timeout for termination of a subprocess.

we're getting flaky test failures with stack traces like

java.lang.ExceptionInInitializerError
at com.google.devtools.build.lib.runtime.BlazeRuntime.fileSystemImplementation(BlazeRuntime.java:825)
at com.google.devtools.build.lib.runtime.BlazeRuntime.newRuntime(BlazeRuntime.java:971)
at com.google.devtools.build.lib.runtime.BlazeRuntime.batchMain(BlazeRuntime.java:753)
at com.google.devtools.build.lib.runtime.BlazeRuntime.main(BlazeRuntime.java:561)
at com.google.devtools.build.lib.bazel.BazelMain.main(BazelMain.java:58)
Caused by: java.lang.IllegalThreadStateException: process has not exited
at java.lang.ProcessImpl.exitValue(ProcessImpl.java:443)
at com.google.devtools.build.lib.windows.WindowsFileSystem.determineUnixRoot(WindowsFileSystem.java:509)
at com.google.devtools.build.lib.windows.WindowsFileSystem.(WindowsFileSystem.java:324)
... 5 more

take a look at

/**
* Timeouts for asserting that an arbitrary event occurs eventually.
*
* <p>In general, it's not appropriate to use a small constant timeout for an arbitrary
* computation since there is no guarantee that a snippet of code will execute within a given
* amount of time - you are at the mercy of the jvm, your machine, and your OS. In theory we
* could try to take all of these factors into account but instead we took the simpler and
* obviously correct approach of not having timeouts.
*
* <p>If a test that uses these timeout values is failing due to a "timeout" at the
* 'blaze test' level, it could be because of a legitimate deadlock that would have been caught
* if the timeout values below were small. So you can rule out such a deadlock by changing these
* values to small numbers (also note that the --test_timeout blaze flag may be useful).
*/
public static final long WAIT_TIMEOUT_MILLISECONDS = Long.MAX_VALUE;
public static final long WAIT_TIMEOUT_SECONDS = WAIT_TIMEOUT_MILLISECONDS / 1000;

@laszlocsomor
Copy link
Contributor

Thanks! Solution is to define the msys root as a JVM variable, e.g.:

"-Dbazel.windows_unix_root=C:/fake/msys",

@haxorz
Copy link
Contributor Author

haxorz commented May 10, 2017

the test that failed for me was https://github.com/bazelbuild/bazel/blob/28c9617d53bf58dcba9572bcfdc165c62421d983/src/test/shell/bazel/bazel_windows_example_test.sh

if you're sure that the lack of bazel.windows_unix_root jvm system property is the problem, then that means my diagnosis was wrong (i just jumped at the sight of a hardcoded short timeout). but then my next question would be "how does this test ever currently pass???"

@laszlocsomor
Copy link
Contributor

perhaps

always finished within 3 seconds until now?

@laszlocsomor
Copy link
Contributor

i should remove that logic [1] anyway (as well as the System.getenv call in line 501), and use the BAZEL_SH value from the clientenv

[1]

try {
process = Runtime.getRuntime().exec("cmd.exe /C " + bash + " -c \"/usr/bin/cygpath -m /\"");
// Wait 3 seconds max, that should be enough to run this command.
process.waitFor(3, TimeUnit.SECONDS);
if (process.exitValue() == 0) {
path = readAll(process.getInputStream());
} else {
System.err.print(
String.format(
"ERROR: %s (exit code: %d)%n",
readAll(process.getErrorStream()), process.exitValue()));
}
} catch (InterruptedException | IOException e) {
// Silently ignore failure. Either MSYS is installed at a different location, or not
// installed at all, or some error occurred. We can't do anything anymore but throw an
// exception if someone tries to create a Path from an absolute Unix path.
return null;
}

bazel-io pushed a commit that referenced this issue May 11, 2017
Workaround for #2983

Change-Id: Ib7eb2acff9a3bd1eee304d2d329d2a525a91eee1
PiperOrigin-RevId: 155722499
@dslomov dslomov added platform: windows type: bug P3 We're not considering working on this, but happy to review a PR. (No assignee) labels May 16, 2017
@adfernandes
Copy link

adfernandes commented Jun 8, 2017

For me, on Windows 7 (I know, I know...), the adding bazel.windows_unix_root as per this patch did nothing. And yes, I was rebuilding from 0.5.1.

The good news is that changing the timeout to 21 seconds in WindowsFileSystem.java did fix the problem.

I have a relatively speedy laptop at work for development, but... there is so much Windows-anti-malware-stupidity running on the system that spawning out a process takes waaaaaay longer than three seconds. Usually it can work in under 10. Usually.

Sigh.

@adfernandes
Copy link

Added PR #3160

@laszlocsomor
Copy link
Contributor

@adfernandes :

the adding bazel.windows_unix_root as per this patch did nothing

That's the bug, not the timeout.

@laszlocsomor
Copy link
Contributor

I managed to repro the bug: Bazel doesn't pick up the JVM flag from the bazelrc.

I modified Bazel to throw an IllegalStateException if the bazel.windows_unix_root JVM argument is not defined, instead of trying to run cygpath.

Without the JVM flag:

c:\work\bazel>"C:/tmp1/o6hbs7n0/execroot/bazel/bazel-out/msvc_x64-fastbuild/bin/src/bazel.exe" --nomaster_bazelrc --bazelrc=NUL --batch build //examples/cpp:hello-world --cpu=x64_windows_msys --host_cpu=x64_windows_msys
java.lang.ExceptionInInitializerError
        at com.google.devtools.build.lib.runtime.BlazeRuntime.fileSystemImplementation(BlazeRuntime.java:816)
        at com.google.devtools.build.lib.runtime.BlazeRuntime.newRuntime(BlazeRuntime.java:962)
        at com.google.devtools.build.lib.runtime.BlazeRuntime.batchMain(BlazeRuntime.java:744)
        at com.google.devtools.build.lib.runtime.BlazeRuntime.main(BlazeRuntime.java:552)
        at com.google.devtools.build.lib.bazel.BazelMain.main(BazelMain.java:58)
Caused by: java.lang.IllegalStateException: haha
        at com.google.devtools.build.lib.windows.WindowsFileSystem.determineUnixRoot(WindowsFileSystem.java:499)
        at com.google.devtools.build.lib.windows.WindowsFileSystem.<clinit>(WindowsFileSystem.java:324)
        ... 5 more
<unknown> crash in async thread:
java.lang.ExceptionInInitializerError
        at com.google.devtools.build.lib.runtime.BlazeRuntime.fileSystemImplementation(BlazeRuntime.java:816)
        at com.google.devtools.build.lib.runtime.BlazeRuntime.newRuntime(BlazeRuntime.java:962)
        at com.google.devtools.build.lib.runtime.BlazeRuntime.batchMain(BlazeRuntime.java:744)
        at com.google.devtools.build.lib.runtime.BlazeRuntime.main(BlazeRuntime.java:552)
        at com.google.devtools.build.lib.bazel.BazelMain.main(BazelMain.java:58)
Caused by: java.lang.IllegalStateException: haha
        at com.google.devtools.build.lib.windows.WindowsFileSystem.determineUnixRoot(WindowsFileSystem.java:499)
        at com.google.devtools.build.lib.windows.WindowsFileSystem.<clinit>(WindowsFileSystem.java:324)
        ... 5 more

JVM flag passed on the command line:

c:\work\bazel>"C:/tmp1/o6hbs7n0/execroot/bazel/bazel-out/msvc_x64-fastbuild/bin/src/bazel.exe" --nomaster_bazelrc --bazelrc=NUL --host_jvm_args=-Dbazel.windows_unix_root=c:\tools\msys64 --batch build //examples/cpp:hello-world --cpu=x64_windows_msys --host_cpu=x64_windows_msys
INFO: Found 1 target...
Target //examples/cpp:hello-world up-to-date:
  C:/tempdir/temp/_bazel_laszlocsomor/o6hbs7n0/execroot/io_bazel/bazel-out/msys_x64-fastbuild/bin/examples/cpp/hello-world.exe
INFO: Elapsed time: 2.376s, Critical Path: 0.66s

JVM flag in the bazelrc:

c:\work\bazel>echo "startup --host_jvm_args=-Dbazel.windows_unix_root=c:\tools\msys64" > my_bazelrc

c:\work\bazel>"C:/tmp1/o6hbs7n0/execroot/bazel/bazel-out/msvc_x64-fastbuild/bin/src/bazel.exe" --nomaster_bazelrc --bazelrc=my_bazelrc --batch build //examples/cpp:hello-world --cpu=x64_windows_msys --host_cpu=x64_windows_msys
java.lang.ExceptionInInitializerError
        at com.google.devtools.build.lib.runtime.BlazeRuntime.fileSystemImplementation(BlazeRuntime.java:816)
        at com.google.devtools.build.lib.runtime.BlazeRuntime.newRuntime(BlazeRuntime.java:962)
        at com.google.devtools.build.lib.runtime.BlazeRuntime.batchMain(BlazeRuntime.java:744)
        at com.google.devtools.build.lib.runtime.BlazeRuntime.main(BlazeRuntime.java:552)
        at com.google.devtools.build.lib.bazel.BazelMain.main(BazelMain.java:58)
Caused by: java.lang.IllegalStateException: haha
        at com.google.devtools.build.lib.windows.WindowsFileSystem.determineUnixRoot(WindowsFileSystem.java:499)
        at com.google.devtools.build.lib.windows.WindowsFileSystem.<clinit>(WindowsFileSystem.java:324)
        ... 5 more
<unknown> crash in async thread:
java.lang.ExceptionInInitializerError
        at com.google.devtools.build.lib.runtime.BlazeRuntime.fileSystemImplementation(BlazeRuntime.java:816)
        at com.google.devtools.build.lib.runtime.BlazeRuntime.newRuntime(BlazeRuntime.java:962)
        at com.google.devtools.build.lib.runtime.BlazeRuntime.batchMain(BlazeRuntime.java:744)
        at com.google.devtools.build.lib.runtime.BlazeRuntime.main(BlazeRuntime.java:552)
        at com.google.devtools.build.lib.bazel.BazelMain.main(BazelMain.java:58)
Caused by: java.lang.IllegalStateException: haha
        at com.google.devtools.build.lib.windows.WindowsFileSystem.determineUnixRoot(WindowsFileSystem.java:499)
        at com.google.devtools.build.lib.windows.WindowsFileSystem.<clinit>(WindowsFileSystem.java:324)
        ... 5 more

@laszlocsomor
Copy link
Contributor

@adfernandes : please see #3170 (comment)

@adfernandes
Copy link

Hmm.

Actually, my instance of this bug is a little different.

I was building bazel from source, and nowhere in the docs (under either using nor installing) does it say that on Windows the bazelrc file is required.

Thanks for changing the exception to "IllegalState", though. That's far less misleading than before.

But it still doesn't help someone trying to use or build bazel for the first time.

On my system, I tried both the prebuilt binary and building from source.

Reading the comments in this bug report, it wasn't (and still isn't) clear that a bazelrc is needed even for bootstrap-building bazel.

In this case, the code fell back to cygdrive which should have worked but didn't because my slow crapware-riddled corporate laptop cannot have the JVM launch an external process in less than three seconds.

So increasing the timeout did fix the bug... kinda'.

@laszlocsomor
Copy link
Contributor

I was building bazel from source, and nowhere in the docs (under either using nor installing) does it say that on Windows the bazelrc file is required.

It isn't, only the BAZEL_SH envvar is required, but if you observe slow process spawning then yes, updating the bootstrap scripts and passing this --host_jvm_args is the workaround for you.

Thanks for changing the exception to "IllegalState", though.

Uhm, I should've clarified that this was a local change :) Not yet merged. But I'm working on merging it!

Reading the comments in this bug report, it wasn't (and still isn't) clear that a bazelrc is needed even for bootstrap-building bazel.

It's not. What makes you think it is?

@adfernandes
Copy link

Hmm. Probably the difference between building from 0.5.1-dist vs git HEAD. :-)

On 0.5.1-dist my bazelrc is ignored during bootstrap build (which is probably as designed), so cygpath is called and it times out. So that's definitely a bug and fixed by increasing the timeout.

On pre-built bazel 0.5.1, having a bazelrc that sets

startup --host_jvm_args=-Dbazel.windows_unix_root=C:\\msys2

is fine.

Note that I have always had BAZEL_SH set correctly, and I do not have Msys2 installed at c:\tools.

So until the next -dist is rolled out I can't test the changes on HEAD :-(

@adfernandes
Copy link

(Actually, a little side-note - it turns out that on my system, even 13 seconds is not enough to prevent a timeout... I had to set it to 21 seconds! And I've been building bazel a lot from source today! Boy, I hope this problem is fixed upstream!) :)

@laszlocsomor
Copy link
Contributor

You could pass the --host_jvm_args flag even while bootstrapping like so:

$ export BAZEL_BOOTSTRAP_STARTUP_OPTIONS="-Dbazel.windows_unix_root=C:/msys2"
$ ./compile.sh

Ouch, that 21 second timeout sounds atrocious, I'm surprised you had the patience to try anything at all.

@laszlocsomor
Copy link
Contributor

And if that doesn't work for whatever reason, then please let me know, and also you could hardcode the path in WindowsFileSystem.UNIX_ROOT as:

private static final PathFragment UNIX_ROOT = PathFragment.create("c:/msys2");

@adfernandes
Copy link

@laszlocsomor: Much appreciated, thanks!

I will try to get to testing it, but, unfortunately I've put a lot of time into this already. I've got enough battles to try and fight ("Why isn't maven good enough? I mean, it integrates so well with Spring!" 🤢 ) that I can only put so much effort into bazel. 😞

But I will try!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P3 We're not considering working on this, but happy to review a PR. (No assignee) platform: windows type: bug
Projects
None yet
Development

No branches or pull requests

4 participants