Failing to rebuild monorepo with GRPC cache with 4.1.0 #13512

Closed
shirchen opened this issue May 25, 2021 · 1 comment

Comments

@shirchen

Description of the problem / feature request:

After trying to upgrade our monorepo to 4.1.0, we observed the following stack trace in the BEP file:

[129,702 / 129,702] checking cached actions
FATAL: bazel crashed due to an internal error. Printing stack trace:
java.lang.AssertionError
        at com.google.devtools.build.lib.remote.ReferenceCountedChannel$1.deallocate(ReferenceCountedChannel.java:48)
        at io.netty.util.AbstractReferenceCounted.handleRelease(AbstractReferenceCounted.java:86)
        at io.netty.util.AbstractReferenceCounted.release(AbstractReferenceCounted.java:76)
        at com.google.devtools.build.lib.remote.ReferenceCountedChannel.release(ReferenceCountedChannel.java:144)
        at com.google.devtools.build.lib.remote.GrpcCacheClient.close(GrpcCacheClient.java:166)
        at com.google.devtools.build.lib.remote.disk.DiskAndRemoteCacheClient.close(DiskAndRemoteCacheClient.java:64)
        at com.google.devtools.build.lib.remote.RemoteCache.close(RemoteCache.java:1021)
        at com.google.devtools.build.lib.remote.RemoteActionContextProvider.executionPhaseEnding(RemoteActionContextProvider.java:154)
        at com.google.devtools.build.lib.buildtool.ExecutionTool.executeBuild(ExecutionTool.java:376)
        at com.google.devtools.build.lib.buildtool.BuildTool.buildTargets(BuildTool.java:207)
        at com.google.devtools.build.lib.buildtool.BuildTool.processRequest(BuildTool.java:408)
        at com.google.devtools.build.lib.runtime.commands.BuildCommand.exec(BuildCommand.java:97)
        at com.google.devtools.build.lib.runtime.BlazeCommandDispatcher.execExclusively(BlazeCommandDispatcher.java:579)
        at com.google.devtools.build.lib.runtime.BlazeCommandDispatcher.exec(BlazeCommandDispatcher.java:231)
        at com.google.devtools.build.lib.server.GrpcServerImpl.executeCommand(GrpcServerImpl.java:543)
        at com.google.devtools.build.lib.server.GrpcServerImpl.lambda$run$1(GrpcServerImpl.java:606)
        at io.grpc.Context$1.run(Context.java:579)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.io.IOException
        at com.google.devtools.build.lib.remote.grpc.ChannelConnectionFactory$ChannelConnection.close(ChannelConnectionFactory.java:56)
        at com.google.devtools.build.lib.remote.grpc.SharedConnectionFactory.close(SharedConnectionFactory.java:75)
        at com.google.devtools.build.lib.remote.grpc.DynamicConnectionPool.close(DynamicConnectionPool.java:57)
        at com.google.devtools.build.lib.remote.ReferenceCountedChannel$1.deallocate(ReferenceCountedChannel.java:46)
        ... 19 more
Caused by: java.lang.InterruptedException
        at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(Unknown Source)
        at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(Unknown Source)
        at java.base/java.util.concurrent.CountDownLatch.await(Unknown Source)
        at io.grpc.internal.ManagedChannelImpl.awaitTermination(ManagedChannelImpl.java:861)
        at io.grpc.internal.ForwardingManagedChannel.awaitTermination(ForwardingManagedChannel.java:57)
        at com.google.devtools.build.lib.remote.grpc.ChannelConnectionFactory$ChannelConnection.close(ChannelConnectionFactory.java:54)
        ... 22 more

In build logs we see:

19:36:59 [129,702 / 129,702] checking cached actions
20:51:53 Build timed out (after 240 minutes). Marking the build as aborted.

or

17:08:34 [54,513 / 54,823] 752 / 837 tests, 41 failed; checking cached actions
19:07:58 INFO: Elapsed time: 7552.322s, Critical Path: 149.58s
19:07:58 INFO: 34909 processes: 5133 remote cache hit, 4424 internal, 25352 linux-sandbox.

What operating system are you running Bazel on?

Linux

What's the output of bazel info release?

We are currently on 3.7.0, but have not observed this error with 4.0.0.

@werkt
Contributor

werkt commented May 27, 2021

This is thrown during cache shutdown, via the executionPhaseEnding registered for the RemoteActionContextProvider. The cache is being closed and the channel released. The behavior change in a6293b3 replaced channel.shutdown() with a pool closure, and that closure includes an awaitTermination, so the thread's interrupted status suddenly matters. ManagedChannel::shutdown indicates no behavior change for an interrupted thread, so it seems right to either suppress the status or ignore it.

I'll put up a change to clear and restore the status as needed. An InterruptedException raised during the await itself will still be taken into account.
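
To illustrate why the interrupted status matters here, the following is a minimal sketch using only the JDK. The stack trace above shows awaitTermination ending up in CountDownLatch.await, so a latch stands in for the channel. The class and method names are hypothetical and not part of Bazel or gRPC. It shows that a timed await throws InterruptedException when the flag is already set on entry, even though the latch is open and no waiting is needed.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; not Bazel or gRPC code.
public class InterruptedAwaitDemo {

  /** Returns true if await threw InterruptedException although no wait was required. */
  static boolean awaitThrowsWhenFlagPreset() {
    // A latch with count 0 is already open, like a channel that has already terminated.
    CountDownLatch terminated = new CountDownLatch(0);

    // Simulate an interrupt that arrived before shutdown began.
    Thread.currentThread().interrupt();
    try {
      terminated.await(1, TimeUnit.SECONDS);
      return false; // not reached when the flag was set on entry
    } catch (InterruptedException e) {
      return true;
    } finally {
      // Make sure the demo leaves the thread with a clear interrupted status.
      Thread.interrupted();
    }
  }

  public static void main(String[] args) {
    System.out.println("threw InterruptedException: " + awaitThrowsWhenFlagPreset());
  }
}
```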

werkt added a commit to werkt/bazel that referenced this issue May 27, 2021
awaitTermination will throw InterruptedException if the interrupted
status is set initially when it is called, even if no wait is required.
Pool closure should not respect active interrupted status when shutting
down and awaiting termination as a result of its call from
executionPhaseEnding, which will occur during abnormal exits from
ExecutionTool. Ignore this status initially and restore the flag upon
exit of the factory close. An external interrupt which occurs during the
awaitTermination will still trigger an InterruptedException, as
expected.

Fixes bazelbuild#13512
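
The approach described in the commit message can be sketched roughly as follows. This is an illustration of the clear-and-restore pattern under stated assumptions, not the actual Bazel change: the class, the closeAwaitingTermination helper, and the use of a CountDownLatch in place of a gRPC channel are all hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the pattern; not the code from the referenced commit.
public class CloseIgnoringPriorInterrupt {

  /** Stand-in for a connection close() that awaits termination. */
  static void closeAwaitingTermination(CountDownLatch terminated) throws InterruptedException {
    // Clear any interrupt that was pending before the close started, and remember it.
    boolean wasInterrupted = Thread.interrupted();
    try {
      // An interrupt arriving during this wait still raises InterruptedException.
      terminated.await(1, TimeUnit.SECONDS);
    } finally {
      // Restore the earlier status so callers further up can still observe it.
      if (wasInterrupted) {
        Thread.currentThread().interrupt();
      }
    }
  }

  /** Returns true if close succeeded and the interrupted flag was restored afterwards. */
  static boolean closeSucceedsAndFlagIsRestored() {
    Thread.currentThread().interrupt(); // interrupt pending before close
    try {
      closeAwaitingTermination(new CountDownLatch(0));
    } catch (InterruptedException e) {
      return false; // would indicate the pending interrupt was not ignored
    }
    // Thread.interrupted() reads and clears the flag, leaving the thread clean.
    return Thread.interrupted();
  }

  public static void main(String[] args) {
    System.out.println("closed cleanly, flag restored: " + closeSucceedsAndFlagIsRestored());
  }
}
```

Restoring the flag in a finally block keeps the interrupt visible to the rest of the shutdown path, while the close itself no longer fails just because the thread had been interrupted earlier.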
katre pushed a commit that referenced this issue Jul 12, 2021

Fixes #13512

Closes #13521.

PiperOrigin-RevId: 377006347
katre pushed a commit to katre/bazel that referenced this issue Jul 13, 2021

Fixes bazelbuild#13512

Closes bazelbuild#13521.

PiperOrigin-RevId: 377006347