LoggingAppender failure #3478

Closed
stetra opened this issue Jul 17, 2018 · 5 comments
stetra commented Jul 17, 2018

I am using com.google.cloud.logging.logback.LoggingAppender to send logs to Stackdriver; the logger is configured in logback.xml (a rough sketch of the setup is at the end of this comment). Earlier today, Stackdriver had an outage, and my service's throughput dropped to zero. It turns out the logger was unable to write logs during the outage:

Jul 17, 2018 9:06:01 AM com.google.common.util.concurrent.AbstractFuture executeListener
SEVERE: RuntimeException while executing runnable CallbackListener{com.google.api.core.ApiFutures$1@6781a500} with executor MoreExecutors.directExecutor()
java.lang.RuntimeException: com.google.cloud.logging.LoggingException: io.grpc.StatusRuntimeException: UNAVAILABLE: 502:Bad Gateway
at com.google.cloud.logging.LoggingImpl$7.onFailure(LoggingImpl.java:578)

So threads became blocked on logging:

"Gax-8" #33 daemon prio=5 os_prio=0 tid=0x00005588b0db1800 nid=0x1d72 in Object.wait() [0x00007f7d2cee0000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:502)
    at com.google.api.gax.batching.BlockingSemaphore.acquire(BlockingSemaphore.java:61)
    - locked <0x00000000ee827400> (a com.google.api.gax.batching.BlockingSemaphore)
    at com.google.api.gax.batching.FlowController.reserve(FlowController.java:197)
    at com.google.api.gax.batching.BatchingFlowController.reserve(BatchingFlowController.java:58)
    at com.google.api.gax.batching.ThresholdBatcher.add(ThresholdBatcher.java:166)
    at com.google.api.gax.rpc.BatchingCallable.futureCall(BatchingCallable.java:73)
    at com.google.api.gax.rpc.UnaryCallable$1.futureCall(UnaryCallable.java:126)
    at com.google.api.gax.rpc.UnaryCallable.futureCall(UnaryCallable.java:87)
    at com.google.cloud.logging.spi.v2.GrpcLoggingRpc.write(GrpcLoggingRpc.java:223)
    at com.google.cloud.logging.LoggingImpl.writeAsync(LoggingImpl.java:593)
    at com.google.cloud.logging.LoggingImpl.writeLogEntries(LoggingImpl.java:559)
    at com.google.cloud.logging.LoggingImpl.write(LoggingImpl.java:522)
    at com.google.cloud.logging.logback.LoggingAppender.append(LoggingAppender.java:201)
    at com.google.cloud.logging.logback.LoggingAppender.append(LoggingAppender.java:63)
    at ch.qos.logback.core.UnsynchronizedAppenderBase.doAppend(UnsynchronizedAppenderBase.java:84)
    at ch.qos.logback.core.spi.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:51)
    at ch.qos.logback.classic.Logger.appendLoopOnAppenders(Logger.java:270)
    at ch.qos.logback.classic.Logger.callAppenders(Logger.java:257)
    at ch.qos.logback.classic.Logger.buildLoggingEventAndAppend(Logger.java:421)
    at ch.qos.logback.classic.Logger.filterAndLog_1(Logger.java:398)
    at ch.qos.logback.classic.Logger.debug(Logger.java:486)

However, it has been many hours since the outage was resolved, yet my service has still not recovered. I would expect the logger to finish writing the pending log events and service throughput to resume. Any ideas on why this is happening? I have attached the thread stack traces: threads.txt
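For reference, a minimal logback.xml wiring up the LoggingAppender looks roughly like this; the log name and flush level below are placeholders, not my exact values:

    <configuration>
      <!-- Ships log entries to Stackdriver via google-cloud-logging-logback. -->
      <appender name="CLOUD" class="com.google.cloud.logging.logback.LoggingAppender">
        <log>application.log</log>          <!-- placeholder log name -->
        <flushLevel>WARN</flushLevel>       <!-- flush immediately at WARN and above -->
      </appender>

      <root level="INFO">
        <appender-ref ref="CLOUD"/>
      </root>
    </configuration>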

@yihanzhen yihanzhen added type: question Request for information or clarification. Not an issue. api: logging Issues related to the Cloud Logging API. priority: p2 Moderately-important priority. Fix may not be included in next release. labels Jul 17, 2018
@saturnism

This feels like more than a p2, since service is interrupted?

@yihanzhen yihanzhen added priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. and removed priority: p2 Moderately-important priority. Fix may not be included in next release. labels Aug 24, 2018
@JustinBeckwith JustinBeckwith added triage me I really want to be triaged. 🚨 This issue needs some love. labels Aug 24, 2018
@JustinBeckwith JustinBeckwith added priority: p2 Moderately-important priority. Fix may not be included in next release. and removed triage me I really want to be triaged. 🚨 This issue needs some love. priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. labels Aug 24, 2018

pongad commented Aug 24, 2018

This should be fixed by googleapis/gax-java#567. We discovered that flow control does not get released properly if an RPC returns an error. This would explain why the program hangs during a Stackdriver outage and never recovers: enough flow-control permits leak that everything eventually blocks.
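A simplified sketch of the failure mode (the class and numbers here are illustrative, not the actual gax code):

    import java.util.concurrent.Semaphore;

    // Illustrative model of the leak; gax's real FlowController is more involved.
    class LeakyFlowControl {
        // Bounds the number of in-flight log-write RPCs.
        private final Semaphore permits = new Semaphore(1000);

        void send(Runnable rpc) throws InterruptedException {
            permits.acquire();  // each append() reserves a permit before the RPC
            rpc.run();          // throws (e.g. UNAVAILABLE) during the outage
            permits.release();  // pre-fix: never reached on error, so the permit leaks
        }
        // Once enough RPCs have failed, acquire() blocks every logging thread
        // forever -- matching the BlockingSemaphore.acquire frames in the thread
        // dump above. The fix in googleapis/gax-java#567 is to release the
        // permit on the error path as well (effectively a finally-style release).
    }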

@JustinBeckwith

@pongad now that googleapis/gax-java#567 is resolved, does that mean this is fixed?


pongad commented Sep 7, 2018

Since this bug is on the ... scary side, I think we might as well leave this open until we update gax.

@yihanzhen

The gax version is updated in this PR. It should be OK to close this bug.
