Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failed to submit <n> logs #4346

Closed
sebbov opened this issue Nov 4, 2017 · 8 comments
Closed

Failed to submit <n> logs #4346

sebbov opened this issue Nov 4, 2017 · 8 comments
Assignees
Labels
api: logging Issues related to the Cloud Logging API. priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@sebbov
Copy link

sebbov commented Nov 4, 2017

We enabled Stackdriver logging for our application this morning and are seeing excessive exceptions since:

Failed to submit 1 logs., due to:

[...]
grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with (StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded)>
During handling of the above exception, another exception occurred:
[...]
google.gax.errors.RetryError: RetryError(Exception occurred in retry method that was not classified as transient, caused by <_Rendezvous of RPC that terminated with (StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded)>)

I don't think we're hitting the API limit (1k rps, and we're averaging at 60rps). I do see some 403 errors reported in the API console, but not sure those relate to these deadline exceeded.

We're seeing this at an about hourly frequency across our 10 instances. This application itself runs in GCP. This is using the CloudLoggingHandler with async / batching transport.

  1. Specify the API at the beginning of the title (for example, "BigQuery: ...")
    General, Core, and Other are also allowed as types

Stackdriver logging

  1. OS type and version

Debian 9

  1. Python version and virtual environment information python --version

Python 3.5.3

  1. google-cloud-python version pip show google-cloud, pip show google-<service> or pip freeze
# pip freeze | grep google
gapic-google-cloud-datastore-v1==0.15.3
gapic-google-cloud-error-reporting-v1beta1==0.15.3
gapic-google-cloud-logging-v2==0.91.3
google-api-core==0.1.1
google-auth==1.2.0
google-auth-httplib2==0.0.2
google-cloud==0.29.0
google-cloud-bigquery==0.28.0
google-cloud-bigtable==0.28.1
google-cloud-core==0.28.0
google-cloud-datastore==1.4.0
google-cloud-dns==0.28.0
google-cloud-error-reporting==0.28.0
google-cloud-firestore==0.28.0
google-cloud-language==0.31.0
google-cloud-logging==1.4.0
google-cloud-monitoring==0.28.0
google-cloud-pubsub==0.29.0
google-cloud-resource-manager==0.28.0
google-cloud-runtimeconfig==0.28.0
google-cloud-spanner==0.29.0
google-cloud-speech==0.30.0
google-cloud-storage==1.6.0
google-cloud-trace==0.16.0
google-cloud-translate==1.3.0
google-cloud-videointelligence==0.28.0
google-cloud-vision==0.28.0
google-gax==0.15.15
google-resumable-media==0.3.1
googleapis-common-protos==1.5.3
grpc-google-iam-v1==0.11.4
proto-google-cloud-datastore-v1==0.90.4
proto-google-cloud-error-reporting-v1beta1==0.15.3
proto-google-cloud-logging-v2==0.91.3
  1. Stacktrace if available

2017-11-04 14:44:56,211 E Failed to submit 1 logs.
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/site-packages/google/gax/retry.py", line 121, in inner
    return to_call(*args)
  File "/usr/local/lib/python3.5/site-packages/google/gax/retry.py", line 68, in inner
    return a_func(*updated_args, **kwargs)
  File "/usr/local/lib/python3.5/site-packages/grpc/_channel.py", line 492, in __call__
    return _end_unary_response_blocking(state, call, False, deadline)
  File "/usr/local/lib/python3.5/site-packages/grpc/_channel.py", line 440, in _end_unary_response_blocking
    raise _Rendezvous(state, None, None, deadline)
grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with (StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded)>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/site-packages/google/cloud/logging/handlers/transports/background_thread.py", line 98, in _safely_commit_batch
    batch.commit()
  File "/usr/local/lib/python3.5/site-packages/google/cloud/logging/logger.py", line 549, in commit
    client.logging_api.write_entries(entries, **kwargs)
  File "/usr/local/lib/python3.5/site-packages/google/cloud/logging/_gax.py", line 131, in write_entries
    partial_success=partial_success, options=options)
  File "/usr/local/lib/python3.5/site-packages/google/cloud/gapic/logging/v2/logging_service_v2_client.py", line 353, in write_log_entries
    return self._write_log_entries(request, options)
  File "/usr/local/lib/python3.5/site-packages/google/gax/api_callable.py", line 452, in inner
    return api_caller(api_call, this_settings, request)
  File "/usr/local/lib/python3.5/site-packages/google/gax/api_callable.py", line 438, in base_caller
    return api_call(*args)
  File "/usr/local/lib/python3.5/site-packages/google/gax/api_callable.py", line 376, in inner
    return a_func(*args, **kwargs)
  File "/usr/local/lib/python3.5/site-packages/google/gax/retry.py", line 127, in inner
    ' classified as transient', exception)
google.gax.errors.RetryError: RetryError(Exception occurred in retry method that was not classified as transient, caused by <_Rendezvous of RPC that terminated with (StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded)>)
  1. Steps to reproduce
  2. Code example
        client = cloud_logging.Client()
        resource = cloud_resource.Resource(...)
        handler = cloud_handlers.CloudLoggingHandler(client,
                                                     name=...,
                                                     resource=resource)
        logger = logging.getLogger()
        logger.addHandler(handler)
        logger = logging.getLogger('google.cloud.logging.handlers')
        logger.setLevel(logging.INFO)
@tseaver tseaver added the api: bigquery Issues related to the BigQuery API. label Nov 6, 2017
@tseaver tseaver changed the title Stackdriver logging: Failed to submit <n> logs ( Failed to submit <n> logs ( Nov 6, 2017
@tseaver tseaver added api: logging Issues related to the Cloud Logging API. priority: p2 Moderately-important priority. Fix may not be included in next release. and removed api: bigquery Issues related to the BigQuery API. labels Nov 6, 2017
@sebbov
Copy link
Author

sebbov commented Nov 18, 2017

What is the ETA for a fix? Our logs are flooding with exceptions of this logging client.

@ilrico
Copy link

ilrico commented Jan 5, 2018

same problem here, same setup (Debian Stretch), python 3.5.3. any fix arose ?

@dhermes dhermes changed the title Failed to submit <n> logs ( Failed to submit <n> logs Jan 5, 2018
@tseaver
Copy link
Contributor

tseaver commented Jan 5, 2018

@waprin, do you have any insight here?

@waprin
Copy link
Contributor

waprin commented Jan 5, 2018

Historically, the most common reason bugs occur here is either:

  1. There is some sort of recursive loop where a logging call gets made on the logging code path.

This is what the setup_logging helper helps avoid, if you don't want to use it, check out what it does and maybe do something similar.

  1. The background thread handler is the most vulnerable to bugs just because there's concurrency, so it might just be a bug. @jonparrott significantly changed the concurrency code in Overhaul logging background thread transport #3407 so he might have some insight here too.

If you don't mind making sync logging calls ,switching to using the synchronous transport will will fix 2) (and also be a useful data point if it does. since then the issue is very likely concurrency bug)

handler = CloudLoggingHandler(transport=google.cloud.logging.handlers.transports.SyncTransport)

So, you can help verify whether it's 1) by disabling log propagation (see setup_logging) or 2) by switching to SyncTransport, and then once we know which it is, it will be easier to fix.

@ilrico
Copy link

ilrico commented Jan 6, 2018

I would go for 2. because:

  • it worked previously for last 2 days (since I installed this feature)
  • it's only on high throwing logs that I started to get this exception

hope it helps

@theacodes
Copy link
Contributor

Deadline exceeded likely means that the request for some reason or another failed to complete in the allotted time. This doesn't necessarily indicate an issue with the library, rather, it can indicate that either the network was congested and unable to send the requests within the deadline or that the logging service failed to respond within the deadline. Switching transports is unlikely to solve this, as it'll just "move" where the deadline exceeded error is bubbled.

I'm not sure what to do here. One thing we should definitely do is regenerate the underlying code to pick up the new retry/timeout behavior in api_core, which will allow us the possibility of giving users the ability to change the deadline.

@chemelnucfin chemelnucfin added the type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. label Jan 9, 2018
@theacodes
Copy link
Contributor

logging's underlying code has been regenerated and I suspect this issue may disappear. I'm going to close this issue for now, but please comment if it reappears with the latest version.

@Kharms
Copy link

Kharms commented Feb 8, 2019

I'm periodically getting this error.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: logging Issues related to the Cloud Logging API. priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.
Projects
None yet
Development

No branches or pull requests

7 participants