[batch] collection of issues arising from Grafana alerts in January #14240
Comments
For what it's worth, this showed up as an info log because it is a Docker log message, not one from our code, so it does not go through our logging filters. It showed up in the Google Logging query because the query included this line
which means "logs whose severity is ERROR or whose log entry contains "WARNING"", it is not equivalent to |
As discussed in #14240, we emit warnings on database deadlocks, which there are enough of to trigger noisy alerts. Since there's nothing to be done operationally (and there's no current work underway to get rid of them), these alerts only contribute to alert fatigue and hide potential problems in the system that could be addressed. This demotes a deadlock to the `info` level so we can still see how often they occur but are not alerted by them. In the future when we resolve the current deadlock we can re-escalate this error so that we can catch new deadlocks that are introduced.
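For illustration, the demotion described above might look roughly like the following. This is a minimal sketch, not the actual change: the logger name, the `DeadlockError` class, and the `log_db_failure` helper are all hypothetical stand-ins, and the real code may detect deadlocks differently (for example by inspecting a driver error code).

```python
import logging

log = logging.getLogger("db_example")  # hypothetical logger name


class DeadlockError(Exception):
    """Hypothetical stand-in for whatever exception signals a deadlock."""


def log_db_failure(exc: Exception) -> int:
    """Log a database failure and return the level that was used."""
    # Deadlocks are currently expected and not actionable, so record them at
    # INFO: still countable in the logs, but below the alerting threshold.
    # Every other database failure keeps the WARNING level.
    level = logging.INFO if isinstance(exc, DeadlockError) else logging.WARNING
    log.log(level, "database operation failed: %r", exc)
    return level
```

Keeping the level decision in one place would make it a one-line change to re-escalate deadlocks once the current one is resolved.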
What happened?
Still a lot of deadlock errors. Largely from MJC https://cloudlogging.app.goo.gl/N8hoXPWYYWLiDPPi9
Looks like workers are leaving tasks running when they shut down https://cloudlogging.app.goo.gl/JFYoACF9qcDvCaqk8
Looks like we need to set the severity correctly in the worker logs. I'm also seeing a lot of this
WARNING: Published ports are discarded when using host network mode
Moved to #14262
Also looks like we incorrectly log a ContainerTimeoutError as an error log even though that's a user error: https://cloudlogging.app.goo.gl/TUGWNxnFiBiEdsDo9
And we log ImageCannotBePulled as an error even though that's a user error: https://cloudlogging.app.goo.gl/TchqwUKNCrd6qqmh7
Also a few like this: Unknown child process pid 12331, will report returncode 255
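One possible way to keep user errors such as ContainerTimeoutError and ImageCannotBePulled out of the error logs is to pick the severity from the exception type. The sketch below is only an assumption about how that could be done: the two exception classes are local stand-ins that reuse the names from this issue, `USER_ERRORS`, `level_for`, and `log_job_failure` are hypothetical, and INFO is an arbitrary choice for the demoted level.

```python
import logging

log = logging.getLogger("worker_example")  # hypothetical logger name


class ContainerTimeoutError(Exception):
    """Local stand-in: the user's container exceeded its timeout."""


class ImageCannotBePulled(Exception):
    """Local stand-in: the user-supplied image could not be pulled."""


# Failures caused by user input rather than by the service itself.
USER_ERRORS = (ContainerTimeoutError, ImageCannotBePulled)


def level_for(exc: Exception) -> int:
    """Return INFO for user errors and ERROR for everything else."""
    return logging.INFO if isinstance(exc, USER_ERRORS) else logging.ERROR


def log_job_failure(exc: Exception) -> int:
    """Log a job failure at the severity chosen by level_for."""
    level = level_for(exc)
    log.log(level, "job failed: %r", exc)
    return level
```

With this split, an alert on ERROR severity would only fire for failures that operators can act on.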
Version
0.2.127
Relevant log output
No response