error encoding and sending metric family: write tcp <ip> -> <ip>: write: broken pipe #30700
Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself.
Hello @tcaty, can you share more information about the environment you're running Kubernetes in? This error is a networking issue coming from Prometheus closing the connection, and could be related to your underlying OS and kernel version. References: Kernel bug fix, another related issue, and another related issue.
Hi @crobert-1! Thank you for your reply!
Hello again @crobert-1! We have increased the CPU limits for our otel-collector and now it works well; there have been no errors sending metrics for the last 5 days. The root cause is that CPU throttling is a connection killer. Here is one related article.
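For reference, the fix described above amounts to raising the CPU limit on the collector's Kubernetes workload. A minimal sketch follows; the specific values below are assumptions for illustration, not the reporter's actual settings:

```yaml
# Hypothetical resources stanza for the otel-collector container.
# A CPU limit that is too low causes CFS throttling, which can stall
# the exporter long enough for Prometheus to close the connection.
resources:
  requests:
    cpu: 500m
    memory: 512Mi
  limits:
    cpu: "2"       # raised limit; previously low enough to throttle heavily
    memory: 1Gi
```

Whether throttling is actually occurring can be confirmed from the `container_cpu_cfs_throttled_periods_total` metric exposed by cAdvisor.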
Glad to hear it's working again, thanks for including a solution too, it's really helpful as a future reference!
Component(s)
exporter/prometheus
What happened?
Description
Hi! We recently faced an issue with exporting metrics. The Prometheus exporter stops exporting metrics after ~8 hours with the error in the title:
error encoding and sending metric family: write tcp <ip> -> <ip>: write: broken pipe
For now it only works with a cron job that restarts the otel-collector every 8 hours, but it should not have to work like this; sometimes the collector hits this error sooner than 8 hours. It is very unstable.
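The restart workaround described above can be sketched as a Kubernetes CronJob. All names here (CronJob, service account, deployment) are assumptions for illustration; the service account must have RBAC permission to patch the deployment:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: restart-otel-collector        # hypothetical name
spec:
  schedule: "0 */8 * * *"             # every 8 hours
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: deployment-restarter  # assumed SA with patch rights
          restartPolicy: Never
          containers:
            - name: kubectl
              image: bitnami/kubectl
              command:
                - kubectl
                - rollout
                - restart
                - deployment/otel-collector         # assumed deployment name
```

`kubectl rollout restart` triggers a rolling restart, so the collector is briefly duplicated rather than taken down, which avoids a metrics gap during the restart.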
We suspected this error could be caused by low memory, but it seems it is not, because the memory_limiter processor is configured and there are no logs about soft or hard memory limits in the otel-collector pods' stdout.
Steps to Reproduce
Expected Result
The otel-collector works stably and exports Prometheus metrics continuously, without restarts.
Actual Result
The otel-collector only works stably for ~8 hours without a restart.
Collector version
v0.88.0
Environment information
Environment
kubernetes v1.24
opentelemetry-collector helm chart v0.73.1
prometheus v2.47.1
OpenTelemetry Collector configuration
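The configuration was not included in the issue as posted. For illustration only, a minimal pipeline with the memory_limiter processor the reporter mentions might look like the following; every value and component name here is an assumption, not the reporter's actual config:

```yaml
# Hypothetical collector config sketch showing memory_limiter in the pipeline.
processors:
  memory_limiter:
    check_interval: 1s
    limit_percentage: 80         # soft/hard limits derived from container memory
    spike_limit_percentage: 25
  batch: {}

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [prometheus]
```

When the memory_limiter trips, the collector logs messages about exceeding soft or hard memory limits, which is why their absence in stdout suggested memory was not the problem.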
Log output
Additional context
Otel-collector metrics
OpenTelemetry Collector dashboard
Kubernetes / Views / Pods
Otel-collector resources configuration
Prometheus chart main configuration