
hostmetrics -> network scraper fails the conntrack collection on distros where the conntrack module is not loaded #12799

Closed
dloucasfx opened this issue Aug 1, 2022 · 1 comment · Fixed by #12886
Labels: bug Something isn't working, receiver/hostmetrics

dloucasfx (Contributor) wrote:

Describe the bug
The hostmetrics receiver logs the "benign" error below when the network scraper tries to collect conntrack metrics on Linux distros where the conntrack module is not loaded, for example Amazon Linux 2.

```
scraperhelper/scrapercontroller.go:197 Error scraping metrics
{"kind": "receiver", "name": "hostmetrics", "pipeline": "metrics",
"error": "failed to read conntrack info: open
/proc/sys/net/netfilter/nf_conntrack_count: no such file or directory",
"scraper": "network"}
Jul 27 17:55:20 xxxxxx.internal otelcol[18746]:
go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).scrapeMetricsAndReport
Jul 27 17:55:20 xxxxxx.internal otelcol[18746]:
/builds/o11y-gdi/splunk-otel-collector-releaser/.go/pkg/mod/go.opentelemetry.io/collector@v0.56.0/receiver/scraperhelper/scrapercontroller.go:197
Jul 27 17:55:20 xxxxxx.internal otelcol[18746]:
go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).startScraping.func1
Jul 27 17:55:20 xxxxxx.internal otelcol[18746]:
/builds/o11y-gdi/splunk-otel-collector-releaser/.go/pkg/mod/go.opentelemetry.io/collector@v0.56.0/receiver/scraperhelper/scrapercontroller.go:172
```

Although this error is partial/benign and the remaining hostmetrics are still collected and emitted, we should not log errors like this on every poll and pollute the logs with error messages.

The conntrack metrics are a recently added feature: https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/11769/files

I was able to reproduce this; the only way to get rid of the error is to install a feature that loads the conntrack module:

```
# check whether any netfilter/conntrack modules are loaded
sudo lsmod | grep nf_
sudo systemctl status firewalld
# enabling and starting firewalld loads the conntrack module
sudo systemctl enable firewalld
sudo systemctl start firewalld
sudo systemctl status firewalld
# nf_conntrack should now appear
sudo lsmod | grep nf_
```

Steps to reproduce
Create EC2 with Amazon Linux 2 Kernel 5.10 AMI 2.0.20220606.1 x86_64 HVM gp2
AMI ID: ami-02d1e544b84bf7502

Run OTEL with hostmetrics receiver and network scraper, example:

```yaml
  hostmetrics:
    collection_interval: 10s
    scrapers:
      cpu:
      disk:
      filesystem:
      memory:
      network:
      # System load average metrics https://en.wikipedia.org/wiki/Load_(computing)
      load:
      # Paging/Swap space utilization and I/O metrics
      paging:
      # Aggregated system process count metrics
      processes:
      # System processes metrics, disabled by default
      # process:
```

What did you expect to see?
No errors in the logs

What did you see instead?
The error shown above, repeating on every poll.

What version did you use?
Latest

What config did you use?
The hostmetrics config shown in the reproduction steps above.

Environment
Create EC2 with Amazon Linux 2 Kernel 5.10 AMI 2.0.20220606.1 x86_64 HVM gp2
AMI ID: ami-02d1e544b84bf7502

Additional context

The way hostmetrics is implemented, it logs partial errors only at error level. Ideally, these logs should be at debug level so they can be muted.
Options:
1. Make the conntrack metrics scraper optional.
2. Check the error string and only add it to the partial error if it does not contain "no such file or directory"; the problem is that we lose the error unless we pass in the logger object.
3. Improve AddPartial to allow adding debug logs, not only errors, e.g. add an extra argument identifying the log level (debug, error, etc.).

dloucasfx added the bug label Aug 1, 2022

dmitryax (Member) commented Aug 1, 2022:

The conntrack metrics are already disabled by default. We just need to read the metrics configuration and not touch the conntrack API if it's not enabled by the user.
