Errors in agent logs #697

Closed
KalebHawkins opened this issue Mar 16, 2023 · 8 comments

@KalebHawkins

Hello, I am trying to use the collector on an OpenShift 4.11 cluster to forward logs to Splunk. I am using chart version 0.70 with a fairly basic configuration file:

splunk:
  clusterName: "<REDACTED>"
  splunkPlatform:
    endpoint: "<REDACTED>"
    token: "<REDACTED>"
    index: "<REDACTED>"
    metricsIndex: "<REDACTED>"
    tracesIndex: "<REDACTED>"
    logsEnabled: true
    metricsEnabled: true
    tracesEnabled: false
  logsEngine: otel
  distribution: "openshift"
  environment: <REDACTED>

  agent:
    enabled: true
    resources:
      limits:
        #cpu: 1000m
        memory: 4Gi
        
  clusterReceiver:
    enabled: true
    resources:
      limits:
        #cpu: 1000m
        memory: 8Gi

  logsCollection:
    containers:
      enabled: true
      containerRuntime: "cri-o"

The agents log the following errors. I am wondering whether this is an OpenShift issue or whether I have something misconfigured.

2023-03-16T15:53:22.447Z error prometheusexporter/prometheus.go:139 Could not get prometheus metrics {"kind": "receiver", "name": "receiver_creator", "pipeline": "metrics", "name": "smartagent/kubernetes-scheduler/receiver_creator{endpoint=\"xxx.xxx.xxx.xxx\"}/k8s_observer/ebe4c92e-8b5d-4be3-9945-cd439c5825c8", "monitorID": "smartagentkubernetesschedulerreceiver_creatorendpoint2071306682k8s_observerebe4c92e8b5d4be39945cd439c5825c8", "error": "Get \"http://xxx.xxx.xxx.xxx:10251/metrics\": dial tcp xxx.xxx.xxx.xxx:10251: connect: connection refused", "monitorType": "kubernetes-scheduler"}
2023-03-16T15:53:22.550Z error prometheusexporter/prometheus.go:139 Could not get prometheus metrics {"kind": "receiver", "name": "receiver_creator", "pipeline": "metrics", "name": "smartagent/kubernetes-proxy/receiver_creator{endpoint=\"xxx.xxx.xxx.xxx\"}/k8s_observer/d742b91e-685e-4c35-8197-0b56ebc88e39", "monitorID": "smartagentkubernetesproxyreceiver_creatorendpoint2071306682k8s_observerd742b91e685e4c3581970b56ebc88e39", "monitorType": "kubernetes-proxy", "error": "Get \"http://xxx.xxx.xxx.xxx:29101/metrics\": dial tcp xxx.xxx.xxx.xxx:29101: connect: connection refused"}
@jvoravong
Contributor

We need to update the receiver configurations to match any changes the OpenShift kube-scheduler and kube-proxy may have recently received.

As a temporary workaround, you can disable the affected receivers with these values:

agent.controlPlaneMetrics.proxy.enabled: false
agent.controlPlaneMetrics.scheduler.enabled: false
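
As a rough sketch, the two dotted paths above would expand to nested keys in a values file along the following lines. This assumes the keys sit at the top level of the chart's values; if the chart is wrapped under a parent key (the configuration in the original post nests everything under `splunk:`), the block would need to go under that key instead.

```yaml
# Sketch only: nested form of the two dotted paths quoted above.
# Assumption: these keys are at the top level of the chart's values.
# If your values are wrapped under a parent key such as `splunk:`
# (as in the original post), indent this block under that key.
agent:
  controlPlaneMetrics:
    proxy:
      enabled: false      # disables the kube-proxy receiver (failing on port 29101 in the logs)
    scheduler:
      enabled: false      # disables the kube-scheduler receiver (failing on port 10251 in the logs)
```

Note that this only silences the errors by turning the two receivers off; no proxy or scheduler metrics are collected while they are disabled.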

@jvoravong added the bug (Something isn't working) label Mar 16, 2023
@KalebHawkins
Author

I just tested that configuration. It did get rid of the errors. Thanks.

@aligthart

aligthart commented Mar 20, 2023

We noticed the same (chart version 0.72, k8s 1.23 installed with kops).

The Helm chart hardcodes port 10251. However, this port has been deprecated.

From: https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.17.md#cluster-lifecycle-1

Kubeadm: enable the usage of the secure kube-scheduler and kube-controller-manager ports for health checks. 
For kube-scheduler was 10251, becomes 10259. 
For kube-controller-manager was 10252, becomes 10257. 
(https://github.com/kubernetes/kubernetes/pull/85043, [@neolit123](https://github.com/neolit123))

@jvoravong jvoravong self-assigned this Mar 20, 2023
@kishah-lilly

@jvoravong do you have a timeline for when this will be fixed?

@atoulme
Contributor

atoulme commented Mar 23, 2023

We typically avoid communicating timelines or committing to resolution times on GitHub. If you are encountering this issue, please open a support case so we can best help you.

@atoulme
Contributor

atoulme commented Apr 5, 2023

Closing this as fixed.

@atoulme atoulme closed this as completed Apr 5, 2023
@kishah-lilly

@jvoravong @atoulme
The change implemented here https://github.com/signalfx/splunk-otel-collector-chart/pull/711/files fixes the kubernetes-scheduler issue; however, it does not fix the kubernetes-proxy issue.

From original post:

2023-03-16T15:53:22.550Z error prometheusexporter/prometheus.go:139 Could not get prometheus metrics {"kind": "receiver", "name": "receiver_creator", "pipeline": "metrics", "name": "smartagent/kubernetes-proxy/receiver_creator{endpoint=\"xxx.xxx.xxx.xxx\"}/k8s_observer/d742b91e-685e-4c35-8197-0b56ebc88e39", "monitorID": "smartagentkubernetesproxyreceiver_creatorendpoint2071306682k8s_observerd742b91e685e4c3581970b56ebc88e39", "monitorType": "kubernetes-proxy", "error": "Get \"http://xxx.xxx.xxx.xxx:29101/metrics\": dial tcp xxx.xxx.xxx.xxx:29101: connect: connection refused"}

@atoulme
Contributor

atoulme commented Apr 27, 2023

Moved to a separate issue, #758
