
Otel-operator does not create clusterrole and clusterrolebinding for otel-collector #1679

Closed
krimeshshah opened this issue Apr 26, 2023 · 16 comments
Labels: enhancement (New feature or request), good first issue (Good for newcomers), question (Further information is requested)

Comments

@krimeshshah

Hi Team,

After deploying the otel-operator, when I deployed the otel-collector as described on this page https://github.com/open-telemetry/opentelemetry-helm-charts/tree/main/charts/opentelemetry-operator, I noticed that the serviceaccount otel-collector was created, but the corresponding clusterrole and clusterrolebinding are not created by the operator. Shouldn't this be taken care of by the operator? Since it does not create the clusterrole and clusterrolebinding, we face issues fetching metrics from pods across different namespaces.

#######################################################################

E0415 19:49:43.196149 1 reflector.go:140] k8s.io/client-go@v0.26.3/tools/cache/reflector.go:169: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:logging:otelcol-collector" cannot list resource "pods" in API group "" at the cluster scope
W0415 19:50:30.471358 1 reflector.go:424] k8s.io/client-go@v0.26.3/tools/cache/reflector.go:169: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:logging:otelcol-collector" cannot list resource "pods" in API group "" at the cluster scope
###########################################################################

Regards,
Krimesh

@TylerHelmuth
Member

Original issue: open-telemetry/opentelemetry-helm-charts#762

@krimeshshah
Author

@TylerHelmuth @Allex1 Is there any workaround I can use? Using otel-col as a separate chart will not solve my purpose of scraping ServiceMonitor metrics; it has to be deployed with the otel operator. Also, as per this issue open-telemetry/opentelemetry-helm-charts#69, one workaround mentioned by @VineethReddy02 is creating the CR inside the template directory. Any idea how I would do that?

@TylerHelmuth
Member

At the moment there is no community helm chart for managing the custom resources themselves; you have to maintain them yourself. I believe most users do that by managing the yaml file itself, since it is only one file.

As for the clusterroles/clusterrolebindings, how these are handled will depend on how your cluster is managed. Your cluster managers could create and maintain the resources for you, and the collector managed by the operator could use them. I still like the idea of the operator supporting the creation of those resources when it creates the collector.
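For reference, a minimal sketch of such a file, assuming the v1alpha1 API and the default deployment mode (the names, namespace, and the simple OTLP pipeline are illustrative, not the configuration from this thread):

apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otelcol
  namespace: logging
spec:
  mode: deployment
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
          http:
    processors:
      batch:
    exporters:
      debug: {} # on older collector images this exporter is named "logging"
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [debug]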

@krimeshshah
Author

@TylerHelmuth These are the steps that I did to have the otel-collector managed by the operator:

  1. Installed the otel-operator with the helm chart, which also installed the opentelemetrycollectors.opentelemetry.io CRD.
  2. Then, as per the instructions on the operator page https://github.com/open-telemetry/opentelemetry-helm-charts/tree/main/charts/opentelemetry-operator, I created the otel-collector.
     The otel-collector was deployed successfully, but I get the error below while accessing pods at the cluster scope to get the metrics:

########################################################################
ng:otelcol-collector" cannot list resource "pods" in API group "" at the cluster scope
W0427 15:00:13.754890 1 reflector.go:424] k8s.io/client-go@v0.26.3/tools/cache/reflector.go:169: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:logging:otelcol-collector" cannot list resource "pods" in API group "" at the cluster scope
E0427 15:00:13.754925 1 reflector.go:140] k8s.io/client-go@v0.26.3/tools/cache/reflector.go:169: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:logging:otelcol-collector" cannot list resource "pods" in API group "" at the cluster scope
W0427 15:00:50.934078 1 reflector.go:424] k8s.io/client-go@v0.26.3/tools/cache/reflector.go:169: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:logging:otelcol-collector" cannot list resource "pods" in API group "" at the cluster scope
E0427 15:00:50.934105 1 reflector.go:140] k8s.io/client-go@v0.26.3/tools/cache/reflector.go:169: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:logging:otelcol-collector" cannot list resource "pods" in API group "" at the cluster scope
W0427 15:01:31.769279 1 reflector.go:424] k8s.io/client-go@v0.26.3/tools/cache/reflector.go:169: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:logging:otelcol-collector" cannot list resource "pods" in API group "" at the cluster scope
E0427 15:01:31.769323 1 reflector.go:140] k8s.io/client-go@v0.26.3/tools/cache/reflector.go:169: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:logging:otelcol-collector" cannot list resource "pods" in API group "" at the cluster scope
W0427 15:02:03.341587 1 reflector.go:424] k8s.io/client-go@v0.26.3/tools/cache/reflector.go:169: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:logging:otelcol-collector" cannot list resource "pods" in API group "" at the cluster scope
E0427 15:02:03.341616 1 reflector.go:140] k8s.io/client-go@v0.26.3/tools/cache/reflector.go:169: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:logging:otelcol-collector" cannot list resource "pods" in API group "" at the cluster scope
W0427 15:02:39.106772 1 reflector.go:424] k8s.io/client-go@v0.26.3/tools/cache/reflector.go:169: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:logging:otelcol-collector" cannot list resource "pods" in API group "" at the cluster scope
E0427 15:02:39.106800 1 reflector.go:140] k8s.io/client-go@v0.26.3/tools/cache/reflector.go:169: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:logging:otelcol-collector" cannot list resource "pods" in API group "" at the cluster scope
W0427 15:03:29.794105 1 reflector.go:424] k8s.io/client-go@v0.26.3/tools/cache/reflector.go:169: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:logging:otelcol-collector" cannot list resource "pods" in API group "" at the cluster scope
E0427 15:0
##############################################################################

To fix this issue, I created a ClusterRole and ClusterRoleBinding for the otel-collector service account.

@krimeshshah
Author

krimeshshah commented Apr 27, 2023

kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: otel-collector
  labels:
    rbac.authorization.k8s.io/aggregate-to-admin: "true"
    rbac.authorization.k8s.io/aggregate-to-edit: "true"
    rbac.authorization.k8s.io/aggregate-to-view: "true"
rules:
  - apiGroups: [""] # "" indicates the core API group
    resources: ["pods"]
    verbs: ["get", "watch", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: otelcol
subjects:
  - kind: ServiceAccount
    name: otelcol-collector # name of your service account
    namespace: logging # this is the namespace your service account is in
roleRef: # referring to your ClusterRole
  kind: ClusterRole
  name: otel-collector
  apiGroup: rbac.authorization.k8s.io

@jaronoff97 added the enhancement (New feature or request), good first issue (Good for newcomers), question (Further information is requested), and area:controller labels on Apr 27, 2023
@krimeshshah
Author

Shouldn't this fix the issue? Or is this the right way to add or apply changes to the CRD managed by the operator, considering that creating the otel-collector, clusterrole, and clusterrolebinding is a change on top of the CRD?

@TylerHelmuth
Member

That does fix the issue, and it is an acceptable solution. We discussed this issue in the Operator SIG meeting today and agreed that it is not the responsibility of the opentelemetry-operator chart, since the chart does not manage an OpenTelemetryCollector custom resource, nor is it the responsibility of the operator (as I thought it would be).

Long term, we think there would be a chart responsible for creating a clusterrole/clusterrolebinding for the collector created by the OpenTelemetryCollector custom resource. This would follow the pattern used by other operator/helm chart combinations like kube-prometheus-stack.

@paologallinaharbur
Member

paologallinaharbur commented Sep 5, 2023

Hello @TylerHelmuth, I'd just like to confirm that the expected behaviour is:

  • The helm chart of the otel collector creates Roles and bindings if needed by any of the presets. E.g., presets.kubeletMetrics.enabled=true adds the following template (see the values sketch after this list):
rules:
  - apiGroups: [""]
    resources: ["nodes/stats"]
    verbs: ["get", "watch", "list"]
  • On the other hand, neither the chart of the operator nor the operator itself manages roles or bindings, and they are not expected to in the future. Therefore, if any permission is needed by the service account, the user needs to add it "manually".
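For illustration, a minimal values sketch that would enable that preset when installing the opentelemetry-collector chart (preset and field names as documented by that chart; the mode is illustrative):

# values.yaml for the opentelemetry-collector helm chart
mode: daemonset
presets:
  kubeletMetrics:
    enabled: true # the chart then renders the nodes/stats ClusterRole rule shown above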

If this is confirmed, I'd state it clearly in the readme. I was a bit confused to realize the chart is "smarter" (covering more use-cases) than the operator.

If so, I imagine we can close this as not-planned/will-not-do.

@garett-eiq

I encountered this today and it's kind of confusing behavior. Am I correct in thinking the recommended approach is that, in addition to instantiating an OpenTelemetryCollector custom resource via the OTEL operator, I also need to create/specify a service account for the collector I'm creating, along with the ClusterRole and ClusterRoleBinding?

If so:

  1. I don't see any details about that anywhere in the README. Can that be placed there so it's not a gotcha?
  2. It sounds like this is considered out of scope for the operator, but I don't really understand why. The expected behavior (for me, and I assume many others) is that creating the OpenTelemetryCollector resource will create everything needed for it to run. It's very surprising that the necessary permissions aren't created, or at least that there are no configurable options for how you'd like to set this up. This kind of seems like a blind spot in terms of usability and adoption. Or maybe I'm missing something?
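For what it's worth, the OpenTelemetryCollector CRD does let you point the generated collector at an existing service account, so the ClusterRole/ClusterRoleBinding shown earlier in this thread can be bound to it. A minimal sketch (field name per the CRD; names and namespace are illustrative):

apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otelcol
  namespace: logging
spec:
  serviceAccount: otelcol-collector # pre-existing ServiceAccount bound to your ClusterRole
  config: |
    # collector configuration omitted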

@pavolloffay
Member

done in #2327

@thefirstofthe300

@pavolloffay Can we reopen this issue? The PR you linked to fixed an orthogonal issue to the one described by this issue. The PR adds a missing permission to the OTEL operator's controller.

This issue is about having the operator controller manage RBAC for the deployed OTEL collector deployment/daemonset and the service account generated for that specific deployment/daemonset.

@pavolloffay reopened this on Jan 11, 2024
@jaronoff97
Contributor

@thefirstofthe300 is this resolved by #2396?

@thefirstofthe300

I believe this issue does indeed get fixed by that PR. Nice to see just how fast that wait was. 😆

@heruscode

Hi, does the #2396 fix above also affect receivers? I am trying to use the prometheus receiver and I still get some errors in the logs of the collector pod (created by the OpenTelemetryCollector operator resource).

OpenTelemetryCollector config:

receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: 'emissary-ingress'
          scrape_interval: 30s
          kubernetes_sd_configs:
            - role: endpoints
          relabel_configs:
            - source_labels: [__meta_kubernetes_endpoints_label_service]
              action: keep
              regex: ambassador-admin
exporters:
  prometheus:
    endpoint: 0.0.0.0:9464
service:
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [prometheus]  

logs:

W0419 03:05:00.650568       1 reflector.go:539] k8s.io/client-go@v0.29.3/tools/cache/reflector.go:229: failed to list *v1.Endpoints: endpoints is forbidden: User "system:serviceaccount:emissary:emissary-ingress-collector" cannot list resource "endpoints" in API group "" at the cluster scope
E0419 03:05:00.650604       1 reflector.go:147] k8s.io/client-go@v0.29.3/tools/cache/reflector.go:229: Failed to watch *v1.Endpoints: failed to list *v1.Endpoints: endpoints is forbidden: User "system:serviceaccount:emissary:emissary-ingress-collector" cannot list resource "endpoints" in API group "" at the cluster scope
W0419 03:05:21.574835       1 reflector.go:539] k8s.io/client-go@v0.29.3/tools/cache/reflector.go:229: failed to list *v1.Service: services is forbidden: User "system:serviceaccount:emissary:emissary-ingress-collector" cannot list resource "services" in API group "" at the cluster scope
E0419 03:05:21.574869       1 reflector.go:147] k8s.io/client-go@v0.29.3/tools/cache/reflector.go:229: Failed to watch *v1.Service: failed to list *v1.Service: services is forbidden: User "system:serviceaccount:emissary:emissary-ingress-collector" cannot list resource "services" in API group "" at the cluster scope
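For context, endpoints-role service discovery in the prometheus receiver needs read access to endpoints, services, and pods. A rules sketch in the spirit of the ClusterRole earlier in this thread, covering the resources these logs report as forbidden plus pods (unverified against this particular setup):

rules:
  - apiGroups: [""]
    resources: ["endpoints", "services", "pods"]
    verbs: ["get", "watch", "list"]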

@jaronoff97
Contributor

@iblancasa will this be fixed by #2787 ?

@iblancasa
Contributor

> @iblancasa will this be fixed by #2787 ?

It is not related, I think. This seems to be a different problem.
