
OpenTelemetry Operator never reconciles replicas for collector gateway #2039

Closed
jawnsy opened this issue Aug 22, 2023 · 3 comments

jawnsy commented Aug 22, 2023

Summary

We have an OpenTelemetryCollector resource that manages an OpenTelemetry Collector gateway, as well as sidecars. With mode: deployment and replicas: 3, we're noticing that the replica count never changes: we always have 1 replica.

Here's the current state of the resource we're deploying:

apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  creationTimestamp: "2023-05-23T22:30:56Z"
  generation: 13
  labels:
    app: otelcol-gateway
    app.kubernetes.io/managed-by: opentelemetry-operator
    kustomize.toolkit.fluxcd.io/name: opentelemetry-collector
    kustomize.toolkit.fluxcd.io/namespace: flux-system
  name: otelcol-gateway
  namespace: opentelemetry
  resourceVersion: "142484797"
  uid: c8c48f7a-cf9d-4228-9742-916e7a71c0c3
spec:
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
          http:
      zipkin:
        endpoint: 0.0.0.0:9411
      prometheus/internal:
        config:
          scrape_configs:
            - job_name: otelcol-gateway
              scrape_interval: 15s
              static_configs:
                - targets: [0.0.0.0:8888]
    exporters:
      logging/self:
        verbosity: normal
      datadog:
        api:
          site: datadoghq.com
          key: "$DATADOG_APM_KEY"
      prometheus:
        endpoint: 0.0.0.0:9090
    processors:
      memory_limiter:
        check_interval: 1s
        limit_mib: 1250
        spike_limit_mib: 200
      attributes/env:
        actions:
          - action: insert
            key: env
            value: dev
      batch:
      batch/dd:
        # Datadog APM Intake limit is 3.2M
        # https://docs.datadoghq.com/tracing/trace_collection/open_standards/otel_collector_datadog_exporter/#configuring-the-datadog-exporter
        send_batch_max_size: 1000
        send_batch_size: 100
        timeout: 10s
    service:
      pipelines:
        traces/logged:
          receivers: [otlp]
          processors: []
          exporters: [logging/self]
        traces/datadog:
          receivers: [otlp]
          processors: [batch/dd, attributes/env]
          exporters: [datadog]
        metrics/prometheus:
          receivers: [prometheus/internal]
          processors: []
          exporters: [prometheus]
      telemetry:
        logs:
          level: error
          output_paths: [stdout]
  env:
    - name: DATADOG_APM_KEY
      valueFrom:
        secretKeyRef:
          key: key
          name: datadog-apm-key
          optional: false
  ingress:
    route: {}
  mode: deployment
  nodeSelector:
    node_pool: application
  podAnnotations:
    sidecar.istio.io/proxyCPU: 850m
    sidecar.istio.io/proxyCPULimit: 1000m
    sidecar.istio.io/proxyMemory: 400Mi
    sidecar.istio.io/proxyMemoryLimit: 700Mi
  podSecurityContext:
    runAsNonRoot: true
    runAsUser: 10001
    seccompProfile:
      type: RuntimeDefault
  replicas: 3 # replicas set to 3 here
  resources:
    limits:
      cpu: 150m
      memory: 250Mi
    requests:
      cpu: 75m
      memory: 180Mi
  securityContext:
    allowPrivilegeEscalation: false
    capabilities:
      drop:
        - ALL
    readOnlyRootFilesystem: true
    runAsNonRoot: true
    runAsUser: 10001
    seccompProfile:
      type: RuntimeDefault
  targetAllocator:
    prometheusCR: {}
  upgradeStrategy: automatic
status:
  scale:
    replicas: 1 # status always shows 1 replica
    selector: app=otelcol-gateway,app.kubernetes.io/component=opentelemetry-collector,app.kubernetes.io/instance=opentelemetry.otelcol-gateway,app.kubernetes.io/managed-by=opentelemetry-operator,app.kubernetes.io/name=otelcol-gateway-collector,app.kubernetes.io/part-of=opentelemetry,app.kubernetes.io/version=latest,kustomize.toolkit.fluxcd.io/name=opentelemetry-collector,kustomize.toolkit.fluxcd.io/namespace=flux-system
  version: 0.82.0

We're using v0.82.0 and the operator Helm chart v0.35.2. We're not using autoscaling.
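
For reference, one way to confirm the mismatch is to compare the replica count on the custom resource with the count on the Deployment the operator generates. This is just a sketch; the Deployment name otelcol-gateway-collector is inferred from the selector in the status above, so adjust the namespace and names to your setup:

# Desired replicas on the custom resource (expected: 3)
kubectl -n opentelemetry get opentelemetrycollector otelcol-gateway \
  -o jsonpath='{.spec.replicas}{"\n"}'

# Replicas on the Deployment managed by the operator (observed: 1)
kubectl -n opentelemetry get deployment otelcol-gateway-collector \
  -o jsonpath='{.spec.replicas}{"\n"}'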

@jaronoff97

Hello, we've observed issues with the prometheus exporter in version 0.82 of the operator (#2016). Please update your operator to the latest version (0.83) and let me know if that fixes your problem.
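
For anyone following along, upgrading the operator via its Helm chart looks roughly like this. This is a sketch: the release name opentelemetry-operator and the namespace opentelemetry-operator-system are assumptions, so substitute your own install details:

# Sketch only: release name and namespace are assumptions; add --version to pin
# a specific chart release instead of the latest.
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo update
helm upgrade opentelemetry-operator open-telemetry/opentelemetry-operator \
  --namespace opentelemetry-operator-system --reuse-values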

@yuriolisa

@jawnsy, did you have the opportunity to upgrade your operator?

jawnsy commented Sep 19, 2023

@yuriolisa We're currently running 0.37.1. I haven't checked, but this issue might have been resolved by one of my teammates, @parkedwards. He mentioned that it could be a symptom of the limitation noted in the chart docs:

With Helm v3.0, CRDs created by this chart are not updated by default and should be manually updated. Consult also the Helm Documentation on CRDs.
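
For reference, manually refreshing the CRDs after a chart upgrade can be done roughly like this. This is a sketch assuming the chart ships its CRDs in its crds/ directory and that the repo alias is open-telemetry, as in the usual install instructions:

# Sketch: render the CRDs bundled with the chart and apply them server-side,
# since helm upgrade does not update CRDs on its own.
helm repo update
helm show crds open-telemetry/opentelemetry-operator | \
  kubectl apply --server-side --force-conflicts -f -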

I think we can close this for now. Thanks for taking a look!

jawnsy closed this as completed Sep 19, 2023