
OpenTelemetry Operator never reconciles replicas for collector gateway #2039

Closed
jawnsy opened this issue Aug 22, 2023 · 3 comments

jawnsy commented Aug 22, 2023

Summary

We have an OpenTelemetryCollector resource that manages an OpenTelemetry Collector gateway, as well as sidecars. With mode: deployment and replicas: 3, we're noticing that the replica count never changes: we always have 1 replica.

Here's the current state of the resource we're deploying:

apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  creationTimestamp: "2023-05-23T22:30:56Z"
  generation: 13
  labels:
    app: otelcol-gateway
    app.kubernetes.io/managed-by: opentelemetry-operator
    kustomize.toolkit.fluxcd.io/name: opentelemetry-collector
    kustomize.toolkit.fluxcd.io/namespace: flux-system
  name: otelcol-gateway
  namespace: opentelemetry
  resourceVersion: "142484797"
  uid: c8c48f7a-cf9d-4228-9742-916e7a71c0c3
spec:
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
          http:
      zipkin:
        endpoint: 0.0.0.0:9411
      prometheus/internal:
        config:
          scrape_configs:
            - job_name: otelcol-gateway
              scrape_interval: 15s
              static_configs:
                - targets: [0.0.0.0:8888]
    exporters:
      logging/self:
        verbosity: normal
      datadog:
        api:
          site: datadoghq.com
          key: "$DATADOG_APM_KEY"
      prometheus:
        endpoint: 0.0.0.0:9090
    processors:
      memory_limiter:
        check_interval: 1s
        limit_mib: 1250
        spike_limit_mib: 200
      attributes/env:
        actions:
          - action: insert
            key: env
            value: dev
      batch:
      batch/dd:
        # Datadog APM Intake limit is 3.2M
        # https://docs.datadoghq.com/tracing/trace_collection/open_standards/otel_collector_datadog_exporter/#configuring-the-datadog-exporter
        send_batch_max_size: 1000
        send_batch_size: 100
        timeout: 10s
    service:
      pipelines:
        traces/logged:
          receivers: [otlp]
          processors: []
          exporters: [logging/self]
        traces/datadog:
          receivers: [otlp]
          processors: [batch/dd, attributes/env]
          exporters: [datadog]
        metrics/prometheus:
          receivers: [prometheus/internal]
          processors: []
          exporters: [prometheus]
      telemetry:
        logs:
          level: error
          output_paths: [stdout]
  env:
    - name: DATADOG_APM_KEY
      valueFrom:
        secretKeyRef:
          key: key
          name: datadog-apm-key
          optional: false
  ingress:
    route: {}
  mode: deployment
  nodeSelector:
    node_pool: application
  podAnnotations:
    sidecar.istio.io/proxyCPU: 850m
    sidecar.istio.io/proxyCPULimit: 1000m
    sidecar.istio.io/proxyMemory: 400Mi
    sidecar.istio.io/proxyMemoryLimit: 700Mi
  podSecurityContext:
    runAsNonRoot: true
    runAsUser: 10001
    seccompProfile:
      type: RuntimeDefault
  replicas: 3 # replicas set to 3 here
  resources:
    limits:
      cpu: 150m
      memory: 250Mi
    requests:
      cpu: 75m
      memory: 180Mi
  securityContext:
    allowPrivilegeEscalation: false
    capabilities:
      drop:
        - ALL
    readOnlyRootFilesystem: true
    runAsNonRoot: true
    runAsUser: 10001
    seccompProfile:
      type: RuntimeDefault
  targetAllocator:
    prometheusCR: {}
  upgradeStrategy: automatic
status:
  scale:
    replicas: 1 # status always shows 1 replica
    selector: app=otelcol-gateway,app.kubernetes.io/component=opentelemetry-collector,app.kubernetes.io/instance=opentelemetry.otelcol-gateway,app.kubernetes.io/managed-by=opentelemetry-operator,app.kubernetes.io/name=otelcol-gateway-collector,app.kubernetes.io/part-of=opentelemetry,app.kubernetes.io/version=latest,kustomize.toolkit.fluxcd.io/name=opentelemetry-collector,kustomize.toolkit.fluxcd.io/namespace=flux-system
  version: 0.82.0

We're using v0.82.0 and the operator Helm chart v0.35.2. We're not using autoscaling.
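
For reference, one way to confirm the mismatch is to compare the replica count on the custom resource with the count on the Deployment the operator generates. This is just a sketch; the Deployment name otelcol-gateway-collector is inferred from the selector in the status above, so adjust the namespace and names to your setup:

# Desired replicas on the custom resource (expected: 3)
kubectl -n opentelemetry get opentelemetrycollector otelcol-gateway \
  -o jsonpath='{.spec.replicas}{"\n"}'

# Replicas on the Deployment managed by the operator (observed: 1)
kubectl -n opentelemetry get deployment otelcol-gateway-collector \
  -o jsonpath='{.spec.replicas}{"\n"}'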

@jaronoff97

Hello, we've observed issues with the prometheus exporter in version 0.82 of the operator (#2016). Please update your operator to the latest version (0.83) and let me know if that fixes your problem.
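
For anyone following along, upgrading the operator via its Helm chart looks roughly like this. This is a sketch: the release name opentelemetry-operator and the namespace opentelemetry-operator-system are assumptions, so substitute your own install details:

# Sketch only: release name and namespace are assumptions; add --version to pin
# a specific chart release instead of the latest.
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo update
helm upgrade opentelemetry-operator open-telemetry/opentelemetry-operator \
  --namespace opentelemetry-operator-system --reuse-values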

@yuriolisa

@jawnsy, did you have the opportunity to upgrade your operator?

jawnsy commented Sep 19, 2023

@yuriolisa We're currently running 0.37.1. I haven't checked, but this issue might have been resolved by one of my teammates, @parkedwards. He mentioned that it could be a symptom of the limitation noted in the chart docs:

With Helm v3.0, CRDs created by this chart are not updated by default and should be manually updated. Consult also the Helm Documentation on CRDs.
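
For reference, manually refreshing the CRDs after a chart upgrade can be done roughly like this. This is a sketch assuming the chart ships its CRDs in its crds/ directory and that the repo alias is open-telemetry, as in the usual install instructions:

# Sketch: render the CRDs bundled with the chart and apply them server-side,
# since helm upgrade does not update CRDs on its own.
helm repo update
helm show crds open-telemetry/opentelemetry-operator | \
  kubectl apply --server-side --force-conflicts -f -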

I think we can close this for now. Thanks for taking a look!

jawnsy closed this as completed Sep 19, 2023