[target-allocator] Target allocator assigns target to multiple collectors #1038

Closed
kelseyma opened this issue Aug 16, 2022 · 7 comments · Fixed by #1041
Labels
area:target-allocator Issues for target-allocator

Comments

@kelseyma
Contributor

I ran the target allocator with discovery via the config YAML (not using CRDs) and saw the TA assign the same target to multiple collectors. This resulted in a target being scraped multiple times within a minute even though the scrape_interval is set to 1m. I have also noticed that it sometimes takes the target allocator 15 minutes to refresh and discover the desired endpoints, while at other times the TA discovers the endpoints right away.

What I expected:

Target allocator is able to discover all the endpoints when started. TA assigns each collector a unique set of targets.

What actually happened:

The TA discovered some endpoints when it was started at 19:02, but it took until the watch routine restart at 19:17 for it to discover the desired endpoints. The first period after discovery looked OK, but after the watch routine restarted again, each subsequent period had at least one target that was duplicated.

collector-0

// 19:18 - 19:32
2022-08-16T19:32:04.243Z -> service.instance.id: STRING(172.17.0.4:9121)

// After watch routine restart at 19:47:52
2022-08-16T19:52:04.240Z -> service.instance.id: STRING(172.17.0.4:9121)
2022-08-16T19:52:17.227Z -> service.instance.id: STRING(172.17.0.3:9121)

collector-1

// 19:18 - 19:32
2022-08-16T19:32:35.003Z -> service.instance.id: STRING(172.17.0.3:9121)
2022-08-16T19:32:48.234Z -> service.instance.id: STRING(172.17.0.5:9121)

// After watch routine restart at 19:47:52
2022-08-16T19:52:35.020Z -> service.instance.id: STRING(172.17.0.3:9121)
2022-08-16T19:52:46.528Z -> service.instance.id: STRING(172.17.0.4:9121)
2022-08-16T19:52:48.236Z -> service.instance.id: STRING(172.17.0.5:9121)

TA logs

{"level":"info","ts":1660676571.953488,"msg":"Starting the Target Allocator"}
{"level":"info","ts":1660676572.0064685,"logger":"setup","msg":"Starting server..."}
{"level":"info","ts":1660676572.0111964,"logger":"allocator","msg":"Successfully started a collector pod watcher","component":"opentelemetry-targetallocator"}
{"level":"info","ts":1660677472.3938131,"logger":"allocator","msg":"Restarting watch routine","component":"opentelemetry-targetallocator"}
{"level":"info","ts":1660677472.400775,"logger":"allocator","msg":"Successfully started a collector pod watcher","component":"opentelemetry-targetallocator"}
{"level":"info","ts":1660678372.403873,"logger":"allocator","msg":"Restarting watch routine","component":"opentelemetry-targetallocator"}
{"level":"info","ts":1660678372.4173315,"logger":"allocator","msg":"Successfully started a collector pod watcher","component":"opentelemetry-targetallocator"}
{"level":"info","ts":1660679272.4196055,"logger":"allocator","msg":"Restarting watch routine","component":"opentelemetry-targetallocator"}
{"level":"info","ts":1660679272.4243846,"logger":"allocator","msg":"Successfully started a collector pod watcher","component":"opentelemetry-targetallocator"}
{"level":"info","ts":1660680172.4269636,"logger":"allocator","msg":"Restarting watch routine","component":"opentelemetry-targetallocator"}
{"level":"info","ts":1660680172.4400752,"logger":"allocator","msg":"Successfully started a collector pod watcher","component":"opentelemetry-targetallocator"}
@Aneurysm9
Member

@jaronoff97 any chance this is related to the issue we were discussing earlier?

@pavolloffay pavolloffay added the area:target-allocator Issues for target-allocator label Aug 17, 2022
@jaronoff97
Contributor

I think this is related to both #1037 and #1034. The first part of the expected behavior ("Target allocator is able to discover all the endpoints when started.") should be resolved by #1039.

I think the second issue here (TA assigns each collector a unique set of targets.) is that the TA is returning incorrect data while targets are being assigned. @kelseyma would you be able to provide the configuration for the collector (including the scrape config) you were running?

@kelseyma
Contributor Author

@jaronoff97 Here is the collector config I'm running:

apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: poc-otel
spec:
  mode: statefulset
  replicas: 2
  serviceAccount: otel-collector
  image: ghcr.io/open-telemetry/opentelemetry-collector-releases/opentelemetry-collector-contrib:0.57.2
  targetAllocator:
    enabled: true
    image: ghcr.io/open-telemetry/opentelemetry-operator/target-allocator:latest
    serviceAccount: otel-collector
  config: |
    exporters:
      logging:
        loglevel: debug
    receivers:
      prometheus:
        config:
          scrape_configs:
            - job_name: 'prometheus-exporter-endpoints'

              kubernetes_sd_configs:
                - role: endpoints
             
              relabel_configs:
                - source_labels: [__meta_kubernetes_service_annotation_<added_custom_annotation>]
                  action: keep
                  regex: (value1|value2)
                  replacement: $$1
                - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
                  action: replace
                  target_label: __metrics_path__
                  regex: (.+)
                  replacement: $$1
                - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
                  action: replace
                  target_label: __address__
                  regex: ([^:]+)(?::\d+)?;(\d+)
                  replacement: $$1:$$2
            
    service:
      pipelines:
        metrics:
          receivers: [ prometheus ]
          exporters: [ logging ]

@jaronoff97
Contributor

With #1039 merged, that should help with the potentially missing targets. I also have a fix in the works to improve the stability of the target allocator's allocation, which should help prevent these scenarios, and I believe the work done by @kristinapathak in #1040 will help resolve this as well.

@kelseyma
Contributor Author

kelseyma commented Aug 26, 2022

@jaronoff97 Thank you for working on the fixes!

I ran the TA with the latest changes, and while the targets assigned to each collector are more consistent now, I believe there is still an issue with duplicate targets because the Prometheus exporter I'm monitoring runs as a sidecar. Two targets are discovered via endpoints: one for the exporter container itself and one for the other container in the pod. This caused both collectors to scrape all of the endpoints, since one collector would get the target for the exporter container and the second collector would get the target for the other container. Both of these targets resolve to the same address because of this relabel config:

- source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
  action: replace
  target_label: __address__
  regex: ([^:]+)(?::\d+)?;(\d+)
  replacement: $$1:$$2
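
For concreteness, here is a minimal Go sketch, not taken from the original report, of why both discovered endpoint targets collapse onto one address: Prometheus joins the source_labels with ";" before matching, and the regex then discards whatever port the endpoint had and substitutes the annotated port. The 9100 port for the second container is invented for illustration; $$1/$$2 in the collector config are the escaped form of $1/$2.

package main

// Illustrative only: applies the relabel rule's regex and replacement to the
// joined source labels for the two endpoint targets on the same pod.
import (
	"fmt"
	"regexp"
)

func main() {
	re := regexp.MustCompile(`([^:]+)(?::\d+)?;(\d+)`)

	// Joined as: __address__;__meta_kubernetes_service_annotation_prometheus_io_port
	joined := []string{
		"172.17.0.4:9121;9121", // endpoint for the exporter container
		"172.17.0.4:9100;9121", // endpoint for the other container in the pod (hypothetical port)
	}
	for _, in := range joined {
		// Both lines print 172.17.0.4:9121, so the TA allocates two distinct
		// targets that end up scraping the same address.
		fmt.Println(re.ReplaceAllString(in, "$1:$2"))
	}
}
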

Is it possible to have the TA apply the relabel configs prior to allocating targets, so we can drop any unnecessary targets? I checked the newly added /metrics endpoint and saw the TA allocating 36 targets (18 targets per collector) when only 3 were actually valid.

@jpkrohling
Member

Reopening as requested by @kelseyma

@jpkrohling jpkrohling reopened this Aug 26, 2022
@kelseyma
Contributor Author

Moving the discussion about relabel configs to a new issue: #1064. Closing this issue.
