[target-allocator] Target allocator assigns target to multiple collectors #1038

Closed
kelseyma opened this issue Aug 16, 2022 · 7 comments · Fixed by #1041
Labels
area:target-allocator Issues for target-allocator

Comments

@kelseyma
Contributor

I ran the target allocator with discovery via the config YAML (not using CRDs) and saw the TA assign the same target to multiple collectors. This resulted in a target being scraped multiple times within a minute even though the scrape_interval is set to 1m. I have also noticed that it sometimes takes the target allocator 15 minutes to refresh and discover the desired endpoints, while at other times the TA discovers the endpoints right away.

What I expected:

Target allocator is able to discover all the endpoints when started. TA assigns each collector a unique set of targets.

What actually happened:

The TA discovered some endpoints when it was started at 19:02, but it took until the watch routine restart at 19:17 for it to discover the desired endpoints. The first period after discovery looked OK, but after the watch routine restarted again, each subsequent period had at least one target that was duplicated.

collector-0

// 19:18 - 19:32
2022-08-16T19:32:04.243Z -> service.instance.id: STRING(172.17.0.4:9121)

// After watch routine restart at 19:47:52
2022-08-16T19:52:04.240Z -> service.instance.id: STRING(172.17.0.4:9121)
2022-08-16T19:52:17.227Z -> service.instance.id: STRING(172.17.0.3:9121)

collector-1

// 19:18 - 19:32
2022-08-16T19:32:35.003Z -> service.instance.id: STRING(172.17.0.3:9121)
2022-08-16T19:32:48.234Z -> service.instance.id: STRING(172.17.0.5:9121)

// After watch routine restart at 19:47:52
2022-08-16T19:52:35.020Z -> service.instance.id: STRING(172.17.0.3:9121)
2022-08-16T19:52:46.528Z -> service.instance.id: STRING(172.17.0.4:9121)
2022-08-16T19:52:48.236Z -> service.instance.id: STRING(172.17.0.5:9121)

TA logs

{"level":"info","ts":1660676571.953488,"msg":"Starting the Target Allocator"}
{"level":"info","ts":1660676572.0064685,"logger":"setup","msg":"Starting server..."}
{"level":"info","ts":1660676572.0111964,"logger":"allocator","msg":"Successfully started a collector pod watcher","component":"opentelemetry-targetallocator"}
{"level":"info","ts":1660677472.3938131,"logger":"allocator","msg":"Restarting watch routine","component":"opentelemetry-targetallocator"}
{"level":"info","ts":1660677472.400775,"logger":"allocator","msg":"Successfully started a collector pod watcher","component":"opentelemetry-targetallocator"}
{"level":"info","ts":1660678372.403873,"logger":"allocator","msg":"Restarting watch routine","component":"opentelemetry-targetallocator"}
{"level":"info","ts":1660678372.4173315,"logger":"allocator","msg":"Successfully started a collector pod watcher","component":"opentelemetry-targetallocator"}
{"level":"info","ts":1660679272.4196055,"logger":"allocator","msg":"Restarting watch routine","component":"opentelemetry-targetallocator"}
{"level":"info","ts":1660679272.4243846,"logger":"allocator","msg":"Successfully started a collector pod watcher","component":"opentelemetry-targetallocator"}
{"level":"info","ts":1660680172.4269636,"logger":"allocator","msg":"Restarting watch routine","component":"opentelemetry-targetallocator"}
{"level":"info","ts":1660680172.4400752,"logger":"allocator","msg":"Successfully started a collector pod watcher","component":"opentelemetry-targetallocator"}
@Aneurysm9
Member

@jaronoff97 any chance this is related to the issue we were discussing earlier?

@pavolloffay pavolloffay added the area:target-allocator Issues for target-allocator label Aug 17, 2022
@jaronoff97
Contributor

I think this is related to both #1037 and #1034. The first part of the expected behavior ("Target allocator is able to discover all the endpoints when started.") should be resolved by #1039.

I think the second issue here (TA assigns each collector a unique set of targets.) is that the TA is returning incorrect data while targets are being assigned. @kelseyma would you be able to provide the configuration for the collector (including the scrape config) you were running?

@kelseyma
Contributor Author

@jaronoff97 Here is the collector config I'm running:

apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: poc-otel
spec:
  mode: statefulset
  replicas: 2
  serviceAccount: otel-collector
  image: ghcr.io/open-telemetry/opentelemetry-collector-releases/opentelemetry-collector-contrib:0.57.2
  targetAllocator:
    enabled: true
    image: ghcr.io/open-telemetry/opentelemetry-operator/target-allocator:latest
    serviceAccount: otel-collector
  config: |
    exporters:
      logging:
        loglevel: debug
    receivers:
      prometheus:
        config:
          scrape_configs:
            - job_name: 'prometheus-exporter-endpoints'

              kubernetes_sd_configs:
                - role: endpoints
             
              relabel_configs:
                - source_labels: [__meta_kubernetes_service_annotation_<added_custom_annotation>]
                  action: keep
                  regex: (value1|value2)
                  replacement: $$1
                - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
                  action: replace
                  target_label: __metrics_path__
                  regex: (.+)
                  replacement: $$1
                - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
                  action: replace
                  target_label: __address__
                  regex: ([^:]+)(?::\d+)?;(\d+)
                  replacement: $$1:$$2
            
    service:
      pipelines:
        metrics:
          receivers: [ prometheus ]
          exporters: [ logging ]

@jaronoff97
Contributor

With #1039 merged, that should help with the potentially missing targets. I also have a fix in the works to improve the stability of the target allocator's allocation, which should help prevent these scenarios, and I believe the work done by @kristinapathak in #1040 will help resolve this as well.

@kelseyma
Contributor Author

kelseyma commented Aug 26, 2022

@jaronoff97 Thank you for working on the fixes!

I ran the TA with the latest changes, and while the targets assigned to each collector are more consistent now, I believe there is still an issue with duplicate targets because the Prometheus exporter I'm monitoring runs as a sidecar. Two targets are discovered via endpoints: one for the exporter container itself and one for the other container in the pod. This caused both collectors to scrape all of the endpoints, since one collector would get the target for the exporter container and the second collector would get the target for the other container. Both of these targets resolve to the same address because of this relabel config:

- source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
  action: replace
  target_label: __address__
  regex: ([^:]+)(?::\d+)?;(\d+)
  replacement: $$1:$$2
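
For concreteness, here is a minimal Go sketch, not taken from the original report, of why both discovered endpoint targets collapse onto one address: Prometheus joins the source_labels with ";" before matching, and the regex then discards whatever port the endpoint had and substitutes the annotated port. The 9100 port for the second container is invented for illustration; $$1/$$2 in the collector config are the escaped form of $1/$2.

package main

// Illustrative only: applies the relabel rule's regex and replacement to the
// joined source labels for the two endpoint targets on the same pod.
import (
	"fmt"
	"regexp"
)

func main() {
	re := regexp.MustCompile(`([^:]+)(?::\d+)?;(\d+)`)

	// Joined as: __address__;__meta_kubernetes_service_annotation_prometheus_io_port
	joined := []string{
		"172.17.0.4:9121;9121", // endpoint for the exporter container
		"172.17.0.4:9100;9121", // endpoint for the other container in the pod (hypothetical port)
	}
	for _, in := range joined {
		// Both lines print 172.17.0.4:9121, so the TA allocates two distinct
		// targets that end up scraping the same address.
		fmt.Println(re.ReplaceAllString(in, "$1:$2"))
	}
}
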

Is it possible to have the TA apply the relabel configs prior to allocating targets, so we can drop any unnecessary targets? I checked the newly added /metrics endpoint and saw the TA allocating 36 targets (18 targets per collector) when only 3 were actually valid.

@jpkrohling
Member

Reopening as requested by @kelseyma

@jpkrohling jpkrohling reopened this Aug 26, 2022
@kelseyma
Contributor Author

Moving the discussion about relabel configs to a new issue: #1064. Closing this issue.
