How to optimize the performance of Kafka exporters? #36853

Open
xiaoyao2246 opened this issue Dec 16, 2024 · 4 comments

xiaoyao2246 commented Dec 16, 2024

Component(s)

exporter/kafka

Describe the issue you're reporting

I deployed a simple Collector using the OpenTelemetry Operator, and its configuration is as follows:

kind: OpenTelemetryCollector
metadata:
  name: collector-otlp
spec:
  image: xxxxx/opentelemetry-collector-contrib:0.40.0
  replicas: 1
  mode: deployment
  resources:
    limits:
      cpu: 1000m
      memory: 2048Mi
  config: |
    receivers:
      jaeger: 
        protocols: 
          thrift_http: 
            endpoint: 0.0.0.0:4316
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
    processors: 
      batch: 
        send_batch_size: 500
        send_batch_max_size: 500
      resource: 
        attributes: 
        - key: from-collector
          value: temp
          action: insert
    exporters: 
      logging: 
        loglevel: info
      kafka: 
        brokers:
          - xx.xx.xx.xx:9092
          - xx.xx.xx.xx:9092
          - xx.xx.xx.xx:9092
        topic: otlp_trace_temp
        protocol_version: 2.0.0
    service:
      pipelines:
        traces:
          receivers: [otlp, jaeger]
          processors: [batch, resource]
          exporters: [logging, kafka]

Since my Kubernetes version is 1.20.11, I used v0.40.0 of the Collector.

My Collector is allocated 1 CPU core and 2 GB of memory.

A portion of the Collector's logs is shown below:

2024-12-16T14:36:04.690Z        INFO    loggingexporter/logging_exporter.go:40  TracesExporter  {"#spans": 500}
2024-12-16T14:36:04.700Z        INFO    loggingexporter/logging_exporter.go:40  TracesExporter  {"#spans": 500}
2024-12-16T14:36:05.691Z        INFO    loggingexporter/logging_exporter.go:40  TracesExporter  {"#spans": 500}
2024-12-16T14:36:05.691Z        INFO    loggingexporter/logging_exporter.go:40  TracesExporter  {"#spans": 500}
2024-12-16T14:36:06.692Z        INFO    loggingexporter/logging_exporter.go:40  TracesExporter  {"#spans": 500}
2024-12-16T14:36:06.693Z        INFO    loggingexporter/logging_exporter.go:40  TracesExporter  {"#spans": 500}
2024-12-16T14:36:07.693Z        INFO    loggingexporter/logging_exporter.go:40  TracesExporter  {"#spans": 500}
2024-12-16T14:36:07.696Z        INFO    loggingexporter/logging_exporter.go:40  TracesExporter  {"#spans": 500}
2024-12-16T14:36:08.696Z        INFO    loggingexporter/logging_exporter.go:40  TracesExporter  {"#spans": 500}
2024-12-16T14:36:08.699Z        INFO    loggingexporter/logging_exporter.go:40  TracesExporter  {"#spans": 500}
2024-12-16T14:36:09.699Z        INFO    loggingexporter/logging_exporter.go:40  TracesExporter  {"#spans": 500}
2024-12-16T14:36:09.702Z        INFO    loggingexporter/logging_exporter.go:40  TracesExporter  {"#spans": 500}
2024-12-16T14:36:10.700Z        INFO    loggingexporter/logging_exporter.go:40  TracesExporter  {"#spans": 500}
2024-12-16T14:36:10.701Z        INFO    loggingexporter/logging_exporter.go:40  TracesExporter  {"#spans": 500}
2024-12-16T14:36:11.700Z        INFO    loggingexporter/logging_exporter.go:40  TracesExporter  {"#spans": 500}
2024-12-16T14:36:11.701Z        INFO    loggingexporter/logging_exporter.go:40  TracesExporter  {"#spans": 500}
2024-12-16T14:36:12.702Z        INFO    loggingexporter/logging_exporter.go:40  TracesExporter  {"#spans": 500}
2024-12-16T14:36:12.703Z        INFO    loggingexporter/logging_exporter.go:40  TracesExporter  {"#spans": 500}
2024-12-16T14:36:13.710Z        INFO    loggingexporter/logging_exporter.go:40  TracesExporter  {"#spans": 500}
2024-12-16T14:36:13.712Z        INFO    loggingexporter/logging_exporter.go:40  TracesExporter  {"#spans": 500}
2024-12-16T14:36:14.719Z        INFO    loggingexporter/logging_exporter.go:40  TracesExporter  {"#spans": 500}
2024-12-16T14:36:14.722Z        INFO    loggingexporter/logging_exporter.go:40  TracesExporter  {"#spans": 500}
2024-12-16T14:36:15.726Z        INFO    loggingexporter/logging_exporter.go:40  TracesExporter  {"#spans": 500}
2024-12-16T14:36:15.729Z        INFO    loggingexporter/logging_exporter.go:40  TracesExporter  {"#spans": 500}
2024-12-16T14:36:16.730Z        INFO    loggingexporter/logging_exporter.go:40  TracesExporter  {"#spans": 500}
2024-12-16T14:36:16.733Z        INFO    loggingexporter/logging_exporter.go:40  TracesExporter  {"#spans": 500}
2024-12-16T14:36:17.735Z        INFO    loggingexporter/logging_exporter.go:40  TracesExporter  {"#spans": 500}
2024-12-16T14:36:17.738Z        INFO    loggingexporter/logging_exporter.go:40  TracesExporter  {"#spans": 500}
2024-12-16T14:36:18.739Z        INFO    loggingexporter/logging_exporter.go:40  TracesExporter  {"#spans": 500}
2024-12-16T14:36:18.742Z        INFO    loggingexporter/logging_exporter.go:40  TracesExporter  {"#spans": 500}
2024-12-16T14:36:19.743Z        INFO    loggingexporter/logging_exporter.go:40  TracesExporter  {"#spans": 500}
2024-12-16T14:36:19.746Z        INFO    loggingexporter/logging_exporter.go:40  TracesExporter  {"#spans": 500}
2024-12-16T14:36:20.747Z        INFO    loggingexporter/logging_exporter.go:40  TracesExporter  {"#spans": 500}
2024-12-16T14:36:20.751Z        INFO    loggingexporter/logging_exporter.go:40  TracesExporter  {"#spans": 500}

To test the Collector's performance, I sent trace data to it. Its current resource usage is shown below:
[screenshot: Collector resource usage metrics]

I found that under the current configuration, the CPU usage is relatively high, while the memory usage is very low.

My question is: are there other ways or strategies to improve the Collector's performance? I'm new to OpenTelemetry and would appreciate any advice!

Thank you all again for your help.

@xiaoyao2246 added the needs triage label Dec 16, 2024

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@VihasMakwana (Contributor)

v0.40.0 is very old.
Anyway, would it be possible to share a CPU profile? You can use the pprof extension: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/v0.40.0/extension/pprofextension/README.md
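For reference, a minimal sketch of enabling that extension alongside the existing pipeline (the 0.0.0.0:1777 endpoint follows the extension's conventional default port, and the service wiring mirrors the config above; adjust as needed):

extensions:
  pprof:
    # Exposes Go's net/http/pprof endpoints for CPU and memory profiling.
    endpoint: 0.0.0.0:1777

service:
  extensions: [pprof]
  pipelines:
    traces:
      receivers: [otlp, jaeger]
      processors: [batch, resource]
      exporters: [logging, kafka]

A 30-second CPU profile can then be captured with, for example, go tool pprof http://<collector-host>:1777/debug/pprof/profile.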

@VihasMakwana removed the needs triage label Dec 17, 2024
@xiaoyao2246 (Author)

@VihasMakwana
First, regarding the profiling part, I'll try to configure it now. Thank you.

Second, given that my Kubernetes version is 1.20.11, is there any way to use a newer Collector version? Or can I deploy the Collector standalone, without the OpenTelemetry Operator? I mainly want to use the Kafka exporter's feature of partitioning by trace ID.
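For context, newer releases of the Kafka exporter expose a setting for this; a minimal sketch, assuming the partition_traces_by_id option is available in the Collector version being used (the option name is not confirmed against this exact release):

exporters:
  kafka:
    brokers:
      - xx.xx.xx.xx:9092
    topic: otlp_trace_temp
    protocol_version: 2.0.0
    # Assumption: routes all spans sharing a trace ID to the same Kafka
    # partition; requires a Collector release that supports this setting.
    partition_traces_by_id: true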

@xiaoyao2246 (Author)

@VihasMakwana
I have successfully deployed a newer version of the Collector, v0.90.0. Now I'm trying to optimize the performance of the Kafka exporter.

I saw this in the Kafka exporter's README.md:

This exporter uses a synchronous producer that blocks and does not batch messages, therefore it should be used with batch and queued retry processors for higher throughput and resiliency.

Which component do the "queued retry processors" mentioned here refer to? I couldn't find it in the repository.

Thank you again for your help.
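For context, the standalone queued_retry processor is no longer part of the Collector; its role is now covered by the per-exporter sending_queue and retry_on_failure settings provided by exporterhelper, which the Kafka exporter supports. A minimal sketch of combining them with the batch processor (the numbers are illustrative assumptions, not tuned recommendations):

processors:
  batch:
    send_batch_size: 500
    send_batch_max_size: 500

exporters:
  kafka:
    brokers:
      - xx.xx.xx.xx:9092
    topic: otlp_trace_temp
    protocol_version: 2.0.0
    # Buffer batches in an in-memory queue and export with several
    # consumers in parallel (illustrative sizes).
    sending_queue:
      enabled: true
      num_consumers: 10
      queue_size: 5000
    # Retry failed exports with exponential backoff instead of dropping data.
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 300s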
