Document custom Kafka client metrics on otel.io #4063

Open
lmolkova opened this issue Dec 4, 2023 · 15 comments

@lmolkova
Contributor

lmolkova commented Dec 4, 2023

Context

  • The new Kafka metrics design is outlined in KIP-714
  • It's not OTel-compatible and uses different names/attributes/instruments than OTel
  • Metrics are intended to be collected by default, serialized to OTLP, and sent to the broker, from which users can collect them - impl in Java

Effectively, the Kafka project does not plan to follow OTel semconv (@AndrewJSchofield to confirm).

OTel provides several Kafka instrumentation components:

  1. Some of them are based on monkey-patching/byte-code rewriting and can emit otel-compatible metrics and traces
  2. Others (such as the collector kafkareceiver or the .NET Aspire Kafka integration library) scrape metrics that the broker provides

The problem:

Group 1 (monkey-patched instrumentations) might still want to emit Kafka-specific metrics/traces. We'll need to keep them in the otel-semconv repo so they are consistent across languages/clients.

Group 2 (instrumentations that report what's available) has a harder problem.
There are multiple ways to scrape different sets of metrics from Kafka:

  1. Java uses Kafka JMX metrics
  2. The collector uses Kafka client library APIs to get stats
  3. The .NET Aspire component uses metrics available through the underlying librdkafka
  4. Once KIP-714 is implemented in different languages, there will be yet another way

In most cases these metrics can't be converted to OTel ones (they use different instruments, don't support histograms, don't report the same attributes, etc.).

As a result, we're going to end up with each language SIG (plus external components) defining their own set of custom metrics for Kafka based on what they have.

What we can do on the OTel semconv side:

  • recommend a default way to scrape metrics from kafka
  • recommend a naming approach (e.g. Java and the collector start with kafka and not messaging.kafka as we do in OTel semconv)
  • provide a single place to list all existing non-OTel-compatible Kafka-related metrics (e.g. docs on opentelemetry.io), similar to JMX, but for all languages
@lmolkova
Contributor Author

lmolkova commented Dec 4, 2023

@pyohannes

  • recommend a default way to scrape metrics from kafka

For broker metrics, this could be the receiver implemented for the collector; it already maintains a list of supported metrics.

For client metrics, Kafka takes an approach similar to Kubernetes:

I don't think OTel should start to tackle the problems that arise from this, even more so as it's not fully implemented and working yet, and many details are still unclear.

As we now have generic messaging metrics defined (albeit experimental), we should rather build on those where possible. That means seeing whether we can map to those metrics and defining Kafka-specific extensions where needed.

@lmolkova
Contributor Author

lmolkova commented Dec 4, 2023

@pyohannes the suggestion here is not to solve a big problem but to reduce inconsistency across the non-standard set of metrics, so that different instrumentations emit similar things.

For example just document them once like Java does for kafka library - https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/bbfe950ad0ace8123a5e6817fb3767e27a1a2cee/instrumentation/kafka/kafka-clients/kafka-clients-2.6/library/README.md. This documentation can be an informative one and live on opentelemetry.io

@jack-berg
Member

Some of them are based on monkey-patching/byte-code rewriting and can emit otel-compatible metrics and traces

Just a small clarification. The java kafka client metrics solution works by bridging metrics from the kafka client library using API hooks provided by the library. No monkey patching / bytecode rewriting required. The code performs a minimal generic mapping between the instrument types, metric names, and attribute names. It does not cherry pick metrics from kafka and try to conform to any particular conventions - that approach was ruled out because it was too brittle and too time consuming given the sheer number of instruments exposed (> 200 IIRC).
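
To make the "minimal generic mapping" idea concrete, here is a rough sketch in Python. It is not the actual opentelemetry-java-instrumentation code: the `KafkaMetricId` type, the helper names, and the exact naming rule (strip a trailing `-metrics` from the group, replace dashes with underscores, prepend `kafka`) are assumptions made for illustration. The point is only that one generic rule can cover every metric, with no per-metric cherry-picking.

```python
import re
from dataclasses import dataclass, field


@dataclass
class KafkaMetricId:
    """Hypothetical stand-in for a Kafka client metric identity."""
    group: str                      # e.g. "consumer-fetch-manager-metrics"
    name: str                       # e.g. "records-lag-max"
    tags: dict = field(default_factory=dict)


def _snake(text: str) -> str:
    # Replace every run of non-alphanumeric characters with one underscore.
    return re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")


def to_otel_name(metric: KafkaMetricId, prefix: str = "kafka") -> str:
    # Assumed rule: drop a trailing "-metrics" from the group, then join
    # prefix, group, and metric name with dots.
    group = metric.group
    if group.endswith("-metrics"):
        group = group[: -len("-metrics")]
    return ".".join([prefix, _snake(group), _snake(metric.name)])


def to_otel_attributes(metric: KafkaMetricId) -> dict:
    # Tag keys get the same normalization; tag values pass through unchanged.
    return {_snake(key): value for key, value in metric.tags.items()}


if __name__ == "__main__":
    lag = KafkaMetricId(
        group="consumer-fetch-manager-metrics",
        name="records-lag-max",
        tags={"client-id": "c1"},
    )
    print(to_otel_name(lag))
    print(to_otel_attributes(lag))
```

Because the rule is generic, a new metric added by the Kafka client would get a name without any change on the OTel side, which is the trade-off described above: broad coverage in exchange for names that follow Kafka's vocabulary rather than semconv.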

@pyohannes

For example just document them once like Java does for kafka library - [...]

That's fine for me, as long as what we document doesn't conflict with the generic messaging metrics that we have.

@lmolkova
Contributor Author

lmolkova commented Dec 4, 2023

Discussed at the Semconv WG meeting on 12/4:

  1. Kafka owns 'official' metrics emitted by Kafka clients whatever they are
  2. OTel instrumentation libraries/components that emit other metrics should strive for consistency with each other whenever possible.

Next steps:

  • add informative documentation (on otel.io)
    • suggest a naming pattern
    • list non-standard metrics as a reference to try to conform to
    • Java JMX metrics (potentially modified based on the changes in KIP-714) are the first candidate for common non-standard metrics.

@joaopgrassi
Member

Given we removed them from conventions here open-telemetry/semantic-conventions#338, do we really need to do anything here?

@lmolkova transferred this issue from open-telemetry/semantic-conventions on Feb 28, 2024
@lmolkova changed the title from "Where to keep custom Kafka client metrics" to "Document custom Kafka client metrics on otel.io" on Feb 28, 2024
@lmolkova
Contributor Author

Given we removed them from conventions here open-telemetry/semantic-conventions#338, do we really need to do anything here?

@joaopgrassi we still want to document them, and the semconv WG decision was to have an informative section on opentelemetry.io, so I transferred the issue.

@svrnm
Member

svrnm commented Mar 1, 2024

Thanks for transferring this issue @lmolkova. Since this is the first instance of something like this being documented, we need to figure out where and how to put it within the docs. To be honest, right now I am not sure what the best place would be. Any suggestions?

@lmolkova
Contributor Author

lmolkova commented Mar 1, 2024

@svrnm I wonder if we can add a page under Semantic Conventions, something like "External conventions" where we would be able to provide documentation about the non-OTel-authored/compliant signals that the OTel Collector and instrumentation libraries emit.

E.g.

Semantic Conventions
    External Conventions
        Kafka

I believe there are more candidates for that folder: looking into collector receivers, there are plenty of scrapers (Redis, RabbitMQ, ...) that don't always document their metrics. Ideally, we want them to at least add a link to external documentation.

As an alternative, we could consider adding a section under "Collector", since most of these external conventions will come through it. Then, in the rare cases where they are needed outside of the collector (like in java-instrumentation), we could just link to the section under Collector.

WDYT?

@austinlparker
Member

Why aren't they following semconv?

@cartermp
Contributor

cartermp commented Mar 2, 2024

I would prefer this:

add a page under Semantic Conventions, something like "External conventions"

Since it's consistent with where we keep naming for common components.

@lmolkova
Contributor Author

lmolkova commented Mar 2, 2024

Why aren't they following semconv?

The Kafka-specific ones we want to find a home for are legacy ones from the pre-OTel world (which Kafka owners AFAIK want to preserve for the time being).

@austinlparker
Member

Why aren't they following semconv?

The Kafka-specific ones we want to find a home for are legacy ones from the pre-OTel world (which Kafka owners AFAIK want to preserve for the time being).

Ah, ok.

@svrnm
Member

svrnm commented Mar 4, 2024

The Kafka-specific ones we want to find a home for are legacy ones from the pre-OTel world (which Kafka owners AFAIK want to preserve for the time being).

Is there a discussion that we can reference for that? Or, asked differently: have we (the OpenTelemetry community) actively engaged in a conversation with them (the Kafka community) about whether this is the right way forward? Not that we can tell them what to do, but we can at least help (if wanted) to make an informed decision.
