Are the current set of collector metrics adequate? #2165
Comments
A couple of initial comments:
Some naming issues need to be sorted out - e.g.
I would prefer to create a Google spreadsheet listing all metrics available in Jaeger components, and show how they map to OTel metrics. A GitHub ticket is not the best format for that analysis. The general answer to the two questions above: yes, we want to keep the expressiveness of Jaeger metrics. All existing dimensions were added for a reason; in particular, being able to quantify the different sources and formats of inbound traffic is important for operating a prod cluster.
Needs more work, but an initial mapping is here. Many of the mappings are not clear at the moment, so I will need to dig into the code a bit to see what they actually represent.
First draft of the metrics comparison is now complete, with comments that need discussion: https://docs.google.com/spreadsheets/d/1W6mGt3w47BlCdxVelnMbc_fL_HE_GC1CzO3Zu6-i83I/edit?usp=sharing
There are various issues with the metrics, so I want to tackle one specific set of metrics first - specifically the
The closest equivalent metrics produced by OTC currently
Assuming that is not a problem, the issues are:
cc @jaegertracing/jaeger-maintainers
Doesn’t the protocol label in Jaeger refer to the inbound span format?
@yurishkuro No, the reporter protocol was extracted from the metric name, to be a label, in 1.9.0.
Yes, I was thinking of the receiver transport; that should be a different metric anyway.
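To illustrate the change mentioned above (the reporter protocol moving out of the metric name and into a label), here is a minimal sketch. The metric name and the protocol list are hypothetical placeholders chosen for illustration, not the actual names Jaeger emits.

```python
# Hypothetical protocol names; the real set depends on the reporter implementations.
KNOWN_PROTOCOLS = ("tchannel", "grpc")


def split_protocol(metric_name):
    """Move a protocol embedded in the metric name into a 'protocol' label."""
    for protocol in KNOWN_PROTOCOLS:
        token = "_" + protocol + "_"
        if token in metric_name:
            base_name = metric_name.replace(token, "_", 1)
            return base_name, {"protocol": protocol}
    # No known protocol in the name: leave it unchanged, with no labels.
    return metric_name, {}


def render(name, labels):
    """Format a metric name and labels in Prometheus-style text notation."""
    if not labels:
        return name
    pairs = ",".join('{}="{}"'.format(k, v) for k, v in sorted(labels.items()))
    return name + "{" + pairs + "}"


# Hypothetical pre-label metric name, for illustration only.
name, labels = split_protocol("agent_tchannel_reporter_batches_submitted")
print(render(name, labels))
```

With the protocol as a label, a single metric name covers every reporter, and per-protocol numbers come from filtering on the label rather than from matching name patterns.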
@yurishkuro If those metrics seem OK for the agent reporter, I'll create some issues on the OTC repo to deal with the problems outlined?
@objectiser so there are a bunch of red cells in your spreadsheet. Some of them are specific to Jaeger client/agent integrations; what are your thoughts on those? I assume we can keep them out of scope, since OTel SDKs may not even have the same mechanisms. For clear misses, yes, let's file tickets in OTel.
@yurishkuro The collector metrics I was going to deal with in a separate comment (probably next week) - I wanted to start with the agent reporter ones. I may also raise an issue in the OTC repo about an equivalent metric for
Regarding the
Reported agent related metrics here: open-telemetry/opentelemetry-collector#662
Adding example metrics recorded by Jaeger with hotrod:
And OTEL metrics receiving data via the Jaeger thrift receiver and sending to the Jaeger collector (agent mode): https://pastebin.com/X4n9uSJ8
Here is the set of new OTEL metrics:
Receiver metrics: accepted/refused
Exporter metrics: failed/sent
Processor metrics: accepted spans/batches, dropped spans/batches, refused spans, queue length and latency, send fail, send latency, retry send
I have added a second tab to @objectiser's doc - https://docs.google.com/spreadsheets/d/1W6mGt3w47BlCdxVelnMbc_fL_HE_GC1CzO3Zu6-i83I/edit?usp=sharing. It contains a similar comparison, probably with more details. Here are my findings:
We should address these things:
I was looking at 4. I could not find a way to distinguish between binary and compact in the agent's
But the OTel collector accepts even more formats than Jaeger, so why is format not needed?
The receiver metrics are split by receiver type and transport. The idea here is that transport supports only a single format.
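As a rough sketch of what splitting by receiver type and transport allows, the snippet below sums counter samples over a chosen set of labels. The label names (`receiver`, `transport`), the transport values, and the sample numbers are assumptions made up for illustration; they are not taken from the collector's actual output.

```python
from collections import defaultdict


def sum_by(samples, keys):
    """Sum sample values, grouped by the given label keys."""
    totals = defaultdict(float)
    for labels, value in samples:
        group = tuple(labels.get(key, "") for key in keys)
        totals[group] += value
    return dict(totals)


# Made-up accepted-span counts; label names and values are hypothetical.
samples = [
    ({"receiver": "jaeger", "transport": "thrift_compact"}, 120),
    ({"receiver": "jaeger", "transport": "thrift_binary"}, 30),
    ({"receiver": "jaeger", "transport": "grpc"}, 50),
    ({"receiver": "zipkin", "transport": "http"}, 10),
]

# Totals per receiver, regardless of transport.
print(sum_by(samples, ("receiver",)))

# Totals per (receiver, transport) pair; this only identifies the format
# if each transport carries exactly one format.
print(sum_by(samples, ("receiver", "transport")))
```

If one transport can carry more than one format, the per-transport totals no longer identify the format, which is the situation raised in this thread for binary versus compact.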
I think I might have a way to split it into two values. What about
That would be good and sufficient.
Here is the PR: open-telemetry/opentelemetry-collector#859
The Zipkin receiver has the same problem; the dimension is only
PR to fix the Zipkin metrics: open-telemetry/opentelemetry-collector#867
+1
The remaining items here are:
Using the following OpenTelemetry collector config (with image built from master):
and using the business-application.yaml to create some test requests, it resulted in the following metrics: