-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Annotation based telemetry #691
Annotation based telemetry #691
Conversation
Signed-off-by: Peter Nied <peternied@hotmail.com>
Signed-off-by: Peter Nied <peternied@hotmail.com>
Signed-off-by: Peter Nied <peternied@hotmail.com>
implementation platform("io.opentelemetry:opentelemetry-bom:1.34.1") | ||
|
||
implementation('io.opentelemetry:opentelemetry-api') | ||
implementation('io.opentelemetry:opentelemetry-sdk') | ||
implementation('io.opentelemetry:opentelemetry-sdk-metrics') | ||
|
||
implementation("io.opentelemetry:opentelemetry-exporter-logging") | ||
implementation('io.opentelemetry:opentelemetry-sdk-testing') | ||
implementation('io.opentelemetry:opentelemetry-sdk-extension-autoconfigure') | ||
implementation('io.opentelemetry.instrumentation:opentelemetry-instrumentation-annotations:2.4.0') | ||
|
||
testImplementation testFixtures(project(path: ':coreUtilities')) | ||
testImplementation testFixtures(project(path: ':testHelperFixtures')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: pare this down to only whats needed
private static final LongCounter documentsProcessedCounter; | ||
private static final LongCounter batchesProcessedCounter; | ||
static { | ||
final var meter = GlobalOpenTelemetry.getMeter("DocumentReindexer"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This requires you to register a global static, which will have some big impacts on tests. Some tests might want to use no instrumentation at all and others might want to use in-memory structures. One test run might nearly blow out memory due to combinatoric explosion and another subsequent test might hold the bag for it. This could also have an impact on development speed for some too.
@@ -289,7 +291,8 @@ public Optional<ObjectNode> updateDocument(String indexName, String documentId, | |||
} | |||
} | |||
|
|||
public Mono<BulkResponse> sendBulkRequest(String indexName, String body) { | |||
@WithSpan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do you nest spans with this annotation approach?
This isn't as valuable as it could be without understanding how it ties to the context above it. My guess is that it might rely upon static methods, which might work for now, but could fall apart as more of this codebase becomes asynchronous.
|
||
private InMemoryMetricReader metricReader; | ||
|
||
@BeforeEach |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't work when tests get run in parallel
@@ -5,6 +5,7 @@ plugins { | |||
id "com.avast.gradle.docker-compose" version "0.17.4" | |||
id 'com.bmuschko.docker-remote-api' | |||
id 'io.freefair.lombok' version '8.6' | |||
id 'com.ryandens.javaagent-application' version "0.5.1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of deficiencies that concern me (and concerned me originally when I looked into alternate otel approaches).
- I don't think that there's an easy way to hook all of your spans into the ActiveContextTracker. As it is, 3 lines would do that for RFS. The ActiveContextTracker is a lifesaver into visibility when work items never terminate.
- I don't know how to chain/nest spans. Attribute propagation looks like overkill in the coreUtils packages, but is invaluable when a top-level span doesn't finish.
- How would span linking work when parents are no longer in the backtrace of a given method call?
- It seems like there are about 20 lines of code added to each file that emits a metric. The coreUtils solution takes that down to about 3 (import, context param, pass context param as is or by creating a new child). If a developer spends most of their day perfecting the class that they're working in, why require them to also pollute the file with lots of instrumentation details. 20 lines of code doesn't take into account projecting a metric with different sets of attributes so that we can do analysis over more scenarios (like we do with status codes for the replayer).
- The clarity of the overall instrumentation story seems much better for the coreUtils packages. With this code, I don't know how spans are related and it's hard to know where, if at all, a metric is emitted. That sounds like it should be obvious, but we could easily add the # of bytes across the wire in 3 places and the contributors for each one may have no idea that other codepaths did the same (with the same or different metric names).
- Within instrumented code, spans and metrics are disjoint, so the implementor needs to know to keep these things in sync/manually make changes.
So, what are the benefits of this approach over direct utilization of the tools provided by org.opensearch.migrations.tracing? As per the README for the coreUtils package... In which ways is this alternative approach better or what was missing from that list?
This library adapts the parts of OpenTelemetry to make it more natural and more foolproof throughout the rest of the TrafficCapture packages. This package introduces the concept of "Contexts" to build manage all tracing and metering instrumentation.
Just as the otel metering and tracing can be efficiently disabled by not configuring them, this library provides some future-proofing by defining interfaces to track attributes, activities, exceptions, etc - but through descriptive interfaces where callers describe which actions they're performing, preventing the code from becoming overly complex.
The goals of the instrumentation package are to
- make instrumentation classes easy to use.
- make it easy to create new safe and easy to use instrumentation classes.
- be efficient enough to use in most cases and flexible enough to tune in cases where the cost is too high.
The third point is still a work in progress as the exact performance penalty isn't understood yet. However, work for point #2 dovetails into #3. As context creations are chained together, a no-op uber-context can be created with zero memory footprint and minimal CPU penalty. The first couple points are accomplished by putting contextual information alongside other data as first class parameters and fields. For example, where a method might require an identifier, a context might be passed instead so that the function can retrieve identifying information via the context AND have the ability to instrument activity within the appropriate context.
@gregschohn Thanks for looking into this and providing some thoughts on what has already been done in the Traffic Capture space - I am going to close out this PR. It (mostly) served its purpose in getting me familiar with one of the options and helps clarify the advantages to the time investment in with the manual instrumentation and span definition. |
Description
This is an alternative approach to adding telemetry by using annotations. I've only validated the metrics side of the system, still need to
Todo to leave draft
Issues Resolved
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.