-
Notifications
You must be signed in to change notification settings - Fork 323
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OpenTelemetry bridge #1631
OpenTelemetry bridge #1631
Conversation
This is feature complete now. It still needs some polish. The checkboxes in the issue description reflect the remaining todos. One thing to note is that I've changed how experimental plugins are handled. |
Also makes sure that only versions starting with 1.14.0 are instrumented
FYI I have updated version to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Epic! 👏
Mostly minor comments.
I didn't verify everything is covered properly through tests.
|
||
public OTelSpan(AbstractSpan<?> span) { | ||
this.span = span; | ||
span.incrementReferences(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that balanced through span end/report or is that an intentional span leak so to avoid recycling (like we do with public-API-touched spans)?
If it is the latter, we should consider somehow making a distinction between library-inherent API usage and custom instrumentations, not entirely sure how.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an "intentional leak" to have a similar behavior like our Agent Public API where spans/transactions are not recycled.
Given the OTel API provides ways to wrap an active span into Callable
or Runnable
there are less opportunities for end-users to mis-use it, so maybe in the future we could optionally enable recycling.
We don't have any way to distinguish OTel API usages, so "custom instrumentation" and "library that explicitly uses it", at best we could make some heuristics by the provided fields and values but that sounds quite brittle. What would be the benefit to see a difference between the two ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would be the benefit to see a difference between the two ?
That inherent library instrumentations are less likely to misuse the API.
I'd like to provide the absolutely minimal overhead for such, mainly thinking on possible ES instrumentation, but not only
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not only about misusing the API. It's valid for a span to be used as a parent after it has ended. This would set the reference count to 0, yet it can't be recycled as it may be used as a parent of another span.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Then maybe we can do something even safer that will move the lifecycle management responsibility from us to the using library through the API, so that "advanced" libraries can optionally tell us when a span can be recycled.
I am not sure about the feasibility of convincing OTel to add an optional recycle()
/dispose()
functionality to the API, but we can use the current API, e.g. with a special "recycle" event through addEvent
, or a second invocation of end
.
It may be a significant advantage if we can pull off recycling for "specialist" libraries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We thought of a solution that may be as efficient and simple to get done: creating a tiny "OTel API extension" that contains not much more than Tracer.recycle(Object)
with a noop implementation that can be overridden dynamically with a concrete implementation. Then it's both efficient, robust and doesn't interfere with agent vendor.
But first thing is to measure what the actual overhead of not recycling is.
...telemetry-plugin/src/main/java/co/elastic/apm/agent/opentelemetry/sdk/OTelBridgeContext.java
Outdated
Show resolved
Hide resolved
@Nullable | ||
private OTelSpanKind otelKind = null; | ||
|
||
private final Map<String, Object> otelAttributes = new HashMap<>(); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How come this is here and not in OTelSpan
? I assume it's because of serialization and trying to follow are the guidelines:
- we don't want a separate OTel branch under
AbstractSpan
, so we cannot extend or implement the internal types in the OTel plugin OTelSpan
wraps anAbastractSpan
, so it knows about it but not vice versa
I would be happy if we can lose any OTel stuff in our internal code as much as possible.
With the attributes it's easy since it's only a Map<String, Object>
so just renaming to attributes
would do. For the otelKind
it's also easy - since we only need its toString
implementation, the field can be of type Object
and renamed kind
and OTelSpanKind
can go into the plugin.
However, this will not accommodate future additions (like OTel events?) and it doesn't lose the OTel-specific serialization from our general-purpose serializer.
Proposal
Maybe a more generic solution is to introduce another interface to the co.elastic.apm.agent.impl.context
package:
interface CustomContext extends Recyclable
{
serialize(JsonWriter);
}
The plugin will implement it with attributes and kind and the serialize
implementation would be the current DslJsonSerializer#serializeOTel
implementation. It may eliminate the need to copy the attributes to an additional map.
It will be another AbstractSpan
field that will be serialized in the appropriate order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will try to look at an alternative for this, maybe close to what you suggested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm skeptical that pulling out the attributes out of AbstractSpan
is possible. Think about the case where someone uses the OTel API to add a custom attribute to a span or transaction that has been created by the auto-instrumentation of the agent.
The OTelSpan
is just an ephemeral bridge object where we should think twice before adding state to it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's right, we shouldn't add a state to OTelSpan
, the AbstractSpan
would still own the bridge context. It's not where the data resides, only where the implementation resides. I prefer the bridge to maintain this part of the context, which is tightly coupled with the OTel API.
This is what I propose (calling it FreeFormContext
because the internal context already has room for custom context):
We add another context type that has a very loose API:
interface FreeFormContext extends Recyclable
{
serialize(JsonWriter);
}
Anything that implements it is also responsible for proper serialization of it. Not sure about recycling, but I think that as long as we don't allow multiple types of that within the same runtime, it can be set once and then only recycled. That is only relevant if we really find a proper way to recycle these spans of course.
Then we add another context property to spans:
public abstract class AbstractSpan<T extends AbstractSpan<T>> implements Recyclable {
....
public abstract AbstractContext getContext();
....
public abstract void setFreeFormContext(FreeFormContext);
public abstract FreeFormContext getFreeFormContext();
}
The bridge will have a OTelSpanContext implements FreeFormContext
and instead of doing:
span.getOtelAttributes().put("key", "value");
OTelSpan
would do something like:
((OTelSpanContext)span.getFreeFormContext()).setAttribute("key", "value");
I hope this makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see what you mean here, but I'm not sure that we can generically handle the JSON serialization part in an elegant way.
We could even allow the plugin to create dedicated sub-classes of Transaction
and Span
for that purpose.
For example, we have to assume that the serialized form will always be in the same place in JSON.
In this case, we have an extra otel
attribute: span.otel
object with span.otel.kind
and span.otel.attributes
, but any other implementation could use a different structure.
While I like the idea to keep the core of the agent as OTel-agnostic as possible, I don't think that we should handle this right now:
- we only have a single implementation, thus it's probably too early to create an abstraction that will fit a future abstraction.
- I don't see any other possible implementation candidate for this.
- it's only a couple of fields, thus it seems an acceptable compromise, and we can always revisit this later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but I'm not sure that we can generically handle the JSON serialization part in an elegant way.
Wouldn't a copy+paste of the current serialization work without any change?
For example, we have to assume that the serialized form will always be in the same place in JSON.
JSON relies on nesting. You can assume that you are in the right place within the "parent" node serialization because it is the one invoking you and once you are invoked, you serialize the nested tree as you know it should be.
While I like the idea to keep the core of the agent as OTel-agnostic as possible, I don't think that we should handle this right now: ...
I see it exactly the opposite 😄
Even though it seems it is going this direction, we don't yet know whether using us with OTel is going to be the absolute norm. It seemed to go there with OpenTracing before. Currently we add fields to our core to support niche cases.
I don't think about other implementations, it's not about abstraction for the purpose of polymorphism, it's abstraction for the purpose of proper responsibility separation - the bridge should know about the core, the core should not know about the bridge.
That said, I am only explaining to emphasize my rationale, it's not something I will insist on so feel free to make the call and leave as is.
...entelemetry-plugin/src/main/java/co/elastic/apm/agent/opentelemetry/sdk/OTelSpanBuilder.java
Show resolved
Hide resolved
...pm-opentelemetry-test/src/test/java/co/elastic/apm/opentelemetry/OpenTelemetryVersionIT.java
Show resolved
Hide resolved
@@ -0,0 +1,244 @@ | |||
@opentelemetry-bridge | |||
Feature: OpenTelemetry bridge |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
otelTracer = openTelemetry.getTracer(null); | ||
|
||
// otel spans are not recycled for now | ||
disableRecyclingValidation(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, so this answers my question above - we leak those on purpose.
Makes sense in general.
It looks like it will be quite difficult to maintain refs properly. Paying this extra overhead for arbitrary manual usage of the API worth it, but ideally we could also have a more optimized mode for libraries that embed OTel API usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this answers my previous question on the distinction between manual instrumentation and calls to the OTel API that are already in an existing library.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this answers my previous question on the distinction between manual instrumentation and calls to the OTel API that are already in an existing library.
Just like happened to you, I only got here now 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed the separation of responsibilities both here and offline and that's good enough for me. It's definitely not critical, only a design perspective, so I am perfectly fine with leaving it as is.
What does this PR do?
Checklist
I have updated supported-technologies.asciidocThat's ensured by matching on the version
Other TODOs