OpenTelemetry bridge #1631

felixbarny · 2021-01-25T07:43:36Z

What does this PR do?

Checklist

Other TODOs

Map Elastic APM specific attributes, similar to the OpenTelemetry bridge

apmmachine · 2021-01-25T08:23:29Z

💚 Build Succeeded


Build stats

Start Time: 2022-03-21T09:28:48.078+0000
Duration: 44 min 17 sec

Test stats 🧪

Test	Results
Failed	0
Passed	2840
Skipped	20
Total	2860

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

/test : Re-trigger the build.
run benchmark tests : Run the benchmark tests.
run jdk compatibility tests : Run the JDK Compatibility tests.
run integration tests : Run the Agent Integration tests.
run end-to-end tests : Run the APM-ITs.
run windows tests : Build & tests on windows.
run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)


felixbarny · 2021-01-28T09:01:19Z

This is feature complete now. It still needs some polish. The checkboxes in the issue description reflect the remaining todos.

One thing to note is that I've changed how experimental plugins are handled.
The disable_instrumentations option now doesn't control whether or not experimental plugins are enabled. There's a separate option for it now: enable_experimental_instrumentations (default: false). I didn't change the underlying mechanism though. The enable_experimental_instrumentations setting controls whether experimental is added or removed from the disable_instrumentations list.
I can easily revert that, but I prefer it this way, especially because you no longer have to remember to append the default value experimental when disabling a specific instrumentation (for example disable_instrumentations=experimental,executor-service). It does affect users who have enabled experimental plugins by setting disable_instrumentations= (an empty value). I wouldn't consider that a breaking change, though: by enabling experimental plugins, users opt into something experimental by definition. We'll still have to highlight this in the release notes.
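To make the described mechanism concrete, here is a minimal sketch of how enable_experimental_instrumentations could add or remove the experimental group from the effective disable_instrumentations list. The class and method names (ExperimentalInstrumentationConfig, effectiveDisabled) are hypothetical and not the agent's actual configuration code.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch: derives the effective set of disabled instrumentation groups
 * from disable_instrumentations and enable_experimental_instrumentations.
 */
final class ExperimentalInstrumentationConfig {

    static final String EXPERIMENTAL = "experimental";

    private ExperimentalInstrumentationConfig() {
    }

    static Set<String> effectiveDisabled(List<String> disableInstrumentations,
                                         boolean enableExperimentalInstrumentations) {
        Set<String> disabled = new LinkedHashSet<>(disableInstrumentations);
        if (enableExperimentalInstrumentations) {
            // opting in removes the "experimental" group from the disabled list
            disabled.remove(EXPERIMENTAL);
        } else {
            // default (false): experimental plugins stay disabled,
            // so users no longer need to list "experimental" themselves
            disabled.add(EXPERIMENTAL);
        }
        return disabled;
    }
}
```

With this approach, disable_instrumentations=executor-service alone is enough to keep experimental plugins disabled, which is the usability gain described above.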

Also makes sure that only versions starting with 1.14.0 are instrumented


SylvainJuge · 2022-03-03T15:48:04Z

FYI, I have updated the version to 1.30.0-SNAPSHOT, as the OTel bridge is documented for version 1.30.0.


eyalkoren

Epic! 👏
Mostly minor comments.
I didn't verify everything is covered properly through tests.

eyalkoren · 2022-03-08T17:29:42Z

.../apm-opentelemetry-plugin/src/main/java/co/elastic/apm/agent/opentelemetry/sdk/OTelSpan.java

+
+    public OTelSpan(AbstractSpan<?> span) {
+        this.span = span;
+        span.incrementReferences();


Is that balanced through span end/report, or is it an intentional span leak to avoid recycling (like we do with spans touched by the public API)?
If it is the latter, we should consider somehow distinguishing between library-inherent API usage and custom instrumentations, though I'm not entirely sure how.

This is an "intentional leak" so that it behaves like our public Agent API, where spans/transactions are not recycled.
Given that the OTel API provides ways to wrap an active span into a Callable or Runnable, there are fewer opportunities for end users to misuse it, so maybe in the future we could optionally enable recycling.

We don't have any way to distinguish between OTel API usage for "custom instrumentation" and usage by a "library that explicitly uses it". At best we could apply heuristics based on the provided fields and values, but that sounds quite brittle. What would be the benefit to see a difference between the two ?

What would be the benefit to see a difference between the two ?

That inherent library instrumentations are less likely to misuse the API.
I'd like to keep the overhead for those to an absolute minimum, mainly thinking of a possible ES instrumentation, but not only that.

It's not only about misusing the API. It's valid for a span to be used as a parent after it has ended. Ending it would set the reference count to 0, yet it can't be recycled because it may still be used as the parent of another span.
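The following is a minimal sketch of why the extra reference taken in the OTelSpan constructor acts as an intentional leak. It assumes a simplified reference-counting model in which the creator holds one reference and end() releases it; RefCountedSpan and BridgeSpan are hypothetical names, not the agent's classes, and "recycling" is reduced to a flag.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical reference-counted span; recycling is simulated with a flag. */
final class RefCountedSpan {
    // assumption: the creator holds the initial reference
    private final AtomicInteger references = new AtomicInteger(1);
    private volatile boolean recycled = false;

    void incrementReferences() {
        references.incrementAndGet();
    }

    void decrementReferences() {
        if (references.decrementAndGet() == 0) {
            // a real agent would return the object to a pool here, even though
            // an ended span may still be needed as the parent of a new span
            recycled = true;
        }
    }

    void end() {
        // ending releases the creator's reference
        decrementReferences();
    }

    boolean isRecycled() {
        return recycled;
    }
}

/** Bridge wrapper that takes an extra reference and never releases it. */
final class BridgeSpan {
    final RefCountedSpan span;

    BridgeSpan(RefCountedSpan span) {
        this.span = span;
        // the count can never drop to 0, so the span is never recycled
        span.incrementReferences();
    }
}
```

The trade-off is the one discussed in this thread: never recycling is safe for arbitrary API usage, at the cost of extra allocations.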

I see. Then maybe we can do something even safer that moves the lifecycle management responsibility from us to the library using the API, so that "advanced" libraries can optionally tell us when a span can be recycled.
I am not sure about the feasibility of convincing OTel to add an optional recycle()/dispose() functionality to the API, but we can use the current API, e.g. with a special "recycle" event through addEvent, or a second invocation of end.
It may be a significant advantage if we can pull off recycling for "specialist" libraries.

We thought of a solution that may be just as efficient and simple to implement: a tiny "OTel API extension" containing little more than Tracer.recycle(Object), with a no-op implementation that can be overridden dynamically by a concrete implementation. That makes it efficient and robust, and it doesn't interfere with other agent vendors.
But the first step is to measure the actual overhead of not recycling.
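A minimal sketch of the proposed "OTel API extension", under stated assumptions: TracerExtension, recycle and install are hypothetical names, and the default no-op is replaced through a plain setter here, whereas the proposal only says the no-op "can be overridden dynamically" (an agent might do this via instrumentation instead).

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

/**
 * Hypothetical "OTel API extension": a recycle hook that is a no-op by default,
 * so libraries can call it unconditionally whether or not an agent is present.
 */
final class TracerExtension {
    private static final Consumer<Object> NOOP = span -> { };
    private static final AtomicReference<Consumer<Object>> RECYCLER = new AtomicReference<>(NOOP);

    private TracerExtension() {
    }

    /** Called by a library once a span will no longer be used, not even as a parent. */
    static void recycle(Object span) {
        RECYCLER.get().accept(span);
    }

    /** Called by an agent to replace the no-op with a concrete implementation. */
    static void install(Consumer<Object> recycler) {
        RECYCLER.set(recycler);
    }
}
```

Without an agent, recycle(span) costs a single no-op call; an agent that installs a recycler could return the underlying span to its pool.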

...telemetry-plugin/src/main/java/co/elastic/apm/agent/opentelemetry/sdk/OTelBridgeContext.java

eyalkoren · 2022-03-09T08:30:42Z

apm-agent-core/src/main/java/co/elastic/apm/agent/impl/transaction/AbstractSpan.java

+    @Nullable
+    private OTelSpanKind otelKind = null;
+
+    private final Map<String, Object> otelAttributes = new HashMap<>();
+


How come this is here and not in OTelSpan? I assume it's because of serialization and trying to follow the guidelines:

we don't want a separate OTel branch under AbstractSpan, so we cannot extend or implement the internal types in the OTel plugin

OTelSpan wraps an AbstractSpan, so it knows about it, but not vice versa

I would be happy if we could remove as much OTel-specific code from our internal code as possible.

With the attributes it's easy since it's only a Map<String, Object> so just renaming to attributes would do. For the otelKind it's also easy - since we only need its toString implementation, the field can be of type Object and renamed kind and OTelSpanKind can go into the plugin.

However, this will not accommodate future additions (like OTel events?), and it doesn't remove the OTel-specific serialization from our general-purpose serializer.

Proposal

Maybe a more generic solution is to introduce another interface to the co.elastic.apm.agent.impl.context package:

interface CustomContext extends Recyclable { void serialize(JsonWriter writer); }

The plugin will implement it with attributes and kind and the serialize implementation would be the current DslJsonSerializer#serializeOTel implementation. It may eliminate the need to copy the attributes to an additional map.
It will be another AbstractSpan field that will be serialized in the appropriate order.

I will try to look at an alternative for this, maybe close to what you suggested.

I'm skeptical that pulling the attributes out of AbstractSpan is possible. Think about the case where someone uses the OTel API to add a custom attribute to a span or transaction that has been created by the agent's auto-instrumentation.
The OTelSpan is just an ephemeral bridge object where we should think twice before adding state to it.

That's right, we shouldn't add state to OTelSpan; the AbstractSpan would still own the bridge context. The bridge is not where the data resides, only where the implementation resides. I prefer the bridge to maintain this part of the context, which is tightly coupled with the OTel API.

This is what I propose (calling it FreeFormContext because the internal context already has room for custom context):
We add another context type that has a very loose API:

interface FreeFormContext extends Recyclable { void serialize(JsonWriter writer); }

Anything that implements it is also responsible for its own serialization. I'm not sure about recycling, but I think that as long as we don't allow multiple implementations of it within the same runtime, it can be set once and then only recycled. That is only relevant if we really find a proper way to recycle these spans, of course.

Then we add another context property to spans:

public abstract class AbstractSpan<T extends AbstractSpan<T>> implements Recyclable { .... public abstract AbstractContext getContext(); .... public abstract void setFreeFormContext(FreeFormContext freeFormContext); public abstract FreeFormContext getFreeFormContext(); }

The bridge would have an OTelSpanContext that implements FreeFormContext, and instead of doing:

span.getOtelAttributes().put("key", "value");

OTelSpan would do something like:

((OTelSpanContext)span.getFreeFormContext()).setAttribute("key", "value");

I hope this makes sense.
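A minimal, self-contained sketch of the FreeFormContext proposal, assuming some simplifications: Recyclable is a stand-in with a single resetState() method, a StringBuilder replaces DSL-JSON's JsonWriter, and SimpleSpan is a hypothetical stand-in for AbstractSpan. The core type only sees the interface; the OTel-specific state and its JSON shape live in the bridge-side class.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Stand-in for the agent's Recyclable interface (assumption). */
interface Recyclable {
    void resetState();
}

/** Loose context type: each implementation serializes itself. */
interface FreeFormContext extends Recyclable {
    void serialize(StringBuilder json);
}

/** Hypothetical bridge-side context holding the OTel span kind and attributes. */
final class OTelSpanContext implements FreeFormContext {
    private Object kind; // only toString() is needed, so no OTel enum in the core
    private final Map<String, Object> attributes = new LinkedHashMap<>();

    void setKind(Object kind) {
        this.kind = kind;
    }

    void setAttribute(String key, Object value) {
        attributes.put(key, value);
    }

    @Override
    public void serialize(StringBuilder json) {
        // writes only its own nested object; the caller decides where it goes
        json.append("{\"kind\":\"").append(kind).append("\",\"attributes\":{");
        boolean first = true;
        for (Map.Entry<String, Object> e : attributes.entrySet()) {
            if (!first) {
                json.append(',');
            }
            first = false;
            // simplified: no escaping, all values written as JSON strings
            json.append('"').append(e.getKey()).append("\":\"").append(e.getValue()).append('"');
        }
        json.append("}}");
    }

    @Override
    public void resetState() {
        kind = null;
        attributes.clear();
    }
}

/** Hypothetical core span: knows the FreeFormContext interface, never OTel types. */
class SimpleSpan {
    private FreeFormContext freeFormContext;

    void setFreeFormContext(FreeFormContext context) {
        this.freeFormContext = context;
    }

    FreeFormContext getFreeFormContext() {
        return freeFormContext;
    }
}
```

The cast in the comment above, ((OTelSpanContext) span.getFreeFormContext()).setAttribute("key", "value"), works unchanged against this sketch, and serializing a context with kind CLIENT and that single attribute yields {"kind":"CLIENT","attributes":{"key":"value"}}.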

I see what you mean here, but I'm not sure that we can generically handle the JSON serialization part in an elegant way.
We could even allow the plugin to create dedicated sub-classes of Transaction and Span for that purpose.

For example, we have to assume that the serialized form will always be in the same place in JSON.
In this case, we have an extra otel attribute: span.otel object with span.otel.kind and span.otel.attributes, but any other implementation could use a different structure.

While I like the idea to keep the core of the agent as OTel-agnostic as possible, I don't think that we should handle this right now:

we only have a single implementation, so it's probably too early to create an abstraction that will fit future implementations.

I don't see any other possible implementation candidate for this.

it's only a couple of fields, thus it seems an acceptable compromise, and we can always revisit this later.

but I'm not sure that we can generically handle the JSON serialization part in an elegant way.

Wouldn't a copy+paste of the current serialization work without any change?

For example, we have to assume that the serialized form will always be in the same place in JSON.

JSON relies on nesting. You can assume that you are in the right place within the "parent" node serialization because it is the one invoking you and once you are invoked, you serialize the nested tree as you know it should be.

While I like the idea to keep the core of the agent as OTel-agnostic as possible, I don't think that we should handle this right now: ...

I see it exactly the opposite 😄
Even though things seem to be heading in this direction, we don't yet know whether using our agent with OTel is going to be the absolute norm. It seemed to be heading that way with OpenTracing before. Currently we are adding fields to our core to support niche cases.
I don't think about other implementations, it's not about abstraction for the purpose of polymorphism, it's abstraction for the purpose of proper responsibility separation - the bridge should know about the core, the core should not know about the bridge.

That said, I am only explaining to emphasize my rationale, it's not something I will insist on so feel free to make the call and leave as is.

...entelemetry-plugin/src/main/java/co/elastic/apm/agent/opentelemetry/sdk/OTelSpanBuilder.java

...pm-opentelemetry-test/src/test/java/co/elastic/apm/opentelemetry/OpenTelemetryVersionIT.java

eyalkoren · 2022-03-09T11:13:19Z

apm-agent-core/src/test/resources/specs/otel_bridge.feature

@@ -0,0 +1,244 @@
+@opentelemetry-bridge
+Feature: OpenTelemetry bridge


eyalkoren · 2022-03-09T11:18:24Z

...ry-plugin/src/test/java/co/elastic/apm/agent/opentelemetry/sdk/ElasticOpenTelemetryTest.java

+        otelTracer = openTelemetry.getTracer(null);
+
+        // otel spans are not recycled for now
+        disableRecyclingValidation();


OK, so this answers my question above - we leak those on purpose.
Makes sense in general.
It looks like it will be quite difficult to maintain refs properly. Paying this extra overhead for arbitrary manual usage of the API is worth it, but ideally we would also have a more optimized mode for libraries that embed OTel API usage.

I think this answers my previous question on the distinction between manual instrumentation and calls to the OTel API that are already in an existing library.

I think this answers my previous question on the distinction between manual instrumentation and calls to the OTel API that are already in an existing library.

Just like you, I only got here now 😄

docs/apis.asciidoc

docs/api-opentelemetry.asciidoc


eyalkoren

We discussed the separation of responsibilities both here and offline and that's good enough for me. It's definitely not critical, only a design perspective, so I am perfectly fine with leaving it as is.


felixbarny added 13 commits January 22, 2021 16:47

Fist working draft

27fab55

Lazily parse tracestate

517e037

Use built-in Context

605c893

Translate OTel attributes to intake API

844f097

Refine instrumentations

1a28319

Mark as experimental

204bea7

Add enable_experimental_instrumentations option

a7b65c9

polishing

c89b977

Extract advice to separate class

87e8420

more polishing

8f63bf1

Map OTel semantic convention attributes to data model

9b2e489

Mark spans non-discardable on context propagation

a7899d9

Remove construction of URL fields that are filled on APM Server

56594eb

github-actions bot added the agent-java label Jan 25, 2021

felixbarny added 2 commits January 25, 2021 09:40

Add license headers

ca4e5a6

Revert "Remove construction of URL fields that are filled on APM Server"

31c5b97

This reverts commit 56594eb.

AlexanderWert added this to the 7.13 milestone Jan 25, 2021

AlexanderWert removed this from the 7.13 milestone Jan 25, 2021

felixbarny added 3 commits January 26, 2021 10:09

Map destination details of external spans

f04c94f

Avoid calling method that's @since Java 9

1d72e4a

Fix packaging and shading

316c65b

felixbarny added 5 commits January 29, 2021 11:04

Add docs

4bf7349

Add changelog

083d297

Merge remote-tracking branch 'origin/master' into opentelemetry-bridge

a9788b0

Document when OTel bridge has been added

5465c38

Update to OTel 0.15.0 and test older versions too

970e425

Also makes sure that only versions starting with 1.14.0 are instrumented

Merge branch 'main' of github.com:elastic/apm-agent-java into opentelemetry-bridge

eaf672a

Merge branch 'main' of github.com:elastic/apm-agent-java into opentelemetry-bridge

5506146

SylvainJuge requested a review from eyalkoren March 3, 2022 16:41

eyalkoren reviewed Mar 9, 2022


SylvainJuge added the await-release Mark issues that depend on next release, or PRs that are planned to be included label Mar 14, 2022

AlexanderWert modified the milestones: 8.0, 8.2 Mar 15, 2022

SylvainJuge added 7 commits March 15, 2022 16:03

prevent multiple root contexts

26a7d60

fix pebkc

f5cdebc

set implicit active parent only at startSpan

984d375

fix docs (attempt for menu)

4070339

fix API docs menu integration

e1a00e5

trim whitespace

72bce1f

Merge branch 'main' of github.com:elastic/apm-agent-java into opentelemetry-bridge

404b0ce

SylvainJuge mentioned this pull request Mar 16, 2022

test: synchronizing gherkin spec #2516

Closed

SylvainJuge added 3 commits March 16, 2022 11:01

fix links to public-api

3493ee1

Merge branch 'main' of github.com:elastic/apm-agent-java into opentelemetry-bridge

89ec40f

Merge branch 'main' of github.com:elastic/apm-agent-java into opentelemetry-bridge

6e63eef

eyalkoren approved these changes Mar 17, 2022


SylvainJuge added 6 commits March 17, 2022 13:33

fix generated doc

e4b85c1

doc: try removing float blocks

72e5d24

fix API menu

eaaa292

Merge branch 'main' of github.com:elastic/apm-agent-java into opentelemetry-bridge

1481a3a

fix new module version

020ed20

Merge branch 'main' into opentelemetry-bridge

c887091

SylvainJuge enabled auto-merge (squash) March 21, 2022 09:28

SylvainJuge merged commit ddb8c69 into elastic:main Mar 21, 2022

SylvainJuge removed the await-release Mark issues that depend on next release, or PRs that are planned to be included label May 5, 2022
