RFC for "zero-touch telemetry" requirements #5

bhs · 2019-07-01T19:05:53Z

This RFC is a light edit of "Zero-Code-Modification OpenTelemetry Instrumentation: (i.e., "Agent") A Cross-Language Spec" (a google doc). Also see open-telemetry/community#87.

bhs · 2019-07-01T20:51:12Z

sigh... I got a new laptop last weekend and need to fix my Git config (re the CLA failure). Will do...

time passes...

done.

AloisReitbauer · 2019-07-02T05:46:33Z

As mentioned on Gitter I support the activity if it covers autoinjection and auto instrumentation.

AloisReitbauer · 2019-07-02T05:47:03Z

0002-telemetry-without-manual-instrumentation.md

@@ -0,0 +1,57 @@
+# (Open) Telemetry Without Manual Instrumentation
+
+_Cross-language requirements for automated approaches to extracting portable telemetry data with zero source code modification._


can we also add injection of tracers?

@AloisReitbauer can you be more specific about the requirement here? I can interpret "injection of tracers" in various ways and don't want to misunderstand/misrepresent.

See comment below on specifying the trace at program startup and OpenTel being able to use this parameter to set - and load - the trace at runtime.

0002-telemetry-without-manual-instrumentation.md

AloisReitbauer · 2019-07-02T05:51:16Z

0002-telemetry-without-manual-instrumentation.md

+* Licensing must be permissive (e.g., ASL / BSD)
+* Packaging must allow vendors to “wrap” or repackage the portable (OpenTelemetry) library into a single asset that’s delivered to customers
+    * That is, vendors do not want to require users to comprehend both an OpenTel package and a vendor-specific package
+* Explicit, whitebox OpenTelemetry instrumentation must interoperate with the “automatic” / zero-source-code-modification / blackbox instrumentation.


This is the far biggest issue right for automatic instrumentation. Unfortunately, it is a bit more complicated than just managing spans. The biggest problem we have is double instrumentation i.e. the same action is instrumented with blackbox and whitebox instrumentation - as you call it.

Detection and disabling is one thing, but maybe one instrumentation augments the information of another. How do you want to handle these situations

I was not planning to make augmentation a requirement for our first take on this problem. I'll add it as a nice-to-have, though, as it certainly would be... well, nice to have.

Fine with me. I think we need to get started on this. As we see more real-world use cases of double instrumentation we will also better understand how to solve the issue. Currently, there is quite some research going on at Dynatrace to figure out how to make this work best.

Components in OpenTelemetry might already solve a big part of the problem.

Yeah, that all makes perfect sense and I agree wholeheartedly with

Components in OpenTelemetry might already solve a big part of the problem.

"Components" (or "plugins" or whatever we call them) are a crucial abstraction, especially when mixing and matching various forms of instrumentation.

AloisReitbauer · 2019-07-02T05:53:03Z

0002-telemetry-without-manual-instrumentation.md

+
+### Nice-to-have properties
+
+* Run-time integration (vs compile-time integration)


For us this is a must have

I put this here for two reasons:

Some other folks on the original GitHub issue (or maybe it was the google doc?) were suggesting we drop the requirement to make v1 easier to build, and

More fundamentally, in languages like Go it's unclear how this would work without forking the compiler. So it felt odd listing this as a requirement

Certainly runtime integration is preferable, all other things being equal.

We are discussing this with the Go team right now. Happy to bring them in.

I also believe it, there would be an easy way for dynamic tracer loading using an environment variable for the tracer to load and the OpenTel library to load this tracer. We need to define how situations are handled when a tracer is compiled in.

Generally, I would propose to only compile the code with a noop tracer and provider the actual tracer to use as a parameter.

Going a bit deep into the weeds here, but I want to clarify my point and what I would like this group to look at.

Helpful, thanks for clarifying. Speaking only for myself, I feel uneasy mandating that Tracers are loaded via environment variables only, though I certainly acknowledge the benefits of structuring things that way in certain scenarios.

Re this point:

We are discussing this with the Go team right now. Happy to bring them in.

I think that might be premature, at least for this high-level multi-language RFC. For Go specifically, it would be fantastic to get them involved (and I imagine the various Googlers involved with the project can help with that, too).

I did not mean only, but there should be a well-defined way to load via environment variables.

AloisReitbauer · 2019-07-02T05:53:57Z

0002-telemetry-without-manual-instrumentation.md

+
+## Trade-offs and mitigations
+
+Approaching a problem this language-specific at the cross-language altitude is intrinsically challenging since "different languages are different" – e.g., in Go there is no way to perform the kind of runtime interpositioning that's possible in Python, Ruby, or even Java.


Not entirely true. Dynatrace has runtime instrumentation for Go :-)

Not a dig on Dynatrace, but I'm sure it doesn't work as well or as flexibly as it does in Java... so I would stand by the "that's possible in" bit here.

No offense was taken :-). I believe we agree on the problem. We - meaning all tracing tool providers - need to invest a lot of magic to make this work. We often use undocumented features or even (security) loopholes of platforms to make this work.

0002-telemetry-without-manual-instrumentation.md

reyang · 2019-07-02T17:54:14Z

0002-telemetry-without-manual-instrumentation.md

+* Licensing must be permissive (e.g., ASL / BSD)
+* Packaging must allow vendors to “wrap” or repackage the portable (OpenTelemetry) library into a single asset that’s delivered to customers
+    * That is, vendors do not want to require users to comprehend both an OpenTel package and a vendor-specific package
+* Explicit, whitebox OpenTelemetry instrumentation must interoperate with the “automatic” / zero-source-code-modification / blackbox instrumentation.


Regarding interop, it'll be helpful to clarify the versioning/compatibility story open-telemetry/opentelemetry-specification#115.

@reyang no argument on this at all... open-telemetry/opentelemetry-specification#115 is unresolved at the moment, so difficult to incorporate here, but certainly the versioning story (and "component"-ization story as @AloisReitbauer has brought up here) is/are a big part of the challenge for interoperable whitebox+blackbox instrumentation.

reyang · 2019-07-02T17:58:14Z

0002-telemetry-without-manual-instrumentation.md

+* Packaging must allow vendors to “wrap” or repackage the portable (OpenTelemetry) library into a single asset that’s delivered to customers
+    * That is, vendors do not want to require users to comprehend both an OpenTel package and a vendor-specific package
+* Explicit, whitebox OpenTelemetry instrumentation must interoperate with the “automatic” / zero-source-code-modification / blackbox instrumentation.
+    * If the blackbox instrumentation starts a Span, whitebox instrumentation must be able to discover it as the active Span (and vice versa)


If the whitebox instrumentation has multiple tracers, how to define the "active span"?

https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/tracing-api.md#getcurrentspan

Probably we need a way for the blackbox instrumentation to discover the tracers?

@reyang I don't know what you mean by "multiple traces" – do you mean "multiple tracers"? If so, I was of the understanding (feel free to correct me 😉) that the "multiple tracer" scenario is more of a "tracer multiplexer" where there would still be only a single active span.

Probably we need a way for the blackbox instrumentation to discover the tracers?

Almost certainly, yes, though IMO that's "an implementation detail," albeit an important one. I.e., I wouldn't call it a "requirement", just something that we'll need to do in order to satisfy the requirements (if that distinction makes sense).

Got it, thanks. (and you're right, I've made a typo, it is "multiple tracers", I've updated the comment.

reyang · 2019-07-02T18:00:04Z

0002-telemetry-without-manual-instrumentation.md

+* Explicit, whitebox OpenTelemetry instrumentation must interoperate with the “automatic” / zero-source-code-modification / blackbox instrumentation.
+    * If the blackbox instrumentation starts a Span, whitebox instrumentation must be able to discover it as the active Span (and vice versa)
+    * Relatedly, there also must be a way to discover and avoid potential conflicts/overlap/redundancy between explicit whitebox instrumentation and blackbox instrumentation of the same libraries/packages
+        * That is, if a developer has already added the “official” OpenTelemetry plugin for, say, gRPC, then when the blackbox instrumentation effort adds gRPC support, it should *not* “double-instrument” it and create a mess of extra spans/etc


What if the developer took whitebox approach and configured a specific exporter/propagator? Should the blackbox instrumentation modify/tweak it?

That's what is most likely going to happen

Maybe this is naive of me, but my hope (??) was that both the blackbox and whitebox OpenTelemetry code would be portable and non-vendor-specific. I.e., that the choice of exporter/propagator should not be relevant here.

Ideally - yes. Our initial research of double instrumentation and potential issues indicates that this is anyways a problem that needs to be solved in backend analytics.

lizthegrey · 2019-07-09T16:20:30Z

Hi, I'm here to say on behalf of @honeycombio that we're interested in contributing the Beelines (e.g. for NodeJS Express) to OpenTelemetry, which allow for the most common language frameworks automatic hooking. It's not quite "zero touch" but it's also "add one library dependency" rather than add manual spans everywhere.

Is this something you'd like me to file a separate RFC for, or should we also discuss it here?

bhs · 2019-07-11T00:07:22Z

@lizthegrey this is great to hear!

Is there a technical writeup (can just be bullets / etc, nothing formal) describing the design goals for Beelines? Would be particularly interested to understand how they would translate to a Span-oriented worldview.

I think the various conversations would be easier to navigate if that was its own thread (separate from this RFC, I mean), but that's just me.

bhs · 2019-07-11T00:10:32Z

Hi folks... I'm inclined to merge this in the next 48h if there are no more blocking concerns.

To be clear, this RFC can continue to evolve and be PR'd against. I just don't want to leave it in this pending state forever.

(cc @AloisReitbauer @reyang @lizthegrey who've all made comments)

lizthegrey · 2019-07-11T18:14:12Z

@lizthegrey this is great to hear!

Is there a technical writeup (can just be bullets / etc, nothing formal) describing the design goals for Beelines? Would be particularly interested to understand how they would translate to a Span-oriented worldview.

I think the various conversations would be easier to navigate if that was its own thread (separate from this RFC, I mean), but that's just me.

No strong objections to the existing agent/bytecode approach, and looking forward to see how it evolves... I'll start a separate PR for talking about add one code snippet as opposed to zero-touch integration :)

AloisReitbauer · 2019-07-11T18:20:20Z

I already wanted to ask when we are going to merge. I am fine with it.

bhs · 2019-07-11T18:27:44Z

Yeah, I have a reminder set to merge tomorrow AM (PT) barring any additional concerns.

lizthegrey · 2019-07-11T18:41:08Z

#7 -- here's RFC 0003 for people interested in the alternate and complementary approach to instrumentation we propose :)

reyang

LGTM, thanks for the great work!

yurishkuro · 2019-07-12T17:37:41Z

0002-telemetry-without-manual-instrumentation.md

+### Requirements
+
+Without further ado, here are a set of requirements for “official” OpenTelemetry efforts to accomplish zero-source-code-modification instrumentation (i.e., “OpenTelemetry agents”) in any given language:
+* No _manual_ source code modifications allowed


I think this should be relaxed a bit. In some languages it may not be possible to attach instrumentation without some minimal O(1) initialization code.

I'll loosen it a bit. For languages (e.g., Java) where there is a lot of prior art for truly zero-touch instrumentation, I think that should still be our goal, though.

... There are certainly a lot of legacy JARs out there that can only be instrumented from the outside in.

I'm going to merge this, but will happily debate this point further in subsequent PRs to the RFC itself.

mwear · 2019-07-17T01:18:52Z

I wrote up a proposal of how auto-instrumentation could work for OpenTelemetry Ruby here. It covers the distribution, detection and installation of instrumentation, and while it's pretty specific to Ruby, some of the ideas might be applicable to other languages.

bhs requested review from AloisReitbauer, bogdandrutu, c24t, carlosalberto, iredelmeier, reyang, SergeyKanzhelev, songy23, tedsuo and yurishkuro as code owners July 1, 2019 19:05

yurishkuro approved these changes Jul 1, 2019

View reviewed changes

RFC for "zero-touch telemetry" requirements

93875a5

iredelmeier approved these changes Jul 2, 2019

View reviewed changes

AloisReitbauer reviewed Jul 2, 2019

View reviewed changes

reyang reviewed Jul 2, 2019

View reviewed changes

0002-telemetry-without-manual-instrumentation.md Outdated Show resolved Hide resolved

Address review comments

c6e1e20

reyang reviewed Jul 2, 2019

View reviewed changes

AloisReitbauer approved these changes Jul 11, 2019

View reviewed changes

reyang approved these changes Jul 11, 2019

View reviewed changes

jmacd mentioned this pull request Jul 11, 2019

Loading the Go SDK open-telemetry/opentelemetry-go#52

Closed

yurishkuro mentioned this pull request Jul 11, 2019

RFC for "greybox" instrumentation that's magical but not manual #7

Closed

lizthegrey mentioned this pull request Jul 11, 2019

Propose an "Auto-Instrumentation SIG" open-telemetry/community#87

Closed

yurishkuro reviewed Jul 12, 2019

View reviewed changes

Slightly loosen the source code modification req.

74c8f90

bhs merged commit 1b1914d into open-telemetry:master Jul 13, 2019

iredelmeier referenced this pull request in iredelmeier/oteps Jul 26, 2019

Remove links from section titles (#5)

47a02e0

		@@ -0,0 +1,57 @@
		# (Open) Telemetry Without Manual Instrumentation

		_Cross-language requirements for automated approaches to extracting portable telemetry data with zero source code modification._


		### Nice-to-have properties

		* Run-time integration (vs compile-time integration)


		## Trade-offs and mitigations

		Approaching a problem this language-specific at the cross-language altitude is intrinsically challenging since "different languages are different" – e.g., in Go there is no way to perform the kind of runtime interpositioning that's possible in Python, Ruby, or even Java.

RFC for "zero-touch telemetry" requirements #5

RFC for "zero-touch telemetry" requirements #5

Conversation

bhs commented Jul 1, 2019

bhs commented Jul 1, 2019 • edited Loading

AloisReitbauer commented Jul 2, 2019

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

reyang Jul 2, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lizthegrey commented Jul 9, 2019

bhs commented Jul 11, 2019

bhs commented Jul 11, 2019

lizthegrey commented Jul 11, 2019

AloisReitbauer commented Jul 11, 2019

bhs commented Jul 11, 2019

lizthegrey commented Jul 11, 2019

reyang left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mwear commented Jul 17, 2019

bhs commented Jul 1, 2019 •

edited

Loading

reyang Jul 2, 2019 •

edited

Loading