This repository has been archived by the owner on Dec 6, 2024. It is now read-only.

RFC for "greybox" instrumentation that's magical but not manual #7

Closed
66 changes: 66 additions & 0 deletions 0003-automatic-telemetry-libraries.md
@@ -0,0 +1,66 @@
# Automatic language/framework instrumentation in OpenTelemetry

_Cross-language requirements for automatically extracting portable telemetry data with minimal ("greybox") source code modification._
Member

I'm very on board with minimal, "greybox" implementation. However, I think using the word "automatic" is confusing: it implies zero-touch, rather than low-touch.


## Motivation

The purpose of OpenTelemetry is to make robust, portable telemetry a built-in feature of cloud-native software. Having to manually instrument services and wrap each function and request handler creates friction for developers writing instrumentation. In the long term there may be a wide array of solutions for each language and each framework, but right now there is very little automatic framework instrumentation that would make OpenTelemetry adoption as simple as "paste one import and add one line of code into the top of your main function".

### Why “cross-language/framework”?

There should be a _consistent_ way of adding and interacting with automatically created OpenTelemetry spans and metrics, one that surprises neither the users of a framework nor the users of a language. It should be easy for framework authors, as well as motivated users, to write automated instrumentation adapters for any framework. Those adapters should have similar installation methods (appropriate for the language), interoperate with other forms of instrumentation such as blackbox and whitebox, and be easy to maintain.
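
As a rough illustration of what a "one import, one line" adapter could look like, here is a minimal Python sketch. Everything in it is hypothetical: `TinyApp` stands in for a web framework, `instrument` stands in for an adapter's install call, and `recorded_spans` stands in for an exporter. None of these names come from OpenTelemetry or from any real framework.

```python
class TinyApp:
    """Hypothetical stand-in for a web framework's application object."""

    def __init__(self):
        self.routes = {}

    def route(self, path):
        def decorator(handler):
            self.routes[path] = handler
            return handler
        return decorator

    def dispatch(self, path, request):
        return self.routes[path](request)


recorded_spans = []  # hypothetical stand-in for a span exporter


def instrument(app):
    """Hypothetical adapter: wrap the framework's dispatch once so that
    every handler is traced without touching the handlers themselves."""
    original_dispatch = app.dispatch

    def traced_dispatch(path, request):
        recorded_spans.append(f"handle {path}")
        return original_dispatch(path, request)

    app.dispatch = traced_dispatch
    return app


app = instrument(TinyApp())  # the only manual source change


@app.route("/hello")
def hello(request):
    return "hi"


@app.route("/bye")
def bye(request):
    return "bye"
```

The handlers carry no instrumentation code, so the amount of manual modification stays constant however many handlers are registered.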

### Suggested reading

* https://docs.honeycomb.io/getting-data-in/beelines/ for the general philosophy on what Honeycomb would like to contribute and standardize.

Go:

* https://docs.honeycomb.io/getting-data-in/go/beeline/#wrappers-and-other-middleware which provides sets of wrappers to automatically instrument each handler and database call.
* https://github.com/open-telemetry/opentelemetry-go/blob/master/example/http/server/server.go which currently requires manually instrumenting _each_ handler.

Ruby:

* https://docs.honeycomb.io/getting-data-in/ruby/beeline/#instrumented-packages
* https://github.com/open-telemetry/opentelemetry-ruby (an empty repo)

NodeJS:

* https://docs.honeycomb.io/getting-data-in/javascript/beeline-nodejs/#instrumented-packages
* https://github.com/open-telemetry/opentelemetry-js (no automatic framework instrumentation)

## Proposed guidelines

### Requirements

Here is a set of requirements for “official” OpenTelemetry efforts to accomplish greybox (minimal-source-code-modification) instrumentation, i.e., “OpenTelemetry framework adapters”, in any given language:
* No more than 50 lines of _manual_ source code modifications are allowed, regardless of the number of handlers/frameworks in use
Member

I'm wondering about how to frame this in a way that's not about number of lines of code, which feels like a proxy that may become overly restrictive - I can see some weird code reviews coming up in the future.

* Licensing must be permissive (e.g., ASL / BSD)
Member

Suggested change
* Licensing must be permissive (e.g., ASL / BSD)
* Licensing must follow CNCF licensing rules (i.e., ASL / BSD)

I believe that CNCF only allows ASL and BSD?

* Packaging must allow vendors to “wrap” or repackage the portable (OpenTelemetry) library into a single asset that’s delivered to customers
Member

For languages where zero-touch (#5) is feasible, zero-touch should probably be a wrapper of the low-touch libraries. Thoughts on whether it would make sense to support a canonical, OpenTelemetry wrapper, which is what vendors are then recommended to support? (I might also be misunderstanding this point!)

  * That is, vendors do not want to require users to comprehend both an OpenTelemetry package and a vendor-specific package
* "Greybox" OpenTelemetry framework adapters must interoperate with both explicit, whitebox OpenTelemetry instrumentation and the “automatic” / zero-source-code-modification / blackbox instrumentation proposed in RFC 0002.
Member

Per above, this seems reversed to me - i.e., it feels like zero-touch should be built on top of low-touch, rather than low-touch on top of zero-touch.

  * If the greybox instrumentation starts a Span, whitebox and blackbox instrumentation must be able to discover it as the active Span (and vice versa)
  * Relatedly, there also must be a way to discover and avoid potential conflicts/overlap/redundancy between explicit whitebox instrumentation and greybox/blackbox instrumentation of the same libraries/packages
    * That is, if a developer has already added the “official” greybox OpenTelemetry plugin for, say, gRPC, then when the blackbox instrumentation effort adds gRPC support, it should *not* “double-instrument” it and create a mess of extra spans/etc

* The code in the OpenTelemetry package must not take a hard dependency on any particular vendor/vendors (that sort of functionality should work via a plugin or registry mechanism)
  * Further, the code in the OpenTelemetry package must be isolated to avoid possible conflicts with the host application (e.g., shading in Java)


### Nice-to-have properties

* Automated and modular testing of individual library/package plugins
  * Note that this also makes it easy to test against multiple different versions of any given library
* A fully pluggable architecture, where plugins can be registered at runtime without requiring changes to the central repo at github.com/open-telemetry
* Augmentation of greybox instrumentation by whitebox and blackbox instrumentation (or, perhaps, vice versa). That is, not only can the trace context be shared by these different flavors of instrumentation, but even things like in-flight Span objects can be shared and co-modified (e.g., to use runtime interposition to grab local variables and attach them to a manually-instrumented span).
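
One possible shape for the runtime plugin registration mentioned above is sketched below in Python. The names `register_plugin`, `install_all`, and `_plugins` are assumptions made for this example; the RFC does not define a registry API, and a real design would need to settle questions such as ordering and versioning.

```python
# Hypothetical registry: adapter name -> zero-argument install function.
_plugins = {}


def register_plugin(name, install):
    """Register an adapter at runtime; duplicate names are rejected."""
    if name in _plugins:
        raise ValueError(f"plugin {name!r} is already registered")
    _plugins[name] = install


def install_all():
    """Run every registered adapter's install function, in name order,
    and return the names that were installed."""
    names = sorted(_plugins)
    for name in names:
        _plugins[name]()
    return names


# Adapters may live in any package, including one maintained outside the
# central repository; registering them needs no change to the core code.
register_plugin("grpc", lambda: None)
register_plugin("http", lambda: None)
```

The core package only knows about the registry, so it takes no dependency on any particular adapter or vendor.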


## Trade-offs and mitigations

To be discussed.

## Prior art and alternatives

Greybox instrumentation: Honeycomb's Beelines, which we propose to standardize.

Blackbox instrumentation (copied from 0002): There are many proprietary APM language agents, so there is no need to list them all here. The Datadog APM "language agents" are notable in that they were conceived and written post-OpenTracing and thus have been built to interoperate with it. There are a number of mature JVM language agents that are pure OSS (e.g., [Glowroot](https://glowroot.org/)).