Proposal for modeling nested CLIENT instrumentation #1822

trask · 2020-12-03T04:20:00Z

Since CLIENT spans cannot have nested CLIENT spans, we suppress the nested http client instrumentation. This seems reasonable, as the user is interacting with the higher level library and is probably most interested in seeing things at that level.

But sometimes the nested instrumentation can provide interesting telemetry.

For example, Elasticsearch CLIENT spans are modeled as database spans, but the Elasticsearch client makes HTTP calls under the covers using an instrumented http client, and it may be interesting to see the lower level HTTP telemetry.

I'm not sure if Elasticsearch CLIENT spans are always 1-1 with HTTP calls, but it would be reasonable for some database CLIENT spans to make multiple HTTP calls under the covers.

As we discussed in the SIG meeting yesterday, if we want to capture the lower level telemetry, a nice option would be to use events to represent the lower level HTTP calls.
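A minimal sketch of the events idea, under stated assumptions: `Span`, `enclosing_client`, and `start_http_client_call` are hypothetical stand-ins written for illustration, not OpenTelemetry APIs, and the event name `http.request` is made up. When an HTTP call happens under an existing CLIENT span, the sketch records an event on that span instead of starting a nested CLIENT span.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """Hypothetical stand-in for a span; not the OpenTelemetry Span type."""
    name: str
    kind: str  # e.g. "CLIENT", "INTERNAL"
    parent: Optional["Span"] = None
    events: list = field(default_factory=list)

    def add_event(self, name, attributes=None):
        self.events.append((name, dict(attributes or {})))


def enclosing_client(span):
    """Return the nearest CLIENT span in the parent chain (including span), or None."""
    while span is not None:
        if span.kind == "CLIENT":
            return span
        span = span.parent
    return None


def start_http_client_call(current, method, url):
    """Hypothetical HTTP instrumentation hook.

    Returns a new CLIENT span, or None when the call was recorded as an
    event on an enclosing CLIENT span instead.
    """
    outer = enclosing_client(current)
    if outer is not None:
        # Event name and attribute keys are illustrative assumptions.
        outer.add_event("http.request", {"http.method": method, "http.url": url})
        return None
    return Span(name=f"HTTP {method}", kind="CLIENT", parent=current)


# A database CLIENT span whose client library makes an HTTP call under the covers.
db_span = Span(name="elasticsearch search", kind="CLIENT")
nested = start_http_client_call(db_span, "POST", "http://localhost:9200/_search")

# The same HTTP client used directly, with no enclosing CLIENT span.
top_level = start_http_client_call(None, "GET", "http://example.com/")
```

With this shape, a database span that makes several HTTP calls would simply accumulate several events, so the 1-1 question above does not need to be settled first.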

iNikem · 2020-12-03T09:31:57Z

Should we formalise these events at the spec level?

johnbley · 2020-12-04T16:20:16Z

This "layering" or "nesting" views of protocols occurs quite a bit and will occur more and more as we add more instrumentation. A few thoughts that will probably only serve to confuse matters more:

Depth: It might occur more than two layers deep. Consider what happens if we add Socket instrumentation to time the connect sequence: Database API -> Http library -> Socket connection.
Necessity: Each layer has interesting observability information to add. Consider questions like "did this high-level API call result in a new network connection or did it use one from a pool?" or "did this operation connect to the /v2 endpoint or the /v1?". We need data from all layers to answer these sorts of questions.
Naming: Typically the best (most meaningful / best semantics / appropriate cardinality) names will come from the highest level, though allowing multiple "client" spans (not necessarily kind=client) allows each level to express names as it sees them, and UI/backends can coalesce/combine as they see fit. Flattening to a single span would complicate matters.
Propagation and parentage: These are very tricky. In the database API -> http library -> socket chain, only the http instrumentation might know how to propagate in a way that will work. In a message queue API -> http transport chain, injecting into the message metadata is critical since that is what will be persisted/carried through the queueing system, but an "instantaneous" view of "why did this queue insert take so long?" cares about the specific HTTP interaction for the post, and so we might also want to propagate context on the http headers. What parentSpanId is propagated? A notion of "propagation family/protocol" should be developed so that it is possible to encode rules such as "if another http instrumentation has already injected on this request, do not inject again" while allowing different propagation layers/protocols to coexist.
Data structures: I recommend we pursue some form of combining/allowing all layers of instrumentation to co-exist and report their data. "Flattening" lower spans into the "first"/highest client span seems like a reasonable step, but it might get crowded and difficult - see "naming" above, but also consider how to flatten attributes (colliding keys) and events (ditto). Additionally, it creates a "this instrumentation thinks it is creating a span but is actually writing to a view that materializes in another span" source of complexity in the codebase. If allowing nested kind=client spans is not an option, then perhaps (a) have newer/lower client spans "reach up" and modify the kind of parent spans so only the lowest one is kind=client, or (b) delay marking kind until a span is ending, at which point it can know if another client span was created below it, or (c) allow kind=INTERNAL under kind=CLIENT?
Modeling: Consider again the message queue -> http chain and that semantically these are in fact two distinct operations: the first is PRODUCER and the second is CLIENT. In a previous product we allowed distinct layering and the equivalent of multiple kind=CLIENT stacked on top of each other. This led to some difficulties and weird recursive matching algorithms for linking views/models of data (e.g., client has instrumentation for SOAP, HTTP, and TCP layers, but server receiving the request only has HTTP), but it was manageable and the data was clear and complete on an agent level.
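The "propagation family" rule from the propagation point above could be sketched roughly as follows. Everything here is an assumption for illustration: `OutgoingRequest`, `inject_once`, and the family names are hypothetical, and the sketch simply lets the first injector in each family win, which is only one possible answer to "what parentSpanId is propagated?".

```python
from dataclasses import dataclass, field


@dataclass
class OutgoingRequest:
    """Hypothetical carrier: an HTTP post that transports a queue message."""
    http_headers: dict = field(default_factory=dict)
    message_metadata: dict = field(default_factory=dict)
    injected_families: set = field(default_factory=set)


# Which part of the request each (hypothetical) propagation family writes to.
TARGETS = {"http": "http_headers", "messaging": "message_metadata"}


def inject_once(request, family, trace_id, span_id):
    """Inject context for a family unless that family already injected.

    Returns True if this call injected, False if it was skipped.
    """
    if family in request.injected_families:
        return False
    target = getattr(request, TARGETS[family])
    # W3C traceparent layout: version-traceid-parentid-flags.
    target["traceparent"] = f"00-{trace_id}-{span_id}-01"
    request.injected_families.add(family)
    return True


TRACE_ID = "4bf92f3577b34da6a3ce929d0e0e4736"
request = OutgoingRequest()

# The message queue API injects into the message metadata.
first_messaging = inject_once(request, "messaging", TRACE_ID, "00f067aa0ba902b7")
# The HTTP transport injects into the HTTP headers; a different family, so it coexists.
first_http = inject_once(request, "http", TRACE_ID, "b7ad6b7169203331")
# A second HTTP-level instrumentation on the same request is skipped.
second_http = inject_once(request, "http", TRACE_ID, "53995c3f42cd8ad8")
```

Here the message metadata and the HTTP headers carry different parent span ids, so both the persisted view and the "instantaneous" view keep their own parentage.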

Finally, I'll say that any form of modeling/representation that isn't "let's pretend this lower level of visibility never happened" would be an improvement, and so I'll support any movement in that direction.

jkwatson · 2021-01-21T16:09:35Z

I don't think that events are going to solve this problem completely, as I think @johnbley is also suggesting. Events will absolutely be the appropriate way to tackle some issues (like looking up connections from a connection pool), but not everything.

I think we'll need to tackle this at the spec level to truly solve the issue.

Options I see:

Allow INTERNAL spans below CLIENT
Allow sub-spans to update the kind of their parent (kind is not currently writable after a span has been started)
Introduce a new span kind that can live below CLIENT
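To make the second option concrete, here is a rough sketch that assumes kind were writable after start (which, as noted, it currently is not). `Span` and `start_client_span` are hypothetical stand-ins, not OpenTelemetry APIs. Each new CLIENT span demotes any CLIENT ancestors to INTERNAL, so only the lowest span in the chain stays CLIENT.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """Hypothetical span with a mutable kind; real spans fix kind at start."""
    name: str
    kind: str
    parent: Optional["Span"] = None


def start_client_span(name, parent=None):
    """Start a CLIENT span and demote every CLIENT ancestor to INTERNAL."""
    ancestor = parent
    while ancestor is not None:
        if ancestor.kind == "CLIENT":
            ancestor.kind = "INTERNAL"
        ancestor = ancestor.parent
    return Span(name, "CLIENT", parent)


# Database API -> HTTP library -> socket connection.
db = start_client_span("elasticsearch search")
http = start_client_span("HTTP POST", parent=db)
sock = start_client_span("socket connect", parent=http)

kinds = [span.kind for span in (db, http, sock)]
```

One consequence visible even in this toy version: a span's kind can change while it is in flight, so anything that reads the kind before the span ends may see a stale value.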

trask · 2021-05-15T23:50:31Z

Linking in a good and relevant comment from @agoallikmaa #2923 (comment)

jkwatson · 2021-06-16T18:11:24Z

Note, the spec has now been updated to allow nested CLIENT instrumentation, so this issue probably needs some re-thought.

iNikem · 2021-09-20T15:58:54Z

@open-telemetry/java-approvers I propose to close this issue as solved by https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/docs/suppressing-instrumentation.md#enable-instrumentation-suppression-by-type
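As a rough illustration of what suppression by type means, assuming it suppresses a nested CLIENT span only when an enclosing CLIENT span has the same type (for example HTTP under HTTP) and lets an HTTP span through under a database span: the function and type names below are hypothetical and do not reflect the agent's actual implementation or configuration keys.

```python
def should_suppress(active_types, new_type, by_type):
    """Decide whether a new CLIENT span should be suppressed.

    active_types: types of the CLIENT spans already active in the context,
                  e.g. ["db"] while inside a database client call.
    new_type:     type of the span about to be started, e.g. "http".
    by_type:      True  -> suppress only when the same type is already active;
                  False -> suppress under any active CLIENT span.
    """
    if not active_types:
        return False
    if by_type:
        return new_type in active_types
    return True


# HTTP call made by a database client library.
http_under_db_by_type = should_suppress(["db"], "http", by_type=True)
http_under_db_any = should_suppress(["db"], "http", by_type=False)
# HTTP call made by a higher-level HTTP client wrapper.
http_under_http_by_type = should_suppress(["http"], "http", by_type=True)
```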

trask · 2021-09-21T00:16:20Z

👍 additional proposals/discussions can be opened as new issues

trask added the enhancement (New feature or request) label Dec 3, 2020

trask mentioned this issue Dec 3, 2020: Use Context more in HttpClientTracer #1811 (Merged)

jkwatson mentioned this issue Jan 21, 2021: The need for more granularity/clarity in CLIENT span conventions open-telemetry/opentelemetry-specification#1360 (Closed)

mateuszrzeszutek added the instrumentation api label Mar 5, 2021

trask mentioned this issue Apr 3, 2021: Instrumentation shouldn't force suppress new traces #1155 (Closed)

This was referenced Jun 10, 2021:

Define span modelling conventions for HTTP client spans open-telemetry/opentelemetry-specification#1747 (Closed)

Prevent duplicate telemetry when using both library and auto instrumentation #903 (Open)

trask closed this as completed Sep 21, 2021
