-
Notifications
You must be signed in to change notification settings - Fork 895
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Context propagation to client instrumentation #3825
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing that's missing for me is the discussion of the mechanics of backwards propagation. In the current wording the ClientPropagator sounds nearly symmetrical to classic forward propagators, but it is very different. It is effectively a one-hop mechanism, since it is not possible to implement a multi-hop "backtrace" solution (e.g. sampling on errors) given the existing Context APIs. I don't believe the intention of this PR is to solve the latter, so I would make the one-hop nature explicit.
specification/overview.md
Outdated
OpenTelemetry uses `Propagators` to serialize and deserialize cross-cutting | ||
concern values such as `Span`s (usually only the `SpanContext` portion) and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OpenTelemetry uses `Propagators` to serialize and deserialize cross-cutting | |
concern values such as `Span`s (usually only the `SpanContext` portion) and | |
OpenTelemetry uses `Propagators` to serialize and deserialize payloads for | |
cross-cutting applications, such as `Span`s (usually only the `SpanContext` portion) and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For completeness sake this PR should include any changes to the Trace spec as well. The Trace API uses the spanid from the propagated context as the parent of the span created further or create a root span when there is no context propagated. When a Client propagator is configured, what is the expected behavior for what used to be a root span previously?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point. I had only the "client instrumentation" case in mind when writing this. For the classic microservices case, I'm not sure what use the client propagation would be used for: if A makes a client to B and includes a trace context, does it need to know anything at all about B's context?
Or perhaps B is creating a new trace ID and you'd want it as a span link on A's call? I'm open to ideas here.
cc @yurishkuro
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a valid question how a response received from the server is meant to be handled. For example, it cannot be stored in the context as that new context will immediately go out of scope. So while what's described in the PR now is reasonably abstract and stand-alone, it is not actually useful for the stated purposes without additional clarification.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yurishkuro, I can clarify that from a client perspective, but I'm interested in seeing if you have suggestions for the server-to-server scenario. Or perhaps this first version should exclusively be about client<-server, and not server<-server?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you be open to having this as an enhancement to this on a follow-up proposal? I feel like this needs more discussion, and that it's not entirely related to the goal of this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally would like to see a prototype. I do not see a working solution in this proposal, only a piece of it, the physical propagation of context upwards, but not how it can be used in the client, in which layers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jpkrohling I'm okay with keeping the usage of the response propagator in a separate PR but it's up to the approvers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yurishkuro, can you confirm you've seen this example? Or did you mean you expect a PoC with a browser-based example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yurishkuro, I can clarify that from a client perspective
I don't see see any indication or what an SDK or instrumentation should do, or can be configured to do when it encounters receives context encountered backwards. So this seems like something that could be theoretically useful without a practical way to be useful. Without some corresponding spec on what should happen when these are received, its not clear what we gain by standardizing this.
I know this is experimental and would evolve, but would be good to include some sort of hint on how this is expected to evolve. Do we imagine this SDKs or instrumentation or something else to handle these? Presumably the behavior would vary by configured ResponsePropagator on the client side. What might the receiver of W3C tracesponse
be expected to do?
I personally would find more intuitive something on the lines of |
Good call, I think it was implied in one of the paragraphs, but explicit is better than implicit. I'm careful here to not prescribe too much, leaving room for the concrete implementations. I could add one example using Server-Timing, making sure to note that it's not an example that might end up in an implementation (given how things can still move on the W3C TraceContext spec). |
Of the suggested names, I think I prefer |
@yurishkuro Would backpropagation im combination with deferred sampling qualify as a multi-hop solution, or do you have something else in mind? With deferred sampling in place and the client changing its sampling decision based on back-propagated context, sampling decisions can eventually be back-propagated through multiple hops. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This specification change needs stability markers (and clearly outline new sections from existing stable sections).
Additionally, I'd like to see a prototype implementation somewhere in any SDK language (branch/fork is fine) before approving.
specification/overview.md
Outdated
OpenTelemetry uses `Propagators` to serialize and deserialize cross-cutting | ||
concern values such as `Span`s (usually only the `SpanContext` portion) and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For completeness sake this PR should include any changes to the Trace spec as well. The Trace API uses the spanid from the propagated context as the parent of the span created further or create a root span when there is no context propagated. When a Client propagator is configured, what is the expected behavior for what used to be a root span previously?
I would rather go for ResponsePropagator, to emphasize one-hop, not a transitive backwards prop. |
@pyohannes yes, in theory, but there is no specification how that would work. |
7a3e4e9
to
d104bb2
Compare
Can you please take another look? I added stability markers to the new sections, but I think it's important to mention the new response propagator elsewhere. I marked the propagator as experimental there in one place but not in the overview. If you think it's not enough, let me know: I think the current state should be clear enough to determine what's experimental and what's stable, while still maintaining the readability. |
I added a paragraph about this, let me know if you think it's sufficient:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
e631ca7
to
4329fe3
Compare
I think this is ready for a final review. I added a small PoC using the Go SDK as base, showing how this could work: There's one open question, which I think we can handle as part of a follow-up discussion: what should a client do when there's already a trace context (like in a server<-server response). IMO, the incoming traceresponse should only be used to add a span link, but I'm not sure where this should be specified (if at all). |
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
Status update: I'm totally swamped right now, and asked for the Faro Web SDK folks here at Grafana Labs to help me with the PoC (cc @codecapitano). @johnbley , I shared your offer with them as well, I'd love to see some collaboration there :-) Since I don't have any new events on my schedule until OTel Community Day, I should be able to clear my queue by the time the PoC gets done. |
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
@jpkrohling Client Instrumentation SIG would be a good place to take this forward. We discussed this topic in the Client SIG with @johnbley a few weeks ago and we realized a couple things -
While both have their pros and cons, it would be good to discuss how we want to take this forward. |
Based on this latest information, I'm marking this as draft. |
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
Small update: |
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
Closed as inactive. Feel free to reopen if this PR is still being worked on. |
Changes
This spec change adds the notion of Client Propagators, needed for the use-cases described at #3811. It's not clear to me whether this change warrants an OTEP, and I'll gladly create one if needed.
Edit: here's a prototype showing how this can be used in the Go SDK: jpkrohling/opentelemetry-go@c60efdf#diff-b10dee5922fceb25e03ff269b2665356209304f11e98d507c6bfdd2314e4b693
Related issues:
Checklist
CHANGELOG.md
file updated for non-trivial changesspec-compliance-matrix.md
updated if necessarySigned-off-by: Juraci Paixão Kröhling juraci@kroehling.de