-
Notifications
You must be signed in to change notification settings - Fork 196
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FaaS: Change requirements regarding handling of AWS Lambda-provided SpanContext
(take 2)
#277
Changes from 4 commits
67fdf2a
23ed268
90a5141
828526f
327e55f
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||||||
---|---|---|---|---|---|---|---|---|---|---|
|
@@ -14,7 +14,7 @@ use cases. | |||||||||
<!-- toc --> | ||||||||||
|
||||||||||
- [All triggers](#all-triggers) | ||||||||||
* [AWS X-Ray Environment Span Link](#aws-x-ray-environment-span-link) | ||||||||||
* [AWS X-Ray Environment `SpanContext`](#aws-x-ray-environment-spancontext) | ||||||||||
- [API Gateway](#api-gateway) | ||||||||||
- [SQS](#sqs) | ||||||||||
* [SQS Event](#sqs-event) | ||||||||||
|
@@ -54,21 +54,21 @@ and the [cloud resource conventions][cloud]. The following AWS Lambda-specific a | |||||||||
[faasres]: /docs/resource/faas.md (FaaS resource conventions) | ||||||||||
[cloud]: /docs/resource/cloud.md (Cloud resource conventions) | ||||||||||
|
||||||||||
### AWS X-Ray Environment Span Link | ||||||||||
### AWS X-Ray Environment `SpanContext` | ||||||||||
|
||||||||||
If the `_X_AMZN_TRACE_ID` environment variable is set, instrumentation SHOULD try to parse an | ||||||||||
OpenTelemetry `Context` out of it using the [AWS X-Ray Propagator](https://github.com/open-telemetry/opentelemetry-specification/tree/v1.22.0/specification/context/api-propagators.md). If the | ||||||||||
resulting `Context` is [valid](https://github.com/open-telemetry/opentelemetry-specification/tree/v1.22.0/specification/trace/api.md#isvalid) then a [Span Link][] SHOULD be added to the new Span's | ||||||||||
[start options](https://github.com/open-telemetry/opentelemetry-specification/tree/v1.22.0/specification/trace/api.md#specifying-links) with an associated attribute of `source=x-ray-env` to | ||||||||||
indicate the source of the linked span. | ||||||||||
Instrumentation MUST check if the context is valid before using it because the `_X_AMZN_TRACE_ID` environment variable can | ||||||||||
contain an incomplete trace context which indicates X-Ray isn’t enabled. The environment variable will be set and the | ||||||||||
`Context` will be valid and sampled only if AWS X-Ray has been enabled for the Lambda function. A user can | ||||||||||
disable AWS X-Ray for the function if the X-Ray Span Link is not desired. | ||||||||||
If the `_X_AMZN_TRACE_ID` environment variable is set, instrumentation MUST ensure that the value from that | ||||||||||
environment variable is included in the `Carrier` used to extract a parent `Context` with the `Field` name | ||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I want to point out that as currently written, this document doesn't specify what should happen to There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It does define the behavior. It says that the instrumentation There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. And I'm saying that should be more explicit. It also doesn't give any option to users who wish to disable this behavior. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
what would be the downside for this to be added as a separate key in the carrier, e.g. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I believe that I explained why that is not appropriate here. I have repeatedly said that I believe that there needs to be more user agency afforded with regard to the construction of the carrier used for context extraction. Doing so would make it irrelevant what any of us think about what users want because they would be able to do what they want without configuration options or overrides. I have avoided going that far in this PR as here I am trying to fix a broken spec in the smallest possible way so that instrumentation can start offering more flexibility to users in a way that leaves the door open for further changes in that direction. Can we all agree that this change does that? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. W3C can only propagate over http headers or sqs/sns message attributes. That's fine in simple architectures as long as users haven't added many of their own attributes. AWS imposes a limit of 10 message attributes which means w3c propagation can fail if the limit is already reached. X-Ray is not subject to this limit as it propagates over message system attributes. Attributes also don't allow propagation when using S3 events or SNS->SQS type architectures (among other scenarios involving aws managed services). Tracing in these scenarios require x-ray. This was also the main motivation for open-telemetry/opentelemetry-specification#3212. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
is there any AWS (or other) doc about this, I feel like this is an important point that I'd like to understand better There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Here are a few links I found: The key point being this all applies only to x-ray. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. At a high level - The answers to these questions should be sufficient for acceptance of this PR -
IIUC the details correctly, the answers are:
(2) Is less important given we still need to prototype and define changes to Propagation, and we could find workaround, but (1) is concerning to me. I like @trask's proposals to be more explicit here, possibly having two x-ray propagators one which uses ENV and the other which does not so users can prioritize between them if necessary, and then ideally this would work with an improved propagation framework. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hi @jsuereth, Now that you have context with this proposal, there is also this other proposal that tries to solve the problem with a different direction and that I believe has "yes" as answer for questions 1 and 2: #164 While I agree that @trask 's proposal would work, I would prefer to avoid it given that this propagator would be specific to lambda (i.e.: users would not be able to use this propagator elsewhere) and it would not fully comply with the propagator API, specifically:
Moreover, what would be the behaviour of the I would prefer that the lambda instrumentation prepare a valid carrier to be consumed by subsequent propagators. |
||||||||||
"X-Amzn-Trace-Id". | ||||||||||
Comment on lines
+60
to
+61
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. More specifically in the context of this PR, what I’m asking about is whether it would make sense to populate a different field name here (instead of overwriting the non-vended parent), so that the choice of which one to parent can be made by the configured propagator(s).
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't believe that is appropriate as it would require different propagation formats to be used for extraction and injection and would involve OTel creating a new propagation format and attributing it to AWS. I believe the appropriate thing to do is to further extend the user agency offered by this specification to require that instrumentation give users control over how the carrier is constructed so that they can choose to either overlay or underlay the value from the Lambda execution environment as is appropriate for their use case. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
this makes sense to me I think it brings us back to the question of default behavior, which my thought is that the default behavior should be the one which doesn't parent the vended span, since the vended span is specific to the X-Ray backend There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not sure it's a question of default behavior at that point. I'm not aware of any SDK that configures the default global propagator to be or include the AWS X-Ray propagator outside of the ADOT Java agent. To the extent that most SDK implementations that set a default global propagator use a composite including There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I agree with extending user agency to pick which parent (or link) they want, but my understanding of the TC's "lean into propagators" meant that this would be better to do via propagators (and maybe propagator configuration) as opposed to via instrumentation configuration There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think we're in agreement here. I might add only that instrumentation configuration should include propagator selection as there may be cases where specific instrumentation should be configured to use a different propagator, but that is orthogonal to how the instrumentation constructs the carrier to pass to whatever propagator it does use, which is the focus of this PR. |
||||||||||
|
||||||||||
**Note**: When instrumenting a Java AWS Lambda, instrumentation SHOULD first try to parse an OpenTelemetry `Context` out of the system property `com.amazonaws.xray.traceHeader` using the [AWS X-Ray Propagator](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/context/api-propagators.md) before checking and attempting to parse the environment variable above. | ||||||||||
Instrumentation MUST provide users with the option to create a `Link` to the AWS Lambda vended spans. If this | ||||||||||
Aneurysm9 marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||
option is enabled then the instrumentation MUST attempt to extract a `SpanContext` from a `Carrier` including | ||||||||||
only the value of the `_X_AMZN_TRACE_ID` environment variable in a `Field` with the name "X-Amzn-Trace-Id". If | ||||||||||
a valid `SpanContext` is extracted then it MUST be included as a `Link` in the initial span created by the | ||||||||||
instrumentation. The `Link` MUST have an associated `String`-typed | ||||||||||
attribute with a key of `source` and a value of `x-ray-env`, in order to indicate the source of the linked span. | ||||||||||
|
||||||||||
[Span Link]: https://opentelemetry.io/docs/concepts/signals/traces/#span-links | ||||||||||
**Note**: When instrumenting a Java AWS Lambda function, instrumentation SHOULD first try to obtain a value | ||||||||||
from the system property `com.amazonaws.xray.traceHeader` before checking the `_X_AMZN_TRACE_ID` environment variable. | ||||||||||
|
||||||||||
## API Gateway | ||||||||||
|
||||||||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this actually needs prototyping in languages-not-Go. I like the direction of this (pushing precedence configuration to propagation config), but I'm not sure our current implementations will allow this.
WDYT @trask @dyladan for Java / JavaScript?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should work. The instrumentation has control over both the carrier and the function that reads fields from that carrier (so it could always create a synthetic carrier / accessor function). @rapphil would you or someone at AWS be able to send a Java draft PR to prove this out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HI @trask I had previously created this POC rapphil/opentelemetry-java-instrumentation#1 for this proposal.
I will update it and share the details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a prototype but can't imagine that this would be difficult in JS. We have full control of the carrier and it is just a plain javascript object. There might be an additional allocation to ensure the original header object is not mutated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when you're ready, go ahead and send it as a draft PR so we can review / add comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@trask Here is the POC: open-telemetry/opentelemetry-java-instrumentation#9300