Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FaaS: Change requirements regarding handling of AWS Lambda-provided SpanContext (take 2) #277

Closed
Closed
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -57,6 +57,8 @@ release.
([#60](https://github.com/open-telemetry/semantic-conventions/pull/60))
- BREAKING: Remove pluralization from JVM metric namespaces.
([#252](https://github.com/open-telemetry/semantic-conventions/pull/252))
- FaaS: Change requirements regarding handling of AWS Lambda-provided `SpanContext`
([#277](https://github.com/open-telemetry/semantic-conventions/pull/277))
- Simplify HTTP metric briefs.
([#276](https://github.com/open-telemetry/semantic-conventions/pull/276))

Expand Down
26 changes: 13 additions & 13 deletions docs/faas/aws-lambda.md
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,7 @@ use cases.
<!-- toc -->

- [All triggers](#all-triggers)
* [AWS X-Ray Environment Span Link](#aws-x-ray-environment-span-link)
* [AWS X-Ray Environment `SpanContext`](#aws-x-ray-environment-spancontext)
- [API Gateway](#api-gateway)
- [SQS](#sqs)
* [SQS Event](#sqs-event)
Expand Down Expand Up @@ -54,21 +54,21 @@ and the [cloud resource conventions][cloud]. The following AWS Lambda-specific a
[faasres]: /docs/resource/faas.md (FaaS resource conventions)
[cloud]: /docs/resource/cloud.md (Cloud resource conventions)

### AWS X-Ray Environment Span Link
### AWS X-Ray Environment `SpanContext`

If the `_X_AMZN_TRACE_ID` environment variable is set, instrumentation SHOULD try to parse an
OpenTelemetry `Context` out of it using the [AWS X-Ray Propagator](https://github.com/open-telemetry/opentelemetry-specification/tree/v1.22.0/specification/context/api-propagators.md). If the
resulting `Context` is [valid](https://github.com/open-telemetry/opentelemetry-specification/tree/v1.22.0/specification/trace/api.md#isvalid) then a [Span Link][] SHOULD be added to the new Span's
[start options](https://github.com/open-telemetry/opentelemetry-specification/tree/v1.22.0/specification/trace/api.md#specifying-links) with an associated attribute of `source=x-ray-env` to
indicate the source of the linked span.
Instrumentation MUST check if the context is valid before using it because the `_X_AMZN_TRACE_ID` environment variable can
contain an incomplete trace context which indicates X-Ray isn’t enabled. The environment variable will be set and the
`Context` will be valid and sampled only if AWS X-Ray has been enabled for the Lambda function. A user can
disable AWS X-Ray for the function if the X-Ray Span Link is not desired.
If the `_X_AMZN_TRACE_ID` environment variable is set, instrumentation MUST ensure that the value from that
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this actually needs prototyping in languages-not-Go. I like the direction of this (pushing precedence configuration to propagation config), but I'm not sure our current implementations will allow this.

WDYT @trask @dyladan for Java / JavaScript?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should work. The instrumentation has control over both the carrier and the function that reads fields from that carrier (so it could always create a synthetic carrier / accessor function). @rapphil would you or someone at AWS be able to send a Java draft PR to prove this out?

Copy link
Contributor

@rapphil rapphil Aug 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HI @trask I had previously created this POC rapphil/opentelemetry-java-instrumentation#1 for this proposal.

I will update it and share the details.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a prototype but can't imagine that this would be difficult in JS. We have full control of the carrier and it is just a plain javascript object. There might be an additional allocation to ensure the original header object is not mutated.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HI @trask I had previously created this POC rapphil/opentelemetry-java-instrumentation#1 for this proposal.

I will update it and share the details.

when you're ready, go ahead and send it as a draft PR so we can review / add comments

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

environment variable is included in the `Carrier` used to extract a parent `Context` with the `Field` name
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to point out that as currently written, this document doesn't specify what should happen to X-Amzn-Trace-Id if it already exists in the carrier before the environment variable value is applied to it. I think that's an important behavior to define.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does define the behavior. It says that the instrumentation MUST ensure that value is in the carrier at that field name. The only way to do that it to set that value. Setting that value will override any value that may have existed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I'm saying that should be more explicit. It also doesn't give any option to users who wish to disable this behavior.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting that value will override any value that may have existed.

what would be the downside for this to be added as a separate key in the carrier, e.g. x-amzn-env-trace-id or x-amzn-vended-trace-id (instead of overriding the existing carrier value), this way the propagator chain can have access to all of the different propagated trace contexts?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting that value will override any value that may have existed.

what would be the downside for this to be added as a separate key in the carrier, e.g. x-amzn-env-trace-id or x-amzn-vended-trace-id (instead of overriding the existing carrier value), this way the propagator chain can have access to all of the different propagated trace contexts?

I believe that I explained why that is not appropriate here.

I have repeatedly said that I believe that there needs to be more user agency afforded with regard to the construction of the carrier used for context extraction. Doing so would make it irrelevant what any of us think about what users want because they would be able to do what they want without configuration options or overrides. I have avoided going that far in this PR as here I am trying to fix a broken spec in the smallest possible way so that instrumentation can start offering more flexibility to users in a way that leaves the door open for further changes in that direction. Can we all agree that this change does that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

W3C can only propagate over http headers or sqs/sns message attributes. That's fine in simple architectures as long as users haven't added many of their own attributes. AWS imposes a limit of 10 message attributes which means w3c propagation can fail if the limit is already reached. X-Ray is not subject to this limit as it propagates over message system attributes.
https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-message-metadata.html#sqs-message-system-attributes

Attributes also don't allow propagation when using S3 events or SNS->SQS type architectures (among other scenarios involving aws managed services). Tracing in these scenarios require x-ray. This was also the main motivation for open-telemetry/opentelemetry-specification#3212.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Attributes also don't allow propagation when using S3 events or SNS->SQS type architectures (among other scenarios involving aws managed services). Tracing in these scenarios require x-ray.

is there any AWS (or other) doc about this, I feel like this is an important point that I'd like to understand better

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At a high level - The answers to these questions should be sufficient for acceptance of this PR -

  1. Does this give users the ability to control parenting choice via Propagator config?
  2. Does any of the normative language prevent moving to linking as the default behavior for Producer/Consumer message passing? Does it limit updating Propagator specification to allow this?

IIUC the details correctly, the answers are:

  1. No. Users will not have the ability to configure whether ENV variable or attribute is parented in this solution. By fiat, the ENV will always win.
  2. Yes. The x-ray header for SQS is passed as an attribute. Because the ENV variable would override this, we'd no longer see the alternative parent in the carrier sent to Propgators.

(2) Is less important given we still need to prototype and define changes to Propagation, and we could find workaround, but (1) is concerning to me.

I like @trask's proposals to be more explicit here, possibly having two x-ray propagators one which uses ENV and the other which does not so users can prioritize between them if necessary, and then ideally this would work with an improved propagation framework.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @jsuereth,

Now that you have context with this proposal, there is also this other proposal that tries to solve the problem with a different direction and that I believe has "yes" as answer for questions 1 and 2: #164

While I agree that @trask 's proposal would work, I would prefer to avoid it given that this propagator would be specific to lambda (i.e.: users would not be able to use this propagator elsewhere) and it would not fully comply with the propagator API, specifically:

If a value can not be parsed from the carrier, for a cross-cutting concern, the implementation MUST NOT throw an exception and MUST NOT store a new value in the Context, in order to preserve any previously existing valid value.

Moreover, what would be the behaviour of the inject operation of this propagator?

I would prefer that the lambda instrumentation prepare a valid carrier to be consumed by subsequent propagators.

"X-Amzn-Trace-Id".
Comment on lines +60 to +61
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More specifically in the context of this PR, what I’m asking about is whether it would make sense to populate a different field name here (instead of overwriting the non-vended parent), so that the choice of which one to parent can be made by the configured propagator(s).

Suggested change
environment variable is included in the `Carrier` used to extract a parent `Context` with the `Field` name
"X-Amzn-Trace-Id".
environment variable is included in the `Carrier` used to extract a parent `Context` with the `Field` name
"X-Amzn-Vended-Trace-Id".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe that is appropriate as it would require different propagation formats to be used for extraction and injection and would involve OTel creating a new propagation format and attributing it to AWS. I believe the appropriate thing to do is to further extend the user agency offered by this specification to require that instrumentation give users control over how the carrier is constructed so that they can choose to either overlay or underlay the value from the Lambda execution environment as is appropriate for their use case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

require that instrumentation give users control over how the carrier is constructed so that they can choose to either overlay or underlay the value from the Lambda execution environment as is appropriate for their use case

this makes sense to me

I think it brings us back to the question of default behavior, which my thought is that the default behavior should be the one which doesn't parent the vended span, since the vended span is specific to the X-Ray backend

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure it's a question of default behavior at that point. I'm not aware of any SDK that configures the default global propagator to be or include the AWS X-Ray propagator outside of the ADOT Java agent. To the extent that most SDK implementations that set a default global propagator use a composite including tracecontext, baggage then the vended span would never be a parent. To the extent that a user has configured a different global propagator then we're out of the default behavior and into user-determined behavior and the user has the flexibility to choose the order they configure propagators in a composite propagator.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the appropriate thing to do is to further extend the user agency offered by this specification to require that instrumentation give users control over how the carrier is constructed so that they can choose to either overlay or underlay the value from the Lambda execution environment as is appropriate for their use case.

I agree with extending user agency to pick which parent (or link) they want, but my understanding of the TC's "lean into propagators" meant that this would be better to do via propagators (and maybe propagator configuration) as opposed to via instrumentation configuration

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're in agreement here. I might add only that instrumentation configuration should include propagator selection as there may be cases where specific instrumentation should be configured to use a different propagator, but that is orthogonal to how the instrumentation constructs the carrier to pass to whatever propagator it does use, which is the focus of this PR.


**Note**: When instrumenting a Java AWS Lambda, instrumentation SHOULD first try to parse an OpenTelemetry `Context` out of the system property `com.amazonaws.xray.traceHeader` using the [AWS X-Ray Propagator](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/context/api-propagators.md) before checking and attempting to parse the environment variable above.
Instrumentation MUST provide users with the option to create a `Link` to the AWS Lambda vended spans. If this
Aneurysm9 marked this conversation as resolved.
Show resolved Hide resolved
option is enabled then the instrumentation MUST attempt to extract a `SpanContext` from a `Carrier` including
only the value of the `_X_AMZN_TRACE_ID` environment variable in a `Field` with the name "X-Amzn-Trace-Id". If
a valid `SpanContext` is extracted then it MUST be included as a `Link` in the initial span created by the
instrumentation. The `Link` MUST have an associated `String`-typed
attribute with a key of `source` and a value of `x-ray-env`, in order to indicate the source of the linked span.

[Span Link]: https://opentelemetry.io/docs/concepts/signals/traces/#span-links
**Note**: When instrumenting a Java AWS Lambda function, instrumentation SHOULD first try to obtain a value
from the system property `com.amazonaws.xray.traceHeader` before checking the `_X_AMZN_TRACE_ID` environment variable.

## API Gateway

Expand Down