
Trace ID is not propagated from SQS message with AWSXRayPropagator #1149

Closed
sssmolarkiewicz opened this issue Sep 2, 2022 · 13 comments

Labels: bug (Something isn't working), pkg:id-generator-aws-xray, priority:p2 (Bugs and spec inconsistencies which cause telemetry to be incomplete or incorrect), stale

@sssmolarkiewicz

What version of OpenTelemetry are you using?

    "@opentelemetry/api": "^1.1.0",
    "@opentelemetry/auto-instrumentations-node": "^0.31.2",
    "@opentelemetry/exporter-trace-otlp-grpc": "^0.31.0",
    "@opentelemetry/id-generator-aws-xray": "^1.1.0",
    "@opentelemetry/propagator-aws-xray": "^1.1.0",
    "@opentelemetry/resource-detector-aws": "^1.1.1",
    "@opentelemetry/sdk-trace-node": "^1.6.0",
    "@opentelemetry/sdk-trace-base": "^1.6.0",

What version of Node are you using?

16.17.0

What did you do?

I've configured the tracer with AWSXRayIdGenerator and AWSXRayPropagator in my two applications, AppA and AppB. The applications communicate over SQS. I triggered producing a message in AppA through an HTTP request and received the message in AppB. In AppB I can see the x-amzn-trace-id message attribute set on the received message.
The collector runs as a sidecar container in the ECS task of each service. Here is my tracer configuration (it's the same in both apps):

        const tracerConfig: NodeTracerConfig = {
          idGenerator: new AWSXRayIdGenerator(),
          resource: Resource.default().merge(
            new Resource({
              [SemanticResourceAttributes.SERVICE_NAME]: SERVICE_NAME,
            }).merge(
              await detectResources({
                detectors: [awsEc2Detector, awsEcsDetector],
              }),
            ),
          ),
        };
        const tracerProvider = new NodeTracerProvider(tracerConfig);

        const otlpExporter = new OTLPTraceExporter({
          url: 'localhost:4317',
          credentials: grpc.ChannelCredentials.createInsecure(),
        });
        tracerProvider.addSpanProcessor(new BatchSpanProcessor(otlpExporter));

        const propagator = new AWSXRayPropagator();
        propagation.setGlobalPropagator(propagator);
        tracerProvider.register({
          propagator,
        });
        registerInstrumentations({
          tracerProvider,
          instrumentations: [
            new PrismaInstrumentation({ middleware: true }),
            getNodeAutoInstrumentations({
              '@opentelemetry/instrumentation-express': {
                //ignore traces for root handlers
                //by default the following spans are created for GET /foo request:
                //  request handler - /
                //  request handler - /
                //  request handler - /foo
                //we only want the last one to be traced
                ignoreLayers: [(name): boolean => name.includes('*')],
              },
              '@opentelemetry/instrumentation-aws-sdk': {
                // hide HTTP requests done within SDK to AWS services
                suppressInternalInstrumentation: true,
                preRequestHook: (span, info) => {
                  if (info.request.commandName === 'ReceiveMessage' && info.request.serviceName === 'SQS') {
                    //set name of ECS Container if the operation is receiving message from SQS
                    //container shows up as `SQS` by default
                    span.setAttributes({
                      'peer.service': SERVICE_NAME,
                    });
                  }
                },
              },
            }),
          ],
        });
        trace.setGlobalTracerProvider(tracerProvider);
        return tracerProvider.getTracer(SERVICE_NAME);
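
For reference, the x-amzn-trace-id message attribute injected on the producer side carries a standard X-Ray trace header. A minimal sketch of what extracting it should yield on the consumer side (the attribute value below is illustrative):

    import { trace, ROOT_CONTEXT, defaultTextMapGetter } from '@opentelemetry/api';
    import { AWSXRayPropagator } from '@opentelemetry/propagator-aws-xray';

    // Illustrative X-Ray header value; real values are written by the producer-side instrumentation.
    const carrier = {
      'x-amzn-trace-id': 'Root=1-63120a6e-0123456789abcdef01234567;Parent=53995c3f42cd8ad8;Sampled=1',
    };

    const ctx = new AWSXRayPropagator().extract(ROOT_CONTEXT, carrier, defaultTextMapGetter);
    // The extracted span context carries the remote trace ID, which is what the
    // consumer-side spans would need to use to join the producer's trace.
    console.log(trace.getSpanContext(ctx)?.traceId); // 63120a6e0123456789abcdef01234567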

What did you expect to see?

I expect to see a single trace consisting of the spans for the HTTP request and the message production in AppA, and the message consumption in AppB.

What did you see instead?

I have two traces: one with the HTTP request and message production in AppA, and a second one with the message consumption in AppB.

Additional context

I am able to extract the trace ID from the message attribute and create a span with that trace ID with the code below. However, I have the following problems with this approach:

  1. Service AppB does not appear in the service map - all I see is service AppA making a request to SQS.
  2. As I use the AWS SDK auto-instrumentation, a separate trace for receiving the message in AppB is created anyway. It consists of the spans created automatically by the instrumentation (ECS container, queue process span, SQS); the span created by me is missing there, as it is created in the propagated context.

If it matters, I use NestJS and @nestjs-packages/sqs.

  @SqsMessageHandler(false)
  public async handle(message: SQS.Message): Promise<void> {
    console.log(message);
    await this.tracer.startActiveSpan(
      'InternalTransportConsumerService.handle',
      { kind: SpanKind.CONSUMER },
      // the context extracted from the message attributes becomes the parent of the new span
      propagation.extract(context.active(), message.MessageAttributes ?? {}, sqsContextGetter),
      async (span) => {
        //...
      },
    );
  }
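
Here sqsContextGetter is a custom TextMapGetter over the SQS MessageAttributes map. A minimal sketch of what such a getter could look like (only the name comes from the handler above; the implementation is an assumption):

    import { TextMapGetter } from '@opentelemetry/api';

    // Assumed shape of the SQS message attributes as delivered to the handler.
    type SqsMessageAttributes = Record<string, { StringValue?: string }>;

    // Lets the propagator read headers such as `x-amzn-trace-id` out of the
    // message attributes. Lookup is case-insensitive because attribute-name
    // casing can vary between producers.
    const sqsContextGetter: TextMapGetter<SqsMessageAttributes> = {
      keys: (carrier) => Object.keys(carrier ?? {}),
      get: (carrier, key) => {
        const match = Object.keys(carrier ?? {}).find(
          (k) => k.toLowerCase() === key.toLowerCase(),
        );
        return match ? carrier[match].StringValue : undefined;
      },
    };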

Here is the trace generated with this approach: no AppB in the service map, and a separate trace for AppB is also created (AppB produces another message to AppA, hence the second handle span in the trace):
[Screenshot from 2022-09-02 13-09-31]
[Screenshot from 2022-09-02 13-14-28]

@sssmolarkiewicz sssmolarkiewicz added the bug Something isn't working label Sep 2, 2022
@dyladan dyladan added the priority:p2 Bugs and spec inconsistencies which cause telemetry to be incomplete or incorrect label Sep 14, 2022
@willarmiros
Contributor

Hi @dyladan - is there any chance we could add an AWS or X-Ray label to the repo for issues impacting AWS-related items? I'm not in a role where I can dive too deep into these issues anymore, but if we had a label system I could raise these to others.

@rauno56
Member

rauno56 commented Sep 22, 2022

I added labels for each of the components and attached them to this issue as well - perhaps that will help you filter for what you are looking for.

@willarmiros
Contributor

Thanks! This is perfect, we will have someone follow up here.

@sssmolarkiewicz
Author

Hi @willarmiros, can I expect any update on this?

@carolabadeer
Contributor

@mentos1386 I see you wrote the case that parses ReceiveMessage in SQS in PR #968. Can you please clarify the reason behind using pubsubPropagation? According to this README, pubSub creates a new process span, which I believe is what is causing the issue above. Is there an alternative way of parsing the received SQS message that would not involve creating extra spans?

@mentos1386
Contributor

@carolabadeer I'm not entirely sure why it's used - I only touched the code enough to add the missing message ID attribute.

#707 might be a good read; I think I remember also seeing a lengthier discussion on this topic somewhere, but I can't find it at the moment.

We solved this issue by implementing our own propagation, injecting/extracting trace context from a message payload that we control. This way the SQS traces are only "transport layer" traces, and the actual cross-service propagation is handled by us. It works well enough for us, but it would be nice if we didn't have to do it.
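
For anyone landing here, a minimal sketch of that payload-level pattern, assuming a JSON envelope we control (the envelope shape and function names are illustrative, not from any library):

    import { context, propagation } from '@opentelemetry/api';

    // Producer side: wrap the payload in an envelope that carries the trace headers.
    // With the AWSXRayPropagator registered globally, inject() writes `x-amzn-trace-id`.
    function wrapWithTraceContext<T>(payload: T): { payload: T; trace: Record<string, string> } {
      const trace: Record<string, string> = {};
      propagation.inject(context.active(), trace);
      return { payload, trace };
    }

    // Consumer side: restore the context from the envelope before starting spans,
    // e.g. pass the returned context to tracer.startActiveSpan(...).
    function extractTraceContext(envelope: { trace?: Record<string, string> }) {
      return propagation.extract(context.active(), envelope.trace ?? {});
    }

The envelope is then what gets serialized into the SQS message body, so the SDK-created SQS spans keep their own "transport layer" trace while the cross-service link lives in the payload.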

@sssmolarkiewicz
Author

@mentos1386

We solved this issue by implementing our own propagation, injecting/extracting trace context from a message payload that we control. This way the SQS traces are only "transport layer" traces, and the actual cross-service propagation is handled by us. It works well enough for us, but it would be nice if we didn't have to do it.

This is the workaround that I tried and described in the post. I could keep the additional "transport layer" trace for SQS, but the problem is that the app that processes the received message does not appear on the service graph. Did you manage to get it displayed there, or does your service graph look like the one I posted, with only one app and the SQS node on it?

The perfect solution would obviously be to have those automatically created spans included in the same trace, but they seem to ignore the trace ID header set in the message attribute.

@github-actions
Contributor

github-actions bot commented Jan 2, 2023

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days.

@github-actions github-actions bot added the stale label Jan 2, 2023
@sssmolarkiewicz
Author

Hey @willarmiros, is there any chance this will be resolved, or is there any workaround to generate correct service maps and consistent traces with SQS using OTel and X-Ray?

@carolabadeer
Contributor

Hi @sssmolarkiewicz, this gap is on our roadmap, and we will use this issue to post and track updates once we are able to prioritize it.

@github-actions github-actions bot removed the stale label Feb 20, 2023
@github-actions
Contributor

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days.

@github-actions github-actions bot added the stale label Apr 24, 2023
@github-actions
Contributor

This issue was closed because it has been stale for 14 days with no activity.

@github-actions github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) May 15, 2023
@estyrke

estyrke commented Nov 3, 2023

Was there any progress made on this? I have the same issue, but with the Python SDK.
