
Cannot send traces to Datadog agent #2225

Closed
col opened this issue Dec 7, 2022 · 7 comments · Fixed by #3196 or #3421
Assignees
Labels
component/open-telemetry (OTLP, Datadog, Prometheus, etc. and the integrations around it), raised by user, triage

Comments

col (Contributor) commented Dec 7, 2022

Describe the bug
I'm deploying apollo-router using the helm chart (v1.0.0-rc.7) into a k8s cluster. The cluster has the Datadog agent running as a Daemonset and is working as expected for traces coming from the backend services. I've been working on this for days now and I cannot get the traces from apollo-router to appear in Datadog.

To Reproduce
I'm using the following configuration:

```yaml
...
    telemetry:
      apollo:
        field_level_instrumentation_sampler: 1
      tracing:
        trace_config:
          service_name: "apollo-router"
          sampler: 1
        propagation:
          datadog: true
        datadog:
          endpoint: "http://${env.DD_AGENT_HOST}:8126"
...
extraEnvVars:
  - name: APOLLO_ROUTER_LOG
    value: debug
  - name: DD_AGENT_HOST
    valueFrom:
      fieldRef:
        apiVersion: v1
        fieldPath: status.hostIP
```

Expected behavior
I'm expecting to see trace data in the Datadog console.


Additional context
I've enabled debug logging (APOLLO_ROUTER_LOG=debug) but it's not giving me anything particularly useful to debug this issue.

{"timestamp":"2022-12-07T02:39:29.794289Z","level":"INFO","fields":{"message":"Apollo Router v1.4.0 // (c) Apollo Graph, Inc. // Licensed as ELv2 (https://go.apollo.dev/elv2)"},"target":"apollo_router::executable"}
{"timestamp":"2022-12-07T02:39:29.794368Z","level":"INFO","fields":{"message":"Anonymous usage data is gathered to inform Apollo product development.  See https://go.apollo.dev/o/privacy for more info."},"target":"apollo_router::executable"}
{"timestamp":"2022-12-07T02:39:29.814014Z","level":"DEBUG","fields":{"message":"starting"},"target":"apollo_router::state_machine"}
{"timestamp":"2022-12-07T02:39:32.660595Z","level":"DEBUG","fields":{"message":"starting http"},"target":"apollo_router::state_machine"}
{"timestamp":"2022-12-07T02:39:32.660661Z","level":"DEBUG","fields":{"message":"adding plugin apollo.headers with user provided configuration"},"target":"apollo_router::configuration"}
{"timestamp":"2022-12-07T02:39:32.660675Z","level":"DEBUG","fields":{"message":"adding plugin apollo.include_subgraph_errors with user provided configuration"},"target":"apollo_router::configuration"}
{"timestamp":"2022-12-07T02:39:32.660679Z","level":"DEBUG","fields":{"message":"adding plugin apollo.telemetry with user provided configuration"},"target":"apollo_router::configuration"}
{"timestamp":"2022-12-07T02:39:32.660691Z","level":"DEBUG","fields":{"message":"creating plugin: 'apollo.headers' with configuration:\n{\n  \"all\": {\n    \"request\": [\n      {\n        \"propagate\": {\n          \"named\": \"Authorization\"\n        }\n      },\n      {\n        \"propagate\": {\n          \"named\": \"Client-ID\"\n        }\n      },\n      {\n        \"propagate\": {\n          \"named\": \"Client-Version\"\n        }\n      }\n    ]\n  }\n}"},"target":"apollo_router::router_factory"}
{"timestamp":"2022-12-07T02:39:32.660737Z","level":"DEBUG","fields":{"message":"creating plugin: 'apollo.include_subgraph_errors' with configuration:\n{\n  \"all\": true\n}"},"target":"apollo_router::router_factory"}
{"timestamp":"2022-12-07T02:39:32.660749Z","level":"DEBUG","fields":{"message":"creating plugin: 'apollo.telemetry' with configuration:\n{\n  \"apollo\": {\n    \"field_level_instrumentation_sampler\": 0.5\n  },\n  \"metrics\": {\n    \"common\": {\n      \"resources\": {\n        \"env\": \"dev\",\n        \"service.name\": \"apollo-router\"\n      },\n      \"service_name\": \"apollo-router\"\n    },\n    \"prometheus\": {\n      \"enabled\": true,\n      \"listen\": \"0.0.0.0:9090\",\n      \"path\": \"/metrics\"\n    }\n  },\n  \"tracing\": {\n    \"datadog\": {\n      \"endpoint\": \"http://10.20.0.54:8126\"\n    },\n    \"propagation\": {\n      \"datadog\": true\n    },\n    \"trace_config\": {\n      \"sampler\": 0.5,\n      \"service_name\": \"apollo-router\"\n    }\n  }\n}"},"target":"apollo_router::router_factory"}
{"timestamp":"2022-12-07T02:39:32.660822Z","level":"DEBUG","fields":{"message":"starting Spaceport"},"target":"apollo_router::plugins::telemetry"}
{"timestamp":"2022-12-07T02:39:32.660902Z","level":"DEBUG","fields":{"message":"configuring Apollo metrics"},"target":"apollo_router::plugins::telemetry::metrics::apollo"}
{"timestamp":"2022-12-07T02:39:32.660912Z","level":"INFO","fields":{"message":"Apollo Studio usage reporting is enabled. See https://go.apollo.dev/o/data for details"},"target":"apollo_router::plugins::telemetry::metrics::apollo"}
{"timestamp":"2022-12-07T02:39:32.660916Z","level":"DEBUG","fields":{"message":"creating metrics exporter"},"target":"apollo_router::plugins::telemetry::metrics::apollo"}
{"timestamp":"2022-12-07T02:39:32.661073Z","level":"INFO","fields":{"message":"creating apollo exporter","spaceport_endpoint":"https://127.0.0.1:35911/"},"target":"apollo_router::plugins::telemetry::apollo_exporter"}
{"timestamp":"2022-12-07T02:39:32.661195Z","level":"DEBUG","fields":{"message":"configuring Datadog tracing"},"target":"apollo_router::plugins::telemetry::tracing::datadog"}
{"timestamp":"2022-12-07T02:39:32.661907Z","level":"DEBUG","fields":{"message":"configuring Apollo tracing"},"target":"apollo_router::plugins::telemetry::tracing::apollo"}
{"timestamp":"2022-12-07T02:39:32.661919Z","level":"DEBUG","fields":{"message":"configuring exporter to Studio"},"target":"apollo_router::plugins::telemetry::tracing::apollo"}
{"timestamp":"2022-12-07T02:39:32.661923Z","level":"DEBUG","fields":{"message":"creating studio exporter"},"target":"apollo_router::plugins::telemetry::tracing::apollo_telemetry"}
{"timestamp":"2022-12-07T02:39:32.662019Z","level":"INFO","fields":{"message":"creating apollo exporter","spaceport_endpoint":"https://127.0.0.1:35911/"},"target":"apollo_router::plugins::telemetry::apollo_exporter"}
{"timestamp":"2022-12-07T02:39:32.662238Z","level":"DEBUG","fields":{"message":"plugins list: [\"apollo.include_subgraph_errors\", \"apollo.csrf\", \"apollo.telemetry\", \"apollo.headers\"]"},"target":"apollo_router::router_factory"}
{"timestamp":"2022-12-07T02:39:32.664781Z","level":"DEBUG","fields":{"message":"adding plugin apollo.headers with user provided configuration"},"target":"apollo_router::configuration"}
{"timestamp":"2022-12-07T02:39:32.664797Z","level":"DEBUG","fields":{"message":"adding plugin apollo.include_subgraph_errors with user provided configuration"},"target":"apollo_router::configuration"}
{"timestamp":"2022-12-07T02:39:32.664800Z","level":"DEBUG","fields":{"message":"adding plugin apollo.telemetry with user provided configuration"},"target":"apollo_router::configuration"}
{"timestamp":"2022-12-07T02:39:34.664906Z","level":"INFO","fields":{"message":"healthcheck endpoint exposed at http://0.0.0.0:8088/health"},"target":"apollo_router::axum_factory::axum_http_server_factory"}
{"timestamp":"2022-12-07T02:39:34.665048Z","level":"INFO","fields":{"message":"GraphQL endpoint exposed at http://0.0.0.0:80/ 🚀"},"target":"apollo_router::axum_factory::axum_http_server_factory"}
{"timestamp":"2022-12-07T02:39:34.665109Z","level":"DEBUG","fields":{"message":"extra endpoints the router listens to","tracing_endpoints":"http://0.0.0.0:9090, http://0.0.0.0:8088"},"target":"apollo_router::axum_factory::axum_http_server_factory"}
{"timestamp":"2022-12-07T02:40:00.758932Z","level":"DEBUG","message":"resolving host=\"uplink.api.apollographql.com\"","target":"hyper::client::connect::dns","filename":"/home/circleci/.cargo/registry/src/github.com-1ecc6299db9ec823/hyper-0.14.23/src/client/connect/dns.rs","line_number":122}
...this line repeated multiple times...
{"timestamp":"2022-12-07T02:47:31.082196Z","level":"DEBUG","message":"resolving host=\"service1.dev.svc.cluster.local\"","target":"hyper::client::connect::dns","filename":"/home/circleci/.cargo/registry/src/github.com-1ecc6299db9ec823/hyper-0.14.23/src/client/connect/dns.rs","line_number":122}
{"timestamp":"2022-12-07T02:47:32.668480Z","level":"DEBUG","message":"resolving host=\"usage-reporting.api.apollographql.com\"","target":"hyper::client::connect::dns","filename":"/home/circleci/.cargo/registry/src/github.com-1ecc6299db9ec823/hyper-0.14.23/src/client/connect/dns.rs","line_number":122}
{"timestamp":"2022-12-07T02:47:34.841392Z","level":"DEBUG","message":"resolving host=\"uplink.api.apollographql.com\"","target":"hyper::client::connect::dns","filename":"/home/circleci/.cargo/registry/src/github.com-1ecc6299db9ec823/hyper-0.14.23/src/client/connect/dns.rs","line_number":122}
{"timestamp":"2022-12-07T02:47:43.165177Z","level":"DEBUG","message":"resolving host=\"service2.dev.svc.cluster.local\"","target":"hyper::client::connect::dns","filename":"/home/circleci/.cargo/registry/src/github.com-1ecc6299db9ec823/hyper-0.14.23/src/client/connect/dns.rs","line_number":122}
{"timestamp":"2022-12-07T02:47:52.973668Z","level":"DEBUG","message":"resolving host=\"service3.dev.svc.cluster.local\"","target":"hyper::client::connect::dns","filename":"/home/circleci/.cargo/registry/src/github.com-1ecc6299db9ec823/hyper-0.14.23/src/client/connect/dns.rs","line_number":122}
{"timestamp":"2022-12-07T02:47:53.068142Z","level":"DEBUG","message":"resolving host=\"service4.dev.svc.cluster.local\"","target":"hyper::client::connect::dns","filename":"/home/circleci/.cargo/registry/src/github.com-1ecc6299db9ec823/hyper-0.14.23/src/client/connect/dns.rs","line_number":122}
@abernix abernix added the component/open-telemetry OTLP, Datadog, Prometheus, etc. and the integrations around it. label Jan 13, 2023
Geal (Contributor) commented Jan 20, 2023

@col is this still happening?

col (Contributor, Author) commented Jan 22, 2023

Yeah, I never managed to get this working

Geal (Contributor) commented Jan 23, 2023

Alright, let's look more closely at this now that a lot of fixes have gone into our telemetry stack.

A few questions first:

  • Have you tried with router 1.8?
  • Are you able to make requests through the router successfully? (The DNS errors at the end of the log look concerning.)
  • Do you see anything in the Datadog agent logs?
  • Did you try using the OTLP exporter, sending to the Datadog agent? The agent should work with these env variables IIRC:
    • DD_APM_ENABLED=true
    • DD_OTLP_CONFIG_RECEIVER_PROTOCOLS_GRPC_ENDPOINT=0.0.0.0:4317

(we have a bit more confidence in the OTLP exporter than in the Datadog one, they come from different libraries)
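If you go the OTLP route, the router side would look something like the fragment below. This is a sketch only, not a verified configuration: the `endpoint` value and the reuse of the `DD_AGENT_HOST` variable from the Helm values earlier in this issue are assumptions, with port 4317 matching the gRPC receiver the env variables above enable on the agent.

```yaml
telemetry:
  tracing:
    otlp:
      # Send spans to the Datadog agent's OTLP gRPC receiver
      # (enabled on the agent via DD_OTLP_CONFIG_RECEIVER_PROTOCOLS_GRPC_ENDPOINT).
      endpoint: "http://${env.DD_AGENT_HOST}:4317"
```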

@Geal Geal self-assigned this Feb 6, 2023
Geal (Contributor) commented Feb 6, 2023

@col any updates on this?

col (Contributor, Author) commented Feb 6, 2023

Thanks for the prompt. I have not given this another try yet but I'll get back to it soon.

  • I've only tried with v1.4. I'll upgrade to v1.8 when I retry.
  • Yes, the router itself has been working as expected while I've been trying to get tracing up and running.
  • I'm not seeing anything of value in the agent logs, though I am successfully getting the Prometheus metrics into Datadog.
  • I didn't try the OTLP exporter. I'll give it a shot; good to know it's the more reliable option.

@garypen garypen assigned garypen and unassigned Geal Jun 5, 2023
garypen (Contributor) commented Jun 5, 2023

This is fixed by the upgrade to opentelemetry 0.19.0. I verified it by creating a k8s cluster, installing the Datadog agent, and exporting traces to the agent as follows:

Datadog configuration fragment (as per the official DD Helm chart):

```yaml
datadog:
  apiKey: "${DATADOG_KEY}"
  collectEvents: true
  apm:
    portEnabled: true
```

Router configuration fragment:

```yaml
      tracing:
        datadog:
          endpoint: http://${env.NODE_NAME}:8126
          enable_span_mapping: true
        propagation:
          datadog: true
        trace_config:
          attributes:
            env: gary-cluster
          service_name: ${env.DD_SERVICE:-gary-cluster-otel}
          service_namespace: router
```

o0Ignition0o (Contributor) commented

Reopening the issue since we have to revert the upgrade until they release a patch. See #3242.
