
Proto mismatch errors not obvious when posting traces #6320

Closed
Pitta opened this issue Oct 15, 2022 · 8 comments
Labels
bug (Something isn't working), Stale

Comments


Pitta commented Oct 15, 2022

Describe the bug
When sending test traces via otlp-http, if the trace JSON does not match the current proto schema, the server still returns a 200 with no indication of a problem.

$ curl -i http://localhost:4318/v1/traces -X POST -H "Content-Type: application/json" -d @trace.json

HTTP/1.1 200 OK
Content-Type: application/json
Date: Sat, 15 Oct 2022 00:30:09 GMT
Content-Length: 2

{}

Steps to reproduce
config: see the "What config did you use?" section below

docker-compose.yaml:

version: "3"
services:
  otel-collector:
    image: otel/opentelemetry-collector:0.60.0
    command: ["--config=/etc/otel-collector-config.yaml"]
    volumes:
      - ./config.yaml:/etc/otel-collector-config.yaml
    ports:
      - "4317:4317"   # OTLP gRPC receiver
      - "4318:4318"   # OTLP http receiver
      - "13133:13133" # Health Check

trace.json (from this example)

{
  "resourceSpans": [
    {
      "resource": {
        "attributes": [
          {
            "key": "service.name",
            "value": {
              "stringValue": "curl-test-otel-pipeline"
            }
          }
        ]
      },
      "instrumentationLibrarySpans": [
        {
          "spans": [
            {
              "traceId": "71699b6fe85982c7c8995ea3d9c95df2",
              "spanId": "3c191d03fa8be065",
              "name": "test-span",
              "kind": 1,
              "droppedAttributesCount": 0,
              "events": [],
              "droppedEventsCount": 0,
              "status": {
                "code": 1
              }
            }
          ],
          "instrumentationLibrary": {
            "name": "local-curl-example"
          }
        }
      ]
    }
  ]
}

  1. docker compose up
  2. from another terminal: curl -i http://localhost:4318/v1/traces -X POST -H "Content-Type: application/json" -d @trace.json
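For context on what the mismatch likely is here (an assumption on my part, consistent with the Swift SDK comment below): newer OTLP protos renamed `instrumentationLibrarySpans` to `scopeSpans` and `instrumentationLibrary` to `scope`, so a payload using the old names parses cleanly, but the unknown fields, and every span inside them, are silently dropped. A corrected trace.json using the current JSON field names would look something like:

```json
{
  "resourceSpans": [
    {
      "resource": {
        "attributes": [
          {
            "key": "service.name",
            "value": { "stringValue": "curl-test-otel-pipeline" }
          }
        ]
      },
      "scopeSpans": [
        {
          "scope": { "name": "local-curl-example" },
          "spans": [
            {
              "traceId": "71699b6fe85982c7c8995ea3d9c95df2",
              "spanId": "3c191d03fa8be065",
              "name": "test-span",
              "kind": 1,
              "status": { "code": 1 }
            }
          ]
        }
      ]
    }
  ]
}
```

Posting this version should make the spans show up in the logging exporter's output.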

What did you expect to see?
A 200 response if the trace data was valid 👍; anything else with a message mentioning the proto mismatch.

What did you see instead?

$ curl -i http://localhost:4318/v1/traces -X POST -H "Content-Type: application/json" -d @trace.json

HTTP/1.1 200 OK
Content-Type: application/json
Date: Sat, 15 Oct 2022 00:30:09 GMT
Content-Length: 2

{}

What version did you use?
Version: 0.60.0

What config did you use?

exporters:
  logging:

extensions:
  health_check: {}
  memory_ballast: {}

processors:
  batch: {}
  memory_limiter:
    check_interval: 5s
    limit_mib: 409
    spike_limit_mib: 128

receivers:
  otlp:
    protocols:
      grpc:
      http:

service:
  extensions:
  - health_check
  - memory_ballast
  pipelines:
    traces:
      exporters:
      - logging
      - otlp
      processors:
      - memory_limiter
      - batch
      receivers:
      - otlp
  telemetry:
    logs:
      level: "debug"
Pitta added the bug label on Oct 15, 2022
@austinlparker
Member

FYI I replicated this on latest (0.62.1) with the same config + json payload.

@TylerHelmuth
Member

I've seen a customer experience this issue recently as well when sending traces via the Swift SDK, which didn't have the latest proto yet.

@bogdandrutu
Member

This is a bit unfortunate. I can explain why this happens, but I'm not sure what the right behavior is:

The combination of the two makes this case return 200. I'm not sure how, based on the current spec, we can return a different code.

@austinlparker
Member

I think if we returned a useful response body, it'd help. Right now you get 200 OK with an empty response; something like "dropped unknown fields [xxx, yyy, zzz]" would at least give users a clue what was happening.

@TylerHelmuth
Member

In the PR that implemented the spec change in the collector we discussed a scenario like this: #5931. Maybe it's time to allow "strict-enforcement" via #5935.

I also like the idea of having some important fields, like ResourceSpans and InstrumentationScopeSpans, that are vocal when skipped. This would help users know what's happening.
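The "strict enforcement" / "vocal when skipped" idea can be sketched with Go's standard library: `json.Decoder.DisallowUnknownFields` turns silently-dropped fields into hard errors. This is only an illustration of the concept, not the collector's actual code path (the collector unmarshals via protojson against generated types; the type names below are made up):

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// tracePayload mirrors a tiny slice of the OTLP JSON shape.
// Illustrative only; the real collector uses generated protobuf types.
type tracePayload struct {
	ResourceSpans []struct {
		ScopeSpans []json.RawMessage `json:"scopeSpans"`
	} `json:"resourceSpans"`
}

// decodeStrict rejects payloads containing fields the schema does not
// know, instead of silently dropping them.
func decodeStrict(data []byte) (*tracePayload, error) {
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.DisallowUnknownFields()
	var p tracePayload
	if err := dec.Decode(&p); err != nil {
		return nil, err
	}
	return &p, nil
}

func main() {
	// Old proto field name, as in the payload from this issue:
	// rejected with "json: unknown field ...".
	stale := []byte(`{"resourceSpans":[{"instrumentationLibrarySpans":[]}]}`)
	if _, err := decodeStrict(stale); err != nil {
		fmt.Println("rejected:", err)
	}

	// Current field name: accepted.
	current := []byte(`{"resourceSpans":[{"scopeSpans":[]}]}`)
	if _, err := decodeStrict(current); err == nil {
		fmt.Println("accepted")
	}
}
```

A variant that collects the unknown field names (rather than failing outright) could feed the "dropped unknown fields [...]" response body suggested above.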

@TylerHelmuth
Member

As discussed in the SIG today, maybe OTLP's partial success response can help here.

@tigrannajaryan
Member

> As discussed in the SIG today, maybe OTLP's partial success response can help here.

To clarify: for this to work we need to add "accepted" counter to the partial success response.
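For reference, the OTLP partial success message already carries a rejected count and a free-form error string. The response below is a sketch of what the HTTP body could look like in this case (the errorMessage text is illustrative, and the proposed "accepted" counter is not shown because it does not exist in the spec yet):

```json
{
  "partialSuccess": {
    "rejectedSpans": 1,
    "errorMessage": "unknown field \"instrumentationLibrarySpans\" was dropped"
  }
}
```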

@Pitta
Author

Pitta commented Jan 23, 2025

I've since left the org that led to this issue and am no longer pursuing it.
