failed to unmarshal message when using kafka receiver #36776

Open
xhyzzZ opened this issue Dec 11, 2024 · 2 comments
xhyzzZ commented Dec 11, 2024

Component(s)

receiver/kafka

What happened?

Description

We plan to migrate our telemetry from the Jaeger collector to the OpenTelemetry Collector, with Tempo as the storage backend.

We are building a new telemetry pipeline: OTLP SDK -> Kafka -> (kafka receiver) OTEL collector (OTEL exporter) -> Tempo
to replace the old one: OTLP SDK -> Jaeger Collector -> Cassandra

Some traces fail in the new pipeline with one of two different errors. Note that other traces do pass through without error:

  1. unexpected EOF
2024-12-10T21:27:42.697Z	debug	kafkareceiver@v0.114.0/kafka_receiver.go:550	Kafka message claimed	{"kind": "receiver", "name": "kafka", "data_type": "traces", "value": "\n\ufffd\u0002\n.\n,\n\u000cservice.name\u0012\u001c\n\u001apegasus-non-regulated-prod\u0012\ufffd\u0002\n\r\n\u000bDhariTracer\u0012\ufffd\u0001\n\u0010 \ufffd}@\ufffd\ufffd\u0011\ufffdp\ufffdg\ufffda\u0012\u0008\ufffd\ufffd\ufffd\u001d\u0016.o\ufffd\"\u0008\nث\nG\ufffd?\ufffd*\u0010Dhari-kafka_send0\u00049\u0000\ufffdp^\ufffd\ufffd\u000f\u0018Au\u000f\ufffd^\ufffd\ufffd\u000f\u0018J\u001a\n\u0011trinity.namespace\u0012\u0005\n\u0003acsJ<\n\u0012trinity.event.uuid\u0012&\n$20877d40-b6de-11ef-90bb-cd70f367c561J#\n\rathena.source\u0012\u0012\n\u0010if_agent_toolboxJ!\n\u0016trinity.component.name\u0012\u0007\n\u0005dhariP\u0001z\u0000", "timestamp": "2024-12-10T10:04:22.931Z", "topic": "aml-inbound-trinity-proxy-arcadianr-prod"}
2024-12-10T21:27:42.697Z	error	kafkareceiver@v0.114.0/kafka_receiver.go:569	failed to unmarshal message	{"kind": "receiver", "name": "kafka", "data_type": "traces", "error": "unexpected EOF"}
github.com/open-telemetry/opentelemetry-collector-contrib/receiver/kafkareceiver.(*tracesConsumerGroupHandler).ConsumeClaim
	github.com/open-telemetry/opentelemetry-collector-contrib/receiver/kafkareceiver@v0.114.0/kafka_receiver.go:569
github.com/IBM/sarama.(*consumerGroupSession).consume
	github.com/IBM/sarama@v1.43.3/consumer_group.go:952
github.com/IBM/sarama.newConsumerGroupSession.func2
	github.com/IBM/sarama@v1.43.3/consumer_group.go:877
  2. proto: TracesData: wiretype end group for non-group
2024-12-10T21:27:42.031Z	debug	kafkareceiver@v0.114.0/kafka_receiver.go:550	Kafka message claimed	{"kind": "receiver", "name": "kafka", "data_type": "traces", "value": "\n\ufffd\u0004\n.\n,\n\u000cservice.name\u0012\u001c\n\u001apegasus-non-regulated-prod\u0012\ufffd\u0004\n\r\n\u000bDhariTracer\u0012\ufffd\u0001\n\u0010\ufffdq\ufffd`\ufffd=\u0011\ufffd\ufffd\u000f\ufffdo}j\ufffdD\u0012\u0008FE\u0016_\ufffdǑt\"\u0008\ufffd(p\ufffdyLp\ufffd*\u0010Dhari-kafka_send0\u00049\ufffd\ufffd\ufffd5\ufffd\ufffd\u000f\u0018A\ufffdu26\ufffd\ufffd\u000f\u0018J<\n\u0012trinity.event.uuid\u0012&\n$9671c060-b73d-11ef-880f-f96f7d6a9e44J*\n\rathena.source\u0012\u0019\n\u0017itunes_content_purchaseJ\u001a\n\u0011trinity.namespace\u0012\u0005\n\u0003ampJ!\n\u0016trinity.component.name\u0012\u0007\n\u0005dhariP\u0001z\u0000\u0012\ufffd\u0001\n\u0010\ufffdt\ufffd\ufffd\ufffd=\u0011\ufffd\ufffd\ufffd\ufffdo}j\ufffdD\u0012\u0008?%\ufffdN\ufffdw\ufffdS\"\u0008p\u0014O\ufffd`\ufffd\ufffd**\u0010Dhari-kafka_send0\u00049\ufffd\ufffd\u00108\ufffd\ufffd\u000f\u0018A\ufffd.z8\ufffd\ufffd\u000f\u0018J<\n\u0012trinity.event.uuid\u0012&\n$9674a690-b73d-11ef-a9dd-f96f7d6a9e44J*\n\rathena.source\u0012\u0019\n\u0017itunes_content_purchaseJ\u001a\n\u0011trinity.namespace\u0012\u0005\n\u0003ampJ!\n\u0016trinity.component.name\u0012\u0007\n\u0005dhariP\u0001z\u0000", "timestamp": "2024-12-10T21:27:41.988Z", "topic": "aml-inbound-trinity-proxy-arcadianr-prod"}
2024-12-10T21:27:42.032Z	error	kafkareceiver@v0.114.0/kafka_receiver.go:569	failed to unmarshal message	{"kind": "receiver", "name": "kafka", "data_type": "traces", "error": "proto: TracesData: wiretype end group for non-group"}
github.com/open-telemetry/opentelemetry-collector-contrib/receiver/kafkareceiver.(*tracesConsumerGroupHandler).ConsumeClaim
	github.com/open-telemetry/opentelemetry-collector-contrib/receiver/kafkareceiver@v0.114.0/kafka_receiver.go:569
github.com/IBM/sarama.(*consumerGroupSession).consume
	github.com/IBM/sarama@v1.43.3/consumer_group.go:952
github.com/IBM/sarama.newConsumerGroupSession.func2
	github.com/IBM/sarama@v1.43.3/consumer_group.go:877

I have seen reports where sending all three types of telemetry data to a single Kafka topic causes "proto: TracesData: wiretype end group for non-group". In my case, however, only traces are enabled for Kafka. Am I using the right OTEL config?

I strongly suspect that the root cause is the very old version of the OTEL Java SDK. Could an OTEL proto change have broken backward compatibility? Is the kafka receiver also using the latest version of the OTEL proto? It is strange, though, that some of the traces come through with no error.
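For offline debugging, it can help to relate the two errors to protobuf wire-format conditions. The following is a minimal sketch in Python, not the collector's actual decoder: `read_varint`, `scan_fields`, and `error_for` are hypothetical helper names, and the error strings are only modeled on the log output above. It walks the top-level fields of a payload and reports "unexpected EOF" when a field runs past the end of the buffer, and "wiretype end group for non-group" when a tag carries wire type 4. The last example is purely an assumption for illustration and is not confirmed anywhere in this issue: it shows how a binary payload that has been round-tripped through a text (UTF-8) encoding changes length and stops parsing.

```python
from typing import List, Optional, Tuple


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("unexpected EOF")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7
        if shift >= 64:
            raise ValueError("varint too long")


def scan_fields(buf: bytes) -> List[Tuple[int, int]]:
    """Walk top-level fields; return (field_number, wire_type) pairs."""
    fields = []
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:    # varint
            _, pos = read_varint(buf, pos)
        elif wire_type == 1:  # fixed 64-bit
            pos += 8
        elif wire_type == 2:  # length-delimited (strings, bytes, nested messages)
            length, pos = read_varint(buf, pos)
            pos += length
        elif wire_type == 5:  # fixed 32-bit
            pos += 4
        elif wire_type == 4:  # end-group marker with no matching start
            raise ValueError("wiretype end group for non-group")
        else:                 # start-group (3) and reserved values (6, 7)
            raise ValueError("unsupported wire type %d" % wire_type)
        if pos > len(buf):    # the field claims more bytes than the buffer holds
            raise ValueError("unexpected EOF")
        fields.append((field_number, wire_type))
    return fields


def error_for(buf: bytes) -> Optional[str]:
    """Return the parse error message for buf, or None if it scans cleanly."""
    try:
        scan_fields(buf)
    except ValueError as exc:
        return str(exc)
    return None


# A well-formed message: field 1, length-delimited, 3 payload bytes.
print(error_for(b"\x0a\x03abc"))
# Length prefix says 5 bytes but only 3 follow.
print(error_for(b"\x0a\x05abc"))
# Tag 0x0c is field 1 with wire type 4 (end group).
print(error_for(b"\x0c"))

# Assumption for illustration only: binary bytes passed through a text encoding.
original = b"\x0a\x02\xff\x0c"
mangled = original.decode("utf-8", errors="replace").encode("utf-8")
print(error_for(original), error_for(mangled))
```

Running a scan like this against the raw bytes of a failing Kafka record (taken from the topic itself, not from the escaped string in the debug log) would show whether the record is already malformed before the receiver sees it.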

Steps to Reproduce

Deploy the OTEL collector and generate spans from the OTEL Java SDK.

Expected Result

Traces show up in Tempo UI

Actual Result

Some of the traces do not show up in the Tempo UI, and the OTEL collector logs the errors shown in the Description.

Collector version

0.114.0

Environment information

I am using a very old version of the OTEL Java SDK: https://github.com/open-telemetry/opentelemetry-java/releases/tag/v1.30.0

Environment

OpenTelemetry Collector configuration

pipelines:
  traces/in:
    receivers: [otlp, kafka]
  metrics:
    receivers: [otlp, prometheus]
  logs: null

xhyzzZ added the bug and needs triage labels on Dec 11, 2024
atoulme commented Dec 11, 2024

What is your expectation? We cannot reproduce this without heavy work, and, as you mentioned, you are using a very old Java SDK.

Please update to the latest SDK and see if you can reproduce it, and provide steps to reproduce this as we lack enough information.
