-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[pkg/ottl] truncate_all function corrupts UTF-8 encoding #36017
Comments
Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself. |
open-telemetry/opentelemetry-go#3156 seems like a quick fix, but iterating over the entire string is inefficient. Especially for Tailscale's UTF-8 aware truncation seems much better, although licensed under |
We're definitely interested in a fix. If there isn't an efficient solution we could add an optional parameter that enables when to check for this kind of truncation. |
If anyone else is facing the same problem, switching to HTTP(s) with JSON payload on the exporter temporarily solves it during exporting (the ingestion side is happy and does not drop the entire batch). The JSON marshaller in Go coerces invalid UTF-8 byte sequences to valid UTF-8 by replacing them with This is not a fix, but just a temporary workaround. |
I want to join this contribfest! Could you assign it to me? |
Component(s)
pkg/ottl
What happened?
Description
The
truncate_all
OTTL function truncates the string (UTF-8 encoded byte slice in Go) without checking if it breaks multi-byte UTF-8 codepoints in the middle. So using thetransform
processor withtruncate_all
often results in corrupted UTF-8 values.Here's the code:
opentelemetry-collector-contrib/pkg/ottl/ottlfuncs/func_truncate_all.go
Line 50 in ed09990
(It looks like opentelemetry-go SDK had a similar problem which is fixed now: open-telemetry/opentelemetry-go#3160)
Impact:
While the collector or a chain of collectors usually works fine with the corrupted UTF-8 down the line (open-telemetry/opentelemetry-collector#11449), some components fail:
(We are seeing similar behavior with one of the vendors too. They drop the entire batch if anything contains an invalid UTF-8 string)
Steps to Reproduce
The
truncate_all
slicing code should be self-explanatory that it doesn't handlerune
.Expected Result
truncate_all
will slice the string up to the given length. If truncating at exactly the length results in a broken UTF-8 encoding, it'll truncate before where the lastrune
started.Actual Result
truncate_all
slices UTF-8 encoded string without checking if it breaks a multi-byte character in the middle.This results in the entire batch getting dropped on the vendor side.
Collector version
v0.111.0
Environment information
Environment
This applies to all environment
OpenTelemetry Collector configuration
The text was updated successfully, but these errors were encountered: