-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[elasticsearchexporter] Direct serialization without objmodel in OTel mode #37032
base: main
Are you sure you want to change the base?
[elasticsearchexporter] Direct serialization without objmodel in OTel mode #37032
Conversation
Benchmark results: TL;DR: the throughput is over 2x for metrics and over 3x for logs and traces. The allocated bytes/op are reduced 82% for metrics and 95% for logs and traces.
|
scopeMetrics := scopeMetrics.At(j) | ||
scope := scopeMetrics.Scope() | ||
groupedDataPointsByIndex := make(map[string]map[uint32][]dataPoint) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to reviewer: I made it so that documents from different scopes are never merged. This simplified the serialization logic and also fixes a subtle bug in the current implementation where we're only hashing the scope attributes but not the scope name. This leads to grouping of potentially different scopes to the same document. I guess as a consequence, we should also add the scope name as a dimension in the mappings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think by moving this here, rather than outside of the scopeMetrics
loop, we're assuming that there will never be two identical scopes within a resource. Is that a safe assumption?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose it's no worse than the existing assumption that resourceMetrics is free of duplicate resources.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What will make it safe is that the only consequence of being wrong in the assumption is leaving some storage savings on the table. In other words, we should prioritize elastic/elasticsearch#99123, which turns out to be more of an issue than anticipated in various contexts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't duplicates resources/scopes lead to duplicate _tsid & doc rejections? Definitely agree on prioritising that issue though...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it would, until we fix the referenced issue.
bytes.Buffer.Write is guaranteed to not return an error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. The amount of handwritten serialisation makes me a little uncomfortable, but we can perhaps improve that with code generation later.
I've introduced pooling for the buffer holding the serialized events in 5e523c5. This reduced allocation by another 60% and increased throughput as well. I suppose most of the remaining allocations are from creating the pdata model itself, something that we can't easily optimize and not allocations that are directly caused by the ES exporter itself. I've updated the benchmark results in #37032 (comment). |
exporter/elasticsearchexporter has more than one function: "NewBufferPool,NewFactory"
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #37032 +/- ##
==========================================
- Coverage 79.60% 79.60% -0.01%
==========================================
Files 2252 2254 +2
Lines 211920 212032 +112
==========================================
+ Hits 168704 168781 +77
- Misses 37549 37576 +27
- Partials 5667 5675 +8 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, looks good, just a few nits! This will allow us to remove a lot of workarounds due to OTel mode.
// Copyright The OpenTelemetry Authors | ||
// SPDX-License-Identifier: Apache-2.0 | ||
|
||
package elasticsearchexporter // import "github.com/open-telemetry/opentelemetry-collector-contrib/exporter/elasticsearchexporter" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Seems that this serializer is a good candidate to be moved to a separate package, and only expose the serialize*
funcs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tried that in the beginning but there are a few package-private things it needs to access. For example, dataPoint
and dataStream*
constants in attribute.go
, which we'd need to make public.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've moved bufferpool into a separate package now
Co-authored-by: Carson Ip <carsonip@users.noreply.github.com>
4277f76 and 2328a7a add some more minor optimizations that have an average change to the previous commit of +9% events/s and -6.75% B/op. I have another change lined up to mark the component with |
Directly serializes pdata to JSON in OTel mode
objmodel.Document
needs to be created first