[elasticsearchexporter] Direct serialization without objmodel in OTel mode #37032

Open · wants to merge 36 commits into main
Conversation

@felixbarny (Contributor) commented Jan 6, 2025

Directly serializes pdata to JSON in OTel mode
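For illustration only (this is not the PR's actual code): the idea is to stream JSON for each document straight from the pdata objects into a byte buffer, instead of first building an intermediate objmodel.Document and serializing that afterwards. A minimal sketch, with illustrative field choices and helper names:

package serializer

import (
	"bytes"
	"encoding/json"

	"go.opentelemetry.io/collector/pdata/pcommon"
	"go.opentelemetry.io/collector/pdata/plog"
)

// serializeLogRecord streams a log record directly into buf as a JSON document,
// avoiding any intermediate document representation. Field names are illustrative.
func serializeLogRecord(record plog.LogRecord, scope pcommon.InstrumentationScope, buf *bytes.Buffer) {
	buf.WriteByte('{')
	writeStringField(buf, "@timestamp", record.Timestamp().AsTime().UTC().Format("2006-01-02T15:04:05.000000000Z"), true)
	writeStringField(buf, "severity_text", record.SeverityText(), false)
	writeStringField(buf, "body", record.Body().AsString(), false)
	writeStringField(buf, "scope.name", scope.Name(), false)
	record.Attributes().Range(func(k string, v pcommon.Value) bool {
		writeStringField(buf, "attributes."+k, v.AsString(), false)
		return true
	})
	buf.WriteByte('}')
}

// writeStringField appends a JSON-escaped "key":"value" pair to buf.
func writeStringField(buf *bytes.Buffer, key, value string, first bool) {
	if !first {
		buf.WriteByte(',')
	}
	k, _ := json.Marshal(key) // marshalling a string never fails
	v, _ := json.Marshal(value)
	buf.Write(k)
	buf.WriteByte(':')
	buf.Write(v)
}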

@felixbarny (Contributor, Author) commented Jan 8, 2025

Benchmark results:

TL;DR: the throughput is over 2x for metrics and over 3x for logs and traces. The allocated bytes/op are reduced by roughly 82% for metrics and 95% for logs and traces.

 goos: darwin
goarch: arm64
pkg: github.com/open-telemetry/opentelemetry-collector-contrib/exporter/elasticsearchexporter/integrationtest
                                      │   old.txt   │            new_final.txt            │
                                      │   sec/op    │   sec/op     vs base                │
Exporter/logs/otel/small_batch-10       79.16µ ± 1%   26.69µ ± 1%  -66.29% (p=0.000 n=10)
Exporter/logs/otel/medium_batch-10      757.0µ ± 2%   225.9µ ± 3%  -70.15% (p=0.000 n=10)
Exporter/logs/otel/large_batch-10       7.392m ± 1%   2.075m ± 0%  -71.93% (p=0.000 n=10)
Exporter/logs/otel/xlarge_batch-10      70.50m ± 1%   20.33m ± 1%  -71.17% (p=0.000 n=10)
Exporter/metrics/otel/small_batch-10    414.8µ ± 1%   181.0µ ± 0%  -56.37% (p=0.000 n=10)
Exporter/metrics/otel/medium_batch-10   3.960m ± 0%   1.717m ± 1%  -56.63% (p=0.000 n=10)
Exporter/metrics/otel/large_batch-10    39.97m ± 1%   18.12m ± 0%  -54.67% (p=0.000 n=10)
Exporter/metrics/otel/xlarge_batch-10   421.3m ± 1%   187.0m ± 1%  -55.61% (p=0.000 n=10)
Exporter/traces/otel/small_batch-10     79.64µ ± 0%   26.66µ ± 1%  -66.52% (p=0.000 n=10)
Exporter/traces/otel/medium_batch-10    765.5µ ± 1%   227.4µ ± 0%  -70.29% (p=0.000 n=10)
Exporter/traces/otel/large_batch-10     7.341m ± 1%   2.102m ± 0%  -71.37% (p=0.000 n=10)
Exporter/traces/otel/xlarge_batch-10    71.74m ± 1%   20.76m ± 0%  -71.06% (p=0.000 n=10)
geomean                                 4.171m        1.426m       -65.80%

                                      │   old.txt   │            new_final.txt             │
                                      │  events/s   │  events/s    vs base                 │
Exporter/logs/otel/small_batch-10       126.3k ± 1%   374.7k ± 1%  +196.62% (p=0.000 n=10)
Exporter/logs/otel/medium_batch-10      132.1k ± 2%   442.6k ± 3%  +235.04% (p=0.000 n=10)
Exporter/logs/otel/large_batch-10       135.3k ± 1%   481.9k ± 0%  +256.21% (p=0.000 n=10)
Exporter/logs/otel/xlarge_batch-10      141.8k ± 1%   492.0k ± 1%  +246.84% (p=0.000 n=10)
Exporter/metrics/otel/small_batch-10    168.8k ± 1%   386.8k ± 0%  +129.18% (p=0.000 n=10)
Exporter/metrics/otel/medium_batch-10   176.7k ± 0%   407.6k ± 1%  +130.60% (p=0.000 n=10)
Exporter/metrics/otel/large_batch-10    175.1k ± 1%   386.4k ± 0%  +120.62% (p=0.000 n=10)
Exporter/metrics/otel/xlarge_batch-10   166.1k ± 1%   374.3k ± 1%  +125.29% (p=0.000 n=10)
Exporter/traces/otel/small_batch-10     125.6k ± 0%   375.0k ± 1%  +198.67% (p=0.000 n=10)
Exporter/traces/otel/medium_batch-10    130.6k ± 1%   439.7k ± 0%  +236.56% (p=0.000 n=10)
Exporter/traces/otel/large_batch-10     136.2k ± 0%   475.8k ± 0%  +249.27% (p=0.000 n=10)
Exporter/traces/otel/xlarge_batch-10    139.4k ± 1%   481.6k ± 0%  +245.48% (p=0.000 n=10)
geomean                                 145.0k        424.1k       +192.44%

                                      │    old.txt    │            new_final.txt             │
                                      │     B/op      │     B/op      vs base                │
Exporter/logs/otel/small_batch-10       80.579Ki ± 0%   4.227Ki ± 0%  -94.75% (p=0.000 n=10)
Exporter/logs/otel/medium_batch-10      793.10Ki ± 0%   32.21Ki ± 0%  -95.94% (p=0.000 n=10)
Exporter/logs/otel/large_batch-10       7912.7Ki ± 0%   306.5Ki ± 0%  -96.13% (p=0.000 n=10)
Exporter/logs/otel/xlarge_batch-10      77.155Mi ± 0%   2.908Mi ± 0%  -96.23% (p=0.000 n=10)
Exporter/metrics/otel/small_batch-10    403.89Ki ± 0%   68.43Ki ± 0%  -83.06% (p=0.000 n=10)
Exporter/metrics/otel/medium_batch-10   4020.0Ki ± 0%   660.2Ki ± 0%  -83.58% (p=0.000 n=10)
Exporter/metrics/otel/large_batch-10    39.512Mi ± 0%   6.842Mi ± 0%  -82.68% (p=0.000 n=10)
Exporter/metrics/otel/xlarge_batch-10   390.35Mi ± 0%   65.27Mi ± 0%  -83.28% (p=0.000 n=10)
Exporter/traces/otel/small_batch-10     80.745Ki ± 0%   5.163Ki ± 0%  -93.61% (p=0.000 n=10)
Exporter/traces/otel/medium_batch-10    794.59Ki ± 0%   40.78Ki ± 0%  -94.87% (p=0.000 n=10)
Exporter/traces/otel/large_batch-10     7924.7Ki ± 0%   391.5Ki ± 0%  -95.06% (p=0.000 n=10)
Exporter/traces/otel/xlarge_batch-10    77.343Mi ± 0%   3.785Mi ± 0%  -95.11% (p=0.000 n=10)
geomean                                  4.219Mi        311.7Ki       -92.79%

                                      │   old.txt   │            new_final.txt            │
                                      │  allocs/op  │  allocs/op   vs base                │
Exporter/logs/otel/small_batch-10        553.0 ± 0%    112.0 ± 0%  -79.75% (p=0.000 n=10)
Exporter/logs/otel/medium_batch-10      5.430k ± 0%   1.018k ± 0%  -81.25% (p=0.000 n=10)
Exporter/logs/otel/large_batch-10       54.19k ± 0%   10.07k ± 0%  -81.41% (p=0.000 n=10)
Exporter/logs/otel/xlarge_batch-10      541.7k ± 0%   100.5k ± 0%  -81.44% (p=0.000 n=10)
Exporter/metrics/otel/small_batch-10    4.301k ± 0%   1.567k ± 0%  -63.57% (p=0.000 n=10)
Exporter/metrics/otel/medium_batch-10   42.83k ± 0%   15.49k ± 0%  -63.83% (p=0.000 n=10)
Exporter/metrics/otel/large_batch-10    428.0k ± 0%   154.8k ± 0%  -63.83% (p=0.000 n=10)
Exporter/metrics/otel/xlarge_batch-10   4.278M ± 0%   1.546M ± 0%  -63.86% (p=0.000 n=10)
Exporter/traces/otel/small_batch-10      594.0 ± 0%    132.0 ± 0%  -77.78% (p=0.000 n=10)
Exporter/traces/otel/medium_batch-10    5.830k ± 0%   1.218k ± 0%  -79.11% (p=0.000 n=10)
Exporter/traces/otel/large_batch-10     58.19k ± 0%   12.07k ± 0%  -79.26% (p=0.000 n=10)
Exporter/traces/otel/xlarge_batch-10    581.7k ± 0%   120.5k ± 0%  -79.28% (p=0.000 n=10)
geomean                                 35.09k        8.569k       -75.58%

scopeMetrics := scopeMetrics.At(j)
scope := scopeMetrics.Scope()
groupedDataPointsByIndex := make(map[string]map[uint32][]dataPoint)
Contributor Author:

Note to reviewer: I made it so that documents from different scopes are never merged. This simplified the serialization logic and also fixes a subtle bug in the current implementation where we're only hashing the scope attributes but not the scope name. This leads to grouping of potentially different scopes to the same document. I guess as a consequence, we should also add the scope name as a dimension in the mappings.
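For context, a minimal sketch of the grouping-key idea described above, hashing the scope name together with the scope attributes so that distinct scopes are never merged (package and function names are illustrative, not the exporter's actual code):

package grouping

import (
	"hash/fnv"
	"sort"

	"go.opentelemetry.io/collector/pdata/pcommon"
)

// scopeHash is an illustrative grouping key that includes the scope name in
// addition to the scope attributes, so that two scopes with identical
// attributes but different names are never grouped into the same document.
func scopeHash(scope pcommon.InstrumentationScope) uint32 {
	h := fnv.New32a()
	h.Write([]byte(scope.Name()))
	// Hash attributes in a deterministic key order.
	keys := make([]string, 0, scope.Attributes().Len())
	scope.Attributes().Range(func(k string, _ pcommon.Value) bool {
		keys = append(keys, k)
		return true
	})
	sort.Strings(keys)
	for _, k := range keys {
		v, _ := scope.Attributes().Get(k)
		h.Write([]byte(k))
		h.Write([]byte(v.AsString()))
	}
	return h.Sum32()
}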

Contributor:

I think by moving this here, rather than outside of the scopeMetrics loop, we're assuming that there will never be two identical scopes within a resource. Is that a safe assumption?

Contributor:

I suppose it's no worse than the existing assumption that resourceMetrics is free of duplicate resources.

Contributor Author:

What will make it safe is that the only consequence of being wrong in the assumption is leaving some storage savings on the table. In other words, we should prioritize elastic/elasticsearch#99123, which turns out to be more of an issue than anticipated in various contexts.

@axw (Contributor) commented Jan 10, 2025:

Wouldn't duplicates resources/scopes lead to duplicate _tsid & doc rejections? Definitely agree on prioritising that issue though...

Contributor Author:

Yes, it would, until we fix the referenced issue.

@felixbarny marked this pull request as ready for review January 10, 2025 07:42
@felixbarny requested a review from a team as a code owner January 10, 2025 07:42
@felixbarny requested a review from songy23 January 10, 2025 07:42
bytes.Buffer.Write is guaranteed to not return an error
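This relies on a documented property of the standard library: (*bytes.Buffer).Write always returns a nil error, growing the buffer as needed and panicking only if it becomes too large, so the error can safely be dropped. A tiny illustration:

package bufwrite

import "bytes"

// appendDoc shows why the error from bytes.Buffer.Write can be ignored: the
// standard library documents that (*bytes.Buffer).Write always returns a nil
// error, growing the buffer as needed and panicking only with ErrTooLarge.
func appendDoc(buf *bytes.Buffer, doc []byte) {
	_, _ = buf.Write(doc) // err is always nil per the bytes.Buffer documentation
	buf.WriteByte('\n')   // same guarantee for WriteByte
}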
@axw (Contributor) left a comment:

LGTM. The amount of handwritten serialisation makes me a little uncomfortable, but we can perhaps improve that with code generation later.

exporter/elasticsearchexporter/pdata_serializer.go (2 outdated review threads, resolved)
@felixbarny (Contributor, Author) commented Jan 10, 2025

I've introduced pooling for the buffer holding the serialized events in 5e523c5. This reduced allocations by another 60% and increased throughput as well. I suppose most of the remaining allocations come from creating the pdata model itself, which we can't easily optimize, rather than from allocations caused directly by the ES exporter. I've updated the benchmark results in #37032 (comment).
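A minimal sketch of the pooling idea (not necessarily the exact code added in 5e523c5): reuse the buffers that hold serialized events via sync.Pool so each document doesn't allocate a fresh bytes.Buffer.

package pool

import (
	"bytes"
	"sync"
)

// BufferPool reuses the buffers that hold serialized events so that each
// exported document does not allocate a new bytes.Buffer.
type BufferPool struct {
	pool sync.Pool
}

func NewBufferPool() *BufferPool {
	return &BufferPool{pool: sync.Pool{New: func() any { return &bytes.Buffer{} }}}
}

// Get returns an empty buffer from the pool.
func (p *BufferPool) Get() *bytes.Buffer {
	buf := p.pool.Get().(*bytes.Buffer)
	buf.Reset()
	return buf
}

// Put hands a buffer back to the pool after its contents have been flushed.
func (p *BufferPool) Put(buf *bytes.Buffer) {
	p.pool.Put(buf)
}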

felixbarny and others added 4 commits January 10, 2025 17:58
exporter/elasticsearchexporter has more than one function: "NewBufferPool,NewFactory"

codecov bot commented Jan 10, 2025

Codecov Report

Attention: Patch coverage is 92.37875% with 33 lines in your changes missing coverage. Please review.

Project coverage is 79.60%. Comparing base (992d3b0) to head (29e9daf).

Files with missing lines                              Patch %   Missing / partial lines
exporter/elasticsearchexporter/pdata_serializer.go    94.53%    11 missing, 6 partials
exporter/elasticsearchexporter/exporter.go            68.18%    6 missing, 8 partials
exporter/elasticsearchexporter/model.go               96.92%    2 missing
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #37032      +/-   ##
==========================================
- Coverage   79.60%   79.60%   -0.01%     
==========================================
  Files        2252     2254       +2     
  Lines      211920   212032     +112     
==========================================
+ Hits       168704   168781      +77     
- Misses      37549    37576      +27     
- Partials     5667     5675       +8     


@carsonip (Contributor) left a comment:

Thanks, looks good, just a few nits! This will allow us to remove a lot of workarounds needed for OTel mode.

exporter/elasticsearchexporter/bufferpol.go (2 outdated review threads, resolved)
exporter/elasticsearchexporter/model.go (outdated review thread, resolved)
// Copyright The OpenTelemetry Authors
// SPDX-License-Identifier: Apache-2.0

package elasticsearchexporter // import "github.com/open-telemetry/opentelemetry-collector-contrib/exporter/elasticsearchexporter"
Contributor:

nit: Seems that this serializer is a good candidate to be moved to a separate package, exposing only the serialize* funcs.

Contributor Author:

I've tried that in the beginning, but there are a few package-private things it needs to access, for example dataPoint and the dataStream* constants in attribute.go, which we'd need to make public.

Contributor Author:

I've moved bufferpool into a separate package now

exporter/elasticsearchexporter/model.go (6 outdated review threads, resolved)
@felixbarny (Contributor, Author):

4277f76 and 2328a7a add some more minor optimizations that, compared to the previous commit, average +9% events/s and -6.75% B/op.

I have another change lined up to mark the component with consumer.Capabilities{MutatesData: false}, which avoids the collector having to clone all input data before calling the exporter. But I'd like to get this PR in first, as it's already quite large.
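For reference, a minimal sketch of what declaring that capability looks like on a consumer; the exporter itself would typically wire this up through its exporterhelper options, and the names below are illustrative:

package capabilities

import (
	"context"

	"go.opentelemetry.io/collector/consumer"
	"go.opentelemetry.io/collector/pdata/plog"
)

// passthroughConsumer reports MutatesData: false, telling the collector
// pipeline that it will not modify the pdata it receives, so the pipeline can
// skip cloning the data before handing it over.
type passthroughConsumer struct{}

func (passthroughConsumer) Capabilities() consumer.Capabilities {
	return consumer.Capabilities{MutatesData: false}
}

func (passthroughConsumer) ConsumeLogs(context.Context, plog.Logs) error {
	return nil
}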
