Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[exporter]prometheusremotewrite] Fix data race by introducing pool of batch state #36601

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

ArthurSens
Copy link
Member

@ArthurSens ArthurSens commented Nov 29, 2024

Description

This is an alternative for #36524 and #36600

This PR does a couple of things:

  • Add a test written by @edma2 that shows a data race to the batch state when running multiple consumers.
  • Add a benchmark for PushMetrics, with options to run with a stable number of metrics or varying metrics.
  • Fix the data race by introducing a sync.Pool of batch states.

Benchmark results

results comparing main, #36600 and this PR:

arthursens$ benchstat main.txt withoutState.txt syncpool.txt 
goos: darwin
goarch: arm64
pkg: github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusremotewriteexporter
cpu: Apple M2 Pro
                            │  main.txt   │          withoutState.txt           │         syncpool.txt         │
                            │   sec/op    │    sec/op     vs base               │   sec/op     vs base         │
PushMetricsVaryingMetrics-2   8.066m ± 5%   13.821m ± 9%  +71.36% (p=0.002 n=6)   8.316m ± 6%  ~ (p=0.065 n=6)

                            │   main.txt   │           withoutState.txt            │            syncpool.txt            │
                            │     B/op     │     B/op       vs base                │     B/op      vs base              │
PushMetricsVaryingMetrics-2   5.216Mi ± 0%   34.436Mi ± 0%  +560.17% (p=0.002 n=6)   5.548Mi ± 0%  +6.36% (p=0.002 n=6)

                            │  main.txt   │       withoutState.txt       │         syncpool.txt         │
                            │  allocs/op  │  allocs/op   vs base         │  allocs/op   vs base         │
PushMetricsVaryingMetrics-2   56.02k ± 0%   56.05k ± 0%  ~ (p=0.721 n=6)   56.04k ± 0%  ~ (p=0.665 n=6)

Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
(cherry picked from commit bdeb254)
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
@Aneurysm9
Copy link
Member

This is strange to me. I don't see any degradations in memory allocations by removing the state, so I'm confused if the optimization ever made any difference or if I'm doing something wrong in the benchmark.

The original issue talked about batch sizes of 100-200k data points, so maybe 10k isn't enough to illustrate the issue?

Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
(cherry picked from commit 2dc1870)
@ArthurSens
Copy link
Member Author

I've increased the amount of data used in the benchmarks, still no difference.

arthursens$ benchstat main.txt withoutState.txt syncpool.txt 
goos: darwin
goarch: arm64
pkg: github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusremotewriteexporter
cpu: Apple M2 Pro
                            │  main.txt   │       withoutState.txt       │         syncpool.txt         │
                            │   sec/op    │   sec/op     vs base         │   sec/op     vs base         │
PushMetricsVaryingMetrics-2   676.0m ± 5%   685.0m ± 8%  ~ (p=0.240 n=6)   682.9m ± 6%  ~ (p=0.310 n=6)

                            │   main.txt   │       withoutState.txt        │         syncpool.txt          │
                            │     B/op     │     B/op      vs base         │     B/op      vs base         │
PushMetricsVaryingMetrics-2   431.0Mi ± 0%   431.2Mi ± 0%  ~ (p=0.937 n=6)   430.9Mi ± 0%  ~ (p=0.132 n=6)

                            │  main.txt   │        withoutState.txt        │           syncpool.txt            │
                            │  allocs/op  │  allocs/op   vs base           │  allocs/op   vs base              │
PushMetricsVaryingMetrics-2   4.334M ± 0%   4.334M ± 0%  ~ (p=1.000 n=6) ¹   4.334M ± 0%  +0.00% (p=0.002 n=6)
¹ all samples are equal
Individual benchmark runs

Main:

goos: darwin
goarch: arm64
pkg: github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusremotewriteexporter
cpu: Apple M2 Pro
BenchmarkPushMetricsVaryingMetrics-2   	     100	 641808841 ns/op	451837730 B/op	 4333723 allocs/op
BenchmarkPushMetricsVaryingMetrics-2   	     100	 680346175 ns/op	452476772 B/op	 4333723 allocs/op
BenchmarkPushMetricsVaryingMetrics-2   	     100	 673599777 ns/op	451837744 B/op	 4333723 allocs/op
BenchmarkPushMetricsVaryingMetrics-2   	     100	 674459682 ns/op	451847461 B/op	 4333723 allocs/op
BenchmarkPushMetricsVaryingMetrics-2   	     100	 677447471 ns/op	452476337 B/op	 4333723 allocs/op
BenchmarkPushMetricsVaryingMetrics-2   	     100	 687872721 ns/op	452075896 B/op	 4333723 allocs/op
PASS
ok  	github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusremotewriteexporter	543.110s

Alternative 1 (#36600):

goos: darwin
goarch: arm64
pkg: github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusremotewriteexporter
cpu: Apple M2 Pro
BenchmarkPushMetricsVaryingMetrics-2   	     100	 632321270 ns/op	452227454 B/op	 4333723 allocs/op
BenchmarkPushMetricsVaryingMetrics-2   	     100	 688034830 ns/op	451837198 B/op	 4333723 allocs/op
BenchmarkPushMetricsVaryingMetrics-2   	     100	 685106043 ns/op	452075646 B/op	 4333723 allocs/op
BenchmarkPushMetricsVaryingMetrics-2   	     100	 680661115 ns/op	451837636 B/op	 4333723 allocs/op
BenchmarkPushMetricsVaryingMetrics-2   	     100	 684886080 ns/op	452237544 B/op	 4333723 allocs/op
BenchmarkPushMetricsVaryingMetrics-2   	     100	 686308490 ns/op	452876716 B/op	 4333723 allocs/op
PASS
ok  	github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusremotewriteexporter	544.852s

This branch:

goos: darwin
goarch: arm64
pkg: github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusremotewriteexporter
cpu: Apple M2 Pro
BenchmarkPushMetricsVaryingMetrics-2   	     100	 641808841 ns/op	451837730 B/op	 4333723 allocs/op
BenchmarkPushMetricsVaryingMetrics-2   	     100	 680346175 ns/op	452476772 B/op	 4333723 allocs/op
BenchmarkPushMetricsVaryingMetrics-2   	     100	 673599777 ns/op	451837744 B/op	 4333723 allocs/op
BenchmarkPushMetricsVaryingMetrics-2   	     100	 674459682 ns/op	451847461 B/op	 4333723 allocs/op
BenchmarkPushMetricsVaryingMetrics-2   	     100	 677447471 ns/op	452476337 B/op	 4333723 allocs/op
BenchmarkPushMetricsVaryingMetrics-2   	     100	 687872721 ns/op	452075896 B/op	 4333723 allocs/op
PASS
ok  	github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusremotewriteexporter	543.110s

@dashpole
Copy link
Contributor

dashpole commented Dec 2, 2024

Seems like a benchmarking problem. Lets try to work with the original author in the other PR before we choose a path forward.

@ArthurSens
Copy link
Member Author

Seems like a benchmarking problem.

That's for sure! I'm struggling to understand why the results look that way :/

Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
(cherry picked from commit b5a9c1d)
@ArthurSens
Copy link
Member Author

Okay, I found the problem. I'm creating a large request by creating an OTel metric with millions of Metrics with similar names and attributes within the same ResourceMetrics. This ends up in a very small tsMap here. Even in the test case with 1 million datapoints we only get three entries in tsMap.

If we go a bit further and look at the code of batchTimeSeries, even if one single entry of tsMap holds 1 million datapoints (around 18MB) and the max batchSize is super small. Those 1 million datapoints become one single PRW request:

if sizeOfCurrentBatch+sizeOfSeries >= maxBatchByteSize {
state.nextTimeSeriesBufferSize = max(10, 2*len(tsArray))
wrapped := convertTimeseriesToRequest(tsArray)
requests = append(requests, wrapped)
tsArray = make([]prompb.TimeSeries, 0, min(state.nextTimeSeriesBufferSize, len(tsMap)-i))
sizeOfCurrentBatch = 0
}

That happens because converTimeseriesToRequest is not aware of maxBatchSize. I prefer that we fix this bug in a follow-up PR.

To progress the benchmark here, even with the bug, we can diversify the attributes in the original OTel metrics during the benchmark preparation, leading to millions of entries in tsMap. I can even decrease the amount of series again and it still works :)

@ArthurSens
Copy link
Member Author

Alright, new results comparing main, #36600 and this PR:

arthursens$ benchstat main.txt withoutState.txt syncpool.txt 
goos: darwin
goarch: arm64
pkg: github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusremotewriteexporter
cpu: Apple M2 Pro
                            │  main.txt   │          withoutState.txt           │         syncpool.txt         │
                            │   sec/op    │    sec/op     vs base               │   sec/op     vs base         │
PushMetricsVaryingMetrics-2   8.066m ± 5%   13.821m ± 9%  +71.36% (p=0.002 n=6)   8.316m ± 6%  ~ (p=0.065 n=6)

                            │   main.txt   │           withoutState.txt            │            syncpool.txt            │
                            │     B/op     │     B/op       vs base                │     B/op      vs base              │
PushMetricsVaryingMetrics-2   5.216Mi ± 0%   34.436Mi ± 0%  +560.17% (p=0.002 n=6)   5.548Mi ± 0%  +6.36% (p=0.002 n=6)

                            │  main.txt   │       withoutState.txt       │         syncpool.txt         │
                            │  allocs/op  │  allocs/op   vs base         │  allocs/op   vs base         │
PushMetricsVaryingMetrics-2   56.02k ± 0%   56.05k ± 0%  ~ (p=0.721 n=6)   56.04k ± 0%  ~ (p=0.665 n=6)

It shows that introducing a pool of states is the better option here, removing the state would introduce a big performance degradation.

Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
@ArthurSens
Copy link
Member Author

I believe the "Check codeowners" CI is not related to the changes here, right?

@dashpole dashpole added the ready to merge Code review completed; ready to merge by maintainers label Dec 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
exporter/prometheusremotewrite ready to merge Code review completed; ready to merge by maintainers
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants