Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve design of internal/stanza log converter #3125

Conversation

pmalek-sumo
Copy link
Contributor

Description: Improve the internal design of internal/stanza log converter.

  • get rid of locks
  • rely purely on channel communication
  • introduce independent units of work: workerLoop, batchLoop, flushLoop
  • introduce configurable worker pool

Testing: existing unit tests pass + added a converter benchmark

@pmalek-sumo
Copy link
Contributor Author

Testbed results comparison:

81dc441:

( make otelcontribcol && cd testbed/ && TESTCASE_DURATION=15s TEST_ARGS="-run TestLog -count 1" ./runtests.sh)
...
# Test PerformanceResults
Started: Thu, 15 Apr 2021 12:07:50 +0200

Test                                    |Result|Duration|CPU Avg%|CPU Max%|RAM Avg MiB|RAM Max MiB|Sent Items|Received Items|
----------------------------------------|------|-------:|-------:|-------:|----------:|----------:|---------:|-------------:|
Log10kDPS/OTLP                          |PASS  |     17s|    17.3|    14.7|         40|         49|    149900|        149900|
Log10kDPS/filelog                       |PASS  |     15s|    18.2|    19.3|         41|         49|    150000|        150000|
Log10kDPS/kubernetes_containers         |PASS  |     15s|    44.5|    45.1|         46|         54|    150000|        150000|
Log10kDPS/k8s_CRI-Containerd            |PASS  |     15s|    40.0|    40.2|         46|         54|    150000|        150000|
Log10kDPS/k8s_CRI-Containerd_no_attr_ops|PASS  |     15s|    33.0|    33.4|         45|         53|    150000|        150000|
Log10kDPS/CRI-Containerd                |PASS  |     15s|    19.6|    20.0|         44|         52|    150000|        150000|
Log10kDPS/syslog-udp                    |PASS  |     15s|    59.3|    64.0|         42|         51|    139900|        139900|
Log10kDPS/syslog-tcp-batch-1            |PASS  |     15s|    53.4|    55.5|         42|         50|    149900|        149900|
Log10kDPS/syslog-tcp-batch-100          |PASS  |     15s|    45.9|    47.6|         41|         49|    150000|        150000|
Log10kDPS/FluentBitToOTLP               |PASS  |     23s|     5.0|     8.3|         52|         70|    149900|        149900|
Log10kDPS/FluentForward-SplunkHEC       |PASS  |     16s|    35.0|    35.6|         55|         68|    148800|        148800|

this PR:

Started: Thu, 15 Apr 2021 12:17:23 +0200

Test                                    |Result|Duration|CPU Avg%|CPU Max%|RAM Avg MiB|RAM Max MiB|Sent Items|Received Items|
----------------------------------------|------|-------:|-------:|-------:|----------:|----------:|---------:|-------------:|
Log10kDPS/OTLP                          |PASS  |     17s|    15.2|    12.7|         38|         47|    149900|        149900|
Log10kDPS/filelog                       |PASS  |     15s|    20.4|    22.2|         42|         49|    150000|        150000|
Log10kDPS/kubernetes_containers         |PASS  |     15s|    45.2|    45.5|         45|         53|    150000|        150000|
Log10kDPS/k8s_CRI-Containerd            |PASS  |     15s|    43.4|    44.8|         45|         54|    150000|        150000|
Log10kDPS/k8s_CRI-Containerd_no_attr_ops|PASS  |     15s|    39.7|    41.0|         44|         52|    150000|        150000|
Log10kDPS/CRI-Containerd                |PASS  |     15s|    27.4|    28.3|         45|         53|    150000|        150000|
Log10kDPS/syslog-udp                    |PASS  |     15s|    58.3|    66.3|         43|         51|    139900|        139900|
Log10kDPS/syslog-tcp-batch-1            |PASS  |     15s|    52.5|    59.1|         41|         49|    149900|        149900|
Log10kDPS/syslog-tcp-batch-100          |PASS  |     15s|    50.7|    53.3|         42|         49|    149900|        149900|
Log10kDPS/FluentBitToOTLP               |PASS  |     23s|     4.9|     7.0|         52|         76|    149900|        149900|
Log10kDPS/FluentForward-SplunkHEC       |PASS  |     15s|    34.0|    34.7|         56|         67|    149600|        149600|

@pmalek-sumo pmalek-sumo marked this pull request as ready for review April 15, 2021 11:19
@pmalek-sumo pmalek-sumo requested a review from djaglowski as a code owner April 15, 2021 11:19
@pmalek-sumo pmalek-sumo requested a review from a team April 15, 2021 11:19
@pmalek-sumo pmalek-sumo force-pushed the improved-logs-translation-workerpool branch 2 times, most recently from 01332b7 to f01f1ac Compare April 19, 2021 09:18
@pmalek-sumo pmalek-sumo force-pushed the improved-logs-translation-workerpool branch from f01f1ac to 4414374 Compare April 19, 2021 15:30
@pmalek-sumo
Copy link
Contributor Author

@djaglowski Let me know whether there's any sense in me rebasing this and resolving conflicts so that we'd like to keep looking at it or reject it straight away.

@djaglowski
Copy link
Member

@pmalek-sumo This design is certainly easier to reason about. It's unfortunate that we lose much of the performance gains from the previous PR, but I think we are still coming in slightly ahead, and perhaps this design gives us a path to further improvement.

One thing I am a little concerned about is testing. Do you think we're sufficiently exercising various configurations?

One thought I have on the converter, is that it's quite difficult to use for testing purposes. It would be nice if there were a way to "statelessly" convert entry.Entry to pdata.Logs, without having to work around implementation details of the converter. I think this would allow us to write test cases more easily.

@pmalek-sumo
Copy link
Contributor Author

One thing I am a little concerned about is testing. Do you think we're sufficiently exercising various configurations?

We can easily add those it depends what you have in mind. This package is transitively tested by other packages which use it so this somewhat makes me more confident that it works hence I didn't spend too much time on testing it.

I'm willing to put more effort if there's will to accept it on this (not only) condition :).

One thought I have on the converter, is that it's quite difficult to use for testing purposes. It would be nice if there were a way to "statelessly" convert entry.Entry to pdata.Logs, without having to work around implementation details of the converter. I think this would allow us to write test cases more easily.

I believe we can add a thin wrapper func on top of convert() which would translate to pdata.Logs as it used to (somewhat) before and use that in tests.

@djaglowski
Copy link
Member

@pmalek-sumo Thanks, I think it makes sense to move forward on this approach. It's looking promising.

@pmalek-sumo pmalek-sumo force-pushed the improved-logs-translation-workerpool branch from 4414374 to 655f40b Compare April 27, 2021 14:42
@pmalek-sumo
Copy link
Contributor Author

@djaglowski Added Convert() for stateless scenarios as you've mentioned.

Not sure what sort of tests we'd like to add. As I've mentioned: this being transitively as a direct dependency of e.g. filelogreceiver or syslogreceiver is rather sufficient.

One small point in this subject I'd like to raise is that if we observe testbed tests to fail we can then address it (of course if need be).

Copy link
Member

@djaglowski djaglowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great to me. Thanks @pmalek-sumo, for all the iterations on this one.

@pmalek-sumo
Copy link
Contributor Author

Irrelevant exporter/elasticexporter tests failed on Windows

@bogdandrutu
Copy link
Member

Please rebase

@pmalek-sumo pmalek-sumo force-pushed the improved-logs-translation-workerpool branch from b23197f to bfa9291 Compare May 12, 2021 08:20
@pmalek-sumo
Copy link
Contributor Author

Please rebase

@bogdandrutu Done. I can see that testbed test Log10kDPS/syslog-udp failed but it seems has been randomly failing before this PR 🤷‍♂️

@djaglowski
Copy link
Member

Log10kDPS/syslog-udp failed

Just noting that there is an issue for this already: #3095

For now, I'm rerunning tests.

@bogdandrutu bogdandrutu merged commit 8a42832 into open-telemetry:main May 12, 2021
alexperez52 referenced this pull request in open-o11y/opentelemetry-collector-contrib Aug 18, 2021
Benchmarks Before:

```
goos: darwin
goarch: amd64
pkg: go.opentelemetry.io/collector/processor/batchprocessor
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
BenchmarkSplitLogs
BenchmarkSplitLogs-16    	   16072	     76529 ns/op	  114408 B/op	     669 allocs/op
PASS
```

Benchmarks After:

```
goarch: amd64
pkg: go.opentelemetry.io/collector/processor/batchprocessor
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
BenchmarkSplitLogs
BenchmarkSplitLogs-16    	   21795	     56491 ns/op	   88952 B/op	     556 allocs/op
PASS
```

Benchmarks Reference Clone:

```
goos: darwin
goarch: amd64
pkg: go.opentelemetry.io/collector/processor/batchprocessor
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
BenchmarkCloneLogs
BenchmarkCloneLogs-16    	   22305	     52075 ns/op	   85976 B/op	     503 allocs/op
PASS
```

Signed-off-by: Bogdan Drutu <bogdandrutu@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants