forked from open-telemetry/opentelemetry-collector-contrib
-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[chore][pkg/stanza/fileconsumer] Emit logs in batches #2
Closed
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue. Ex. Adding a feature - Explain what this achieves.--> #### Description Sets a specific GID for the build container's image. <!-- Issue number (e.g. open-telemetry#1234) or full URL to issue, if applicable. --> #### Link to tracking issue open-telemetry#35179 <!--Describe what testing was performed and which tests were added.--> #### Testing (Manual) ``` $ make docker-otelcontribcol // create a sample config.yaml file $ docker run -v .:/etc/otel/ otelcontribcol $ ps -o user,group,pid,comm -ax | rg otelcontribcol 10001 10001 1903287 otelcontribcol ``` Without the changes: ``` $ ps -o user,group,pid,comm -ax | rg otelcontribcol root root 1940536 otelcontribcol ``` <!--Describe the documentation added.--> #### Documentation <!--Please delete paragraphs that you did not use before submitting.-->
…n ECS mode (open-telemetry#36233) <!--Ex. Fixing a bug - Describe the bug and how this fixes the issue. Ex. Adding a feature - Explain what this achieves.--> #### Description In ECS mode: Translate `k8s.job.name`, `k8s.cronjob.name`, `k8s.statefulset.name`, `k8s.replicaset.name`, `k8s.daemonset.name`, `k8s.container.name` to `kubernetes.*.name`. Translate `k8s.cluster.name` to `orchestrator.cluster.name`. <!-- Issue number (e.g. open-telemetry#1234) or full URL to issue, if applicable. --> #### Link to tracking issue <!--Describe what testing was performed and which tests were added.--> #### Testing <!--Describe the documentation added.--> #### Documentation <!--Please delete paragraphs that you did not use before submitting.-->
…elemetry#36255) <!--Ex. Fixing a bug - Describe the bug and how this fixes the issue. Ex. Adding a feature - Explain what this achieves.--> #### Description Incorrect rec.WaitItems call causes test to be flaky. Refactor to avoid future occurrences. <!-- Issue number (e.g. open-telemetry#1234) or full URL to issue, if applicable. --> #### Link to tracking issue Fixes open-telemetry#35924 <!--Describe what testing was performed and which tests were added.--> #### Testing <!--Describe the documentation added.--> #### Documentation <!--Please delete paragraphs that you did not use before submitting.-->
…etry#36260) This makes the code clearer by encapsulating the token's body and attributes in a single structure. It should make future change clearer when the emit callback will be changed to accept a collection of tokens as opposed to a single token. The Sink type could use some refactoring as well, but I'm not doing it here to keep the changes to the minimum for clarity and ease of code review.
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue. Ex. Adding a feature - Explain what this achieves.--> #### Description The connector now does not emit empty batches for invalid otlp payload and throws an error instead. Approach discussed here open-telemetry#35738 (comment) <!-- Issue number (e.g. open-telemetry#1234) or full URL to issue, if applicable. --> #### Link to tracking issue Fixes open-telemetry#35738 and open-telemetry#35739 <!--Describe what testing was performed and which tests were added.--> #### Testing Manual Testing <!--Describe the documentation added.--> #### Documentation <!--Please delete paragraphs that you did not use before submitting.--> --------- Co-authored-by: Daniel Jaglowski <jaglows3@gmail.com>
…ch entry still Next step is to change the emit function to accept a batch.
But still send each entry one by one to the next consumer in the file input. Next step is to change Stanza operators to accept batches.
The File input's `emitBatch` function now calls `ProcessBatch` instead of `Process`. The added `ProcessBatch` method will make each Stanza operator capable of accepting a batch of entries.
This changes the Log Emitter to run the `consumerFunc` on the whole batch, instead of splitting the batch into individual entries and calling `consumerFunc` on each of them. This doesn't change much while the Log Emitter has its own `batch` buffer, but if we remove the `batch` buffer (see open-telemetry#35456), this should prevent the performance drop described in open-telemetry#35454.
5492cb4
to
8e3197b
Compare
andrzej-stencel
pushed a commit
that referenced
this pull request
Jan 7, 2025
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue. Ex. Adding a feature - Explain what this achieves.--> #### Description Generates simple histograms using telemetrygen <!-- Issue number (e.g. open-telemetry#1234) or full URL to issue, if applicable. --> #### Link to tracking issue Fixes <!--Describe what testing was performed and which tests were added.--> #### Testing Test with a local otel collector with debug output ``` bin/telemetrygen metrics --metrics 5 --otlp-http --otlp-endpoint "localhost:4318" --metric-type Histogram --otlp-insecure ``` Output from debug Exporter: ``` Resource SchemaURL: https://opentelemetry.io/schemas/1.13.0 ScopeMetrics #0 ScopeMetrics SchemaURL: InstrumentationScope Metric #0 Descriptor: -> Name: gen -> Description: -> Unit: -> DataType: Histogram -> AggregationTemporality: Cumulative HistogramDataPoints #0 StartTimestamp: 2024-11-13 16:22:50.633365 +0000 UTC Timestamp: 2024-11-13 16:22:51.633367 +0000 UTC Count: 0 Sum: 0.000000 ExplicitBounds #0: 0.000000 ExplicitBounds #1: 1.000000 ExplicitBounds #2: 2.000000 ExplicitBounds #3: 3.000000 ExplicitBounds open-telemetry#4: 4.000000 Buckets #0, Count: 0 Buckets #1, Count: 0 Buckets #2, Count: 0 Buckets #3, Count: 0 Buckets open-telemetry#4, Count: 0 ResourceMetrics #1 Resource SchemaURL: https://opentelemetry.io/schemas/1.13.0 ScopeMetrics #0 ScopeMetrics SchemaURL: InstrumentationScope Metric #0 Descriptor: -> Name: gen -> Description: -> Unit: -> DataType: Histogram -> AggregationTemporality: Cumulative HistogramDataPoints #0 StartTimestamp: 2024-11-13 16:22:50.639942 +0000 UTC Timestamp: 2024-11-13 16:22:51.639942 +0000 UTC Count: 1 Sum: 1.000000 ExplicitBounds #0: 0.000000 ExplicitBounds #1: 1.000000 ExplicitBounds #2: 2.000000 ExplicitBounds #3: 3.000000 ExplicitBounds open-telemetry#4: 4.000000 Buckets #0, Count: 0 Buckets #1, Count: 1 Buckets #2, Count: 0 Buckets #3, Count: 0 Buckets open-telemetry#4, Count: 0 ResourceMetrics #2 Resource SchemaURL: https://opentelemetry.io/schemas/1.13.0 ScopeMetrics #0 ScopeMetrics SchemaURL: InstrumentationScope Metric #0 Descriptor: -> Name: gen -> Description: -> Unit: -> DataType: Histogram -> AggregationTemporality: Cumulative HistogramDataPoints #0 StartTimestamp: 2024-11-13 16:22:50.6404 +0000 UTC Timestamp: 2024-11-13 16:22:51.640401 +0000 UTC Count: 2 Sum: 4.000000 ExplicitBounds #0: 0.000000 ExplicitBounds #1: 1.000000 ExplicitBounds #2: 2.000000 ExplicitBounds #3: 3.000000 ExplicitBounds open-telemetry#4: 4.000000 Buckets #0, Count: 0 Buckets #1, Count: 1 Buckets #2, Count: 0 Buckets #3, Count: 1 Buckets open-telemetry#4, Count: 0 ResourceMetrics #3 Resource SchemaURL: https://opentelemetry.io/schemas/1.13.0 ScopeMetrics #0 ScopeMetrics SchemaURL: InstrumentationScope Metric #0 Descriptor: -> Name: gen -> Description: -> Unit: -> DataType: Histogram -> AggregationTemporality: Cumulative HistogramDataPoints #0 StartTimestamp: 2024-11-13 16:22:50.640729 +0000 UTC Timestamp: 2024-11-13 16:22:51.640729 +0000 UTC Count: 3 Sum: 3.000000 ExplicitBounds #0: 0.000000 ExplicitBounds #1: 1.000000 ExplicitBounds #2: 2.000000 ExplicitBounds #3: 3.000000 ExplicitBounds open-telemetry#4: 4.000000 Buckets #0, Count: 1 Buckets #1, Count: 1 Buckets #2, Count: 1 Buckets #3, Count: 0 Buckets open-telemetry#4, Count: 0 ResourceMetrics open-telemetry#4 Resource SchemaURL: https://opentelemetry.io/schemas/1.13.0 ScopeMetrics #0 ScopeMetrics SchemaURL: InstrumentationScope Metric #0 Descriptor: -> Name: gen -> Description: -> Unit: -> DataType: Histogram -> AggregationTemporality: Cumulative HistogramDataPoints #0 StartTimestamp: 2024-11-13 16:22:50.641073 +0000 UTC Timestamp: 2024-11-13 16:22:51.641073 +0000 UTC Count: 4 Sum: 12.000000 ExplicitBounds #0: 0.000000 ExplicitBounds #1: 1.000000 ExplicitBounds #2: 2.000000 ExplicitBounds #3: 3.000000 ExplicitBounds open-telemetry#4: 4.000000 Buckets #0, Count: 0 Buckets #1, Count: 0 Buckets #2, Count: 1 Buckets #3, Count: 2 Buckets open-telemetry#4, Count: 1 {"kind": "exporter", "data_type": "metrics", "name": "debug"} ``` <!--Describe the documentation added.--> #### Documentation <!--Please delete paragraphs that you did not use before submitting.--> --------- Co-authored-by: Pablo Baeyens <pbaeyens31+github@gmail.com>
andrzej-stencel
pushed a commit
that referenced
this pull request
Jan 31, 2025
#### Description Vulnerability #1: GO-2025-3420 Sensitive headers incorrectly sent after cross-domain redirect in net/http More info: https://pkg.go.dev/vuln/GO-2025-3420 Standard library Found in: net/http@go1.22.8 Fixed in: net/http@go1.22.11 Example traces found: Error: #1: codeowners.go:212:55: githubgen.codeownersGenerator.getGithubMembers calls github.OrganizationsService.ListMembers, which eventually calls http.Client.Do Vulnerability #2: GO-[20](https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/13003223509/job/36265594395?pr=37492#step:6:21)25-3373 Usage of IPv6 zone IDs can bypass URI name constraints in crypto/x509 More info: https://pkg.go.dev/vuln/GO-2025-3373 Standard library Found in: crypto/x509@go1.[22](https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/13003223509/job/36265594395?pr=37492#step:6:23).8 Fixed in: crypto/x509@go1.22.11 Example traces found: Related: open-telemetry/opentelemetry-collector#12197
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Description
Modifies the File consumer to emit logs in batches as opposed to sending each log individually through the Stanza pipeline and on to the Log Emitter.
This was achieved via the following incremental changes, with the goal of making code reviews easier (multiple smaller changesets):
Reader::ReadToEnd
method in File consumer, but still callemit
function for each token individuallyemit.Callback
function signature to accept a slice of tokens and emit tokens in batches from theReader
. At this point, the batches are still split into individual tokens inside theemit
function, because the Stanza operators can only process one entry at a time.ProcessBatch
method to Stanza operators and use it in theemit
function. At this point, the batch of tokens is translated to a batch of entries and passed to Log Emitter as a whole. The batch is still split in the Log Emitter, which callsconsumeFunc
for each entry in a loop.consumeFunc
on the whole batch of entriesNote that this is currently a draft, requesting initial feedback. I haven't yet implemented the
ProcessBatch
method for all Stanza operators, as I'd like to first get feedback its definition. Specifically, should the function accept a[]entry.Entry
or[]*entry.Entry
?Link to tracking issue
Testing
No changes in tests. The goal is for the functionality to not change and for performance to not decrease.
Documentation
These are internal changes, no user documentation needs changing.