Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[pkg/stanza] Log converter has significant memory footprint at scale #18411

Closed
cpheps opened this issue Feb 7, 2023 · 2 comments · Fixed by #18467
Closed

[pkg/stanza] Log converter has significant memory footprint at scale #18411

cpheps opened this issue Feb 7, 2023 · 2 comments · Fixed by #18467

Comments

@cpheps
Copy link
Contributor

cpheps commented Feb 7, 2023

Component(s)

pkg/stanza

Describe the issue you're reporting

The log converter logic in pkg/stanza to convert from a stanza entry to a plog Log has a large memory footprint at scale.

Doing some pprof heap dumps it appears the largest memory footprint is from making copies of the plog entries in the aggregationLoop here. It's also observed if the aggregationLoop gets blocked pushing it's logs to the flushChan here that two copies of all the plogs currently in workerItems exists in memory at a time.

I wrote out a possible solution of combining the workerLoops and aggregationLoop in a way where the stanza entries are converted into the final plog payload sent to the flushChan. This eliminated the need to aggregate plogs by resource in a separate goroutine. I ran a test against a 1 million log file sending to GCP for 5 minutes. I used two versions of the collector, the v0.70.0 release and with my condensed loop change. I've included the process memory graphs below collected by the hostmetrics receiver.

Screenshot 2023-02-06 at 3 03 06 PM

The condensed loop change uses significantly less memory over the same period of time. I've also included the pprof svg images that were collect at the end of each run. These show that the copyTo hotspot has been eliminated and less memory is used.

Given these performance increases I'd like to propose opening a PR with the condensed loop change.

Here is the code I first pass of code I wrote to condense the loops.

pprof images:
Collector v0.70.0:
collector_v0 70 0

Condensed loop:
condensed_loop

@cpheps cpheps added the needs triage New item requiring triage label Feb 7, 2023
@github-actions
Copy link
Contributor

github-actions bot commented Feb 7, 2023

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@djaglowski
Copy link
Member

I'm very much in favor of removing complexity from the converter.

It's not clear to me that we really should be batching within the package anyways. We removed all user-facing configuration before declaring filelog as beta. #16452 (comment) is potentially another indication that we would be better emitting immediately.

In any case, I'd be happy to review a PR for your changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants