
fix: changed internal architecture to avoid high memory usage #33

Open · wants to merge 4 commits into master

Conversation

gabrik commented Dec 4, 2024

Based on the findings from #9.

RAM usage (10 minutes, same load on the process, same log level):

before:
Screenshot 2024-12-04 at 15 49 29

logs on loki:
Screenshot 2024-12-04 at 15 52 07

after:

Screenshot 2024-12-04 at 15 37 06

logs on loki:

Screenshot 2024-12-04 at 15 53 16

It's around 1/10th with this patch, for the same amount of data pushed to Loki.
I replaced `tokio::mpsc` with `flume` because the latter does not take `&mut self` in `try_recv()`.
The backoff time is configurable; I can also make the channel size configurable, again via the builder.
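
For context, the signature difference that motivated the switch (the drain helper below is just an illustrative sketch, not code from this PR):

```rust
// Upstream signatures (not this crate's code):
//   tokio::sync::mpsc::Receiver::try_recv(&mut self) -> Result<T, TryRecvError>
//   flume::Receiver::try_recv(&self)                 -> Result<T, TryRecvError>
//
// Because flume's try_recv only needs a shared reference, the queue can be
// drained from a context that does not hold `&mut` to the receiver:
fn drain_pending<T>(rx: &flume::Receiver<T>) -> Vec<T> {
    let mut batch = Vec::new();
    while let Ok(item) = rx.try_recv() {
        batch.push(item);
    }
    batch
}
```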

I introduced an API-breaking change, `task::spawn(task.start())` instead of `task::spawn(task)`, but I think I can make it compatible; it's just a matter of playing a bit with the builder. I introduced `BackgroudTaskFuture` to deal with it.
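
A rough sketch of what the configuration could look like (method names like `backoff` and `channel_capacity` are illustrative placeholders, not the crate's current API):

```rust
use std::time::Duration;
use url::Url;

// Illustrative builder usage; `backoff` and `channel_capacity` are placeholder
// names for the options discussed above.
let (layer, task) = tracing_loki::builder()
    .backoff(Duration::from_millis(500))  // polling interval of the background task
    .channel_capacity(512)                // bounded queue between the layer and the task
    .build_url(Url::parse("http://127.0.0.1:3100")?)?;

// The breaking change mentioned above: the background task is started
// explicitly, and the returned `BackgroudTaskFuture` is what gets spawned.
tokio::spawn(task.start());
```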

I can provide macOS Instruments traces if needed; I did not upload them as they are hundreds of MBs.

Signed-off-by: Gabriele Baldoni <gabriele.baldoni@gmail.com>
hrxi (Owner) commented Dec 12, 2024

So from what I gather from a high-level architecture point of view, this polls the queue at a configured interval and sends all available data to Loki. Can you explain why you chose this design? To my understanding, this also means that if more than 512 log messages (DEFAULT_CHANNEL_CAP) are sent in 500 ms (DEFAULT_BACKGROUD_TASK_BACKOFF), then all further log messages in that interval are (silently) dropped.
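
Concretely, my understanding of the new loop is roughly the following (a paraphrased sketch, not the PR's actual code; `LokiEvent` and `send_batch_to_loki` are placeholders):

```rust
use std::time::Duration;

// Paraphrased sketch of my reading of the new design, not the PR's actual code.
// `LokiEvent` and `send_batch_to_loki` are placeholders.
async fn background_task(rx: flume::Receiver<LokiEvent>, backoff: Duration) {
    loop {
        // Drain everything currently queued (at most DEFAULT_CHANNEL_CAP entries,
        // since the sender side uses a bounded channel and drops on overflow).
        let mut batch = Vec::new();
        while let Ok(event) = rx.try_recv() {
            batch.push(event);
        }
        if !batch.is_empty() {
            // One HTTP push to Loki per interval.
            send_batch_to_loki(batch).await;
        }
        // Sleep for DEFAULT_BACKGROUD_TASK_BACKOFF (500 ms by default) before polling again.
        tokio::time::sleep(backoff).await;
    }
}
```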

Previously, the background task sent off logs immediately when they were produced, if nothing else was currently being sent, or immediately after the current send. This means that the data was stored in the `

The PR (currently) doesn't do error checking of the HTTPS request, drops outstanding messages when the quit message is received, removes the exponential backoff, all of which can be fixed once we figure out the correct architecture.

I'm going to try to figure out why you see the increased memory usage with the old design…

hrxi (Owner) commented Dec 12, 2024

(Sorry, I seem to have forgotten to press the send button.)

gabrik (Author) commented Dec 12, 2024

> So from what I gather from a high-level architecture point of view, this polls the queue at a configured interval and sends all available data to Loki. Can you explain why you chose this design? To my understanding, this also means that if more than 512 log messages (DEFAULT_CHANNEL_CAP) are sent in 500 ms (DEFAULT_BACKGROUD_TASK_BACKOFF), then all further log messages in that interval are (silently) dropped.

Yes, we can log that they are dropped, but not much more than that. Both the channel size and the sleep period can be configured, so it's up to the user to tune them for their use case.

> Previously, the background task sent off logs immediately when they were produced, if nothing else was currently being sent, or immediately after the current send. This means that the data was stored in the `
> The PR (currently) doesn't do error checking of the HTTPS request, drops outstanding messages when the quit message is received, removes the exponential backoff, all of which can be fixed once we figure out the correct architecture.

If there is an HTTP error and we keep the messages, memory will grow, so sooner or later they need to be dropped. Better to drop them immediately; an error log can be emitted.

> I'm going to try to figure out why you see the increased memory usage with the old design…

Well, it seems that messages were never dropped even after they were sent; not sure why, to be honest.

If you are able to fix the memory issues with the current (main) architecture, I'm happy to test it.

hrxi (Owner) commented Dec 12, 2024

> […] Can you explain why you chose this design? […]

> […]

Basically, I'm interested in why you chose this particular design, to understand whether it's worthwhile changing to it.

E.g. I wouldn't change the design solely to fix a bug with the old implementation, I'd prefer to fix the implementation instead.

gabrik (Author) commented Dec 12, 2024

> […] Can you explain why you chose this design? […]
>
> […]
>
> Basically, I'm interested in why you chose this particular design, to understand whether it's worthwhile changing to it.

It's more straightforward compared to the current (main) design; that's the only reason.
