
fix: changed internal architecture to avoid high memory usage #33

Open · wants to merge 4 commits into master

Conversation

gabrik commented Dec 4, 2024

Based on the findings from #9.

RAM usage (10 minutes, same load on the process, same log level):

before:
Screenshot 2024-12-04 at 15 49 29

logs on loki:
Screenshot 2024-12-04 at 15 52 07

after:

Screenshot 2024-12-04 at 15 37 06

logs on loki:

Screenshot 2024-12-04 at 15 53 16

It's around 1/10th with this patch, for the same amount of data pushed to Loki.
I replaced `tokio::mpsc` with `flume` because the latter does not take `&mut self` in `try_recv()`.
The backoff time is configurable; I can also make the channel size configurable, again via the builder.
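
For context, the signature difference that motivated the switch (the drain helper below is just an illustrative sketch, not code from this PR):

```rust
// Upstream signatures (not this crate's code):
//   tokio::sync::mpsc::Receiver::try_recv(&mut self) -> Result<T, TryRecvError>
//   flume::Receiver::try_recv(&self)                 -> Result<T, TryRecvError>
//
// Because flume's try_recv only needs a shared reference, the queue can be
// drained from a context that does not hold `&mut` to the receiver:
fn drain_pending<T>(rx: &flume::Receiver<T>) -> Vec<T> {
    let mut batch = Vec::new();
    while let Ok(item) = rx.try_recv() {
        batch.push(item);
    }
    batch
}
```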

I introduced an API-breaking change, `task::spawn(task.start())` instead of `task::spawn(task)`, but I think I can make it compatible; it's just a matter of playing a bit with the builder. I introduced `BackgroudTaskFuture` to deal with it.
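
A rough sketch of what the configuration could look like (method names like `backoff` and `channel_capacity` are illustrative placeholders, not the crate's current API):

```rust
use std::time::Duration;
use url::Url;

// Illustrative builder usage; `backoff` and `channel_capacity` are placeholder
// names for the options discussed above.
let (layer, task) = tracing_loki::builder()
    .backoff(Duration::from_millis(500))  // polling interval of the background task
    .channel_capacity(512)                // bounded queue between the layer and the task
    .build_url(Url::parse("http://127.0.0.1:3100")?)?;

// The breaking change mentioned above: the background task is started
// explicitly, and the returned `BackgroudTaskFuture` is what gets spawned.
tokio::spawn(task.start());
```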

I can provide macOS Instruments traces if needed; I did not upload them as they are hundreds of MBs.

Signed-off-by: Gabriele Baldoni <gabriele.baldoni@gmail.com>
hrxi (Owner) commented Dec 12, 2024

So from what I gather from a high-level architecture point of view, this polls the queue at a configured interval and sends all available data to Loki. Can you explain why you chose this design? To my understanding, this also means that if more than 512 log messages (DEFAULT_CHANNEL_CAP) are sent in 500 ms (DEFAULT_BACKGROUD_TASK_BACKOFF), then all further log messages in that interval are (silently) dropped.
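
Concretely, my understanding of the new loop is roughly the following (a paraphrased sketch, not the PR's actual code; `LokiEvent` and `send_batch_to_loki` are placeholders):

```rust
use std::time::Duration;

// Paraphrased sketch of my reading of the new design, not the PR's actual code.
// `LokiEvent` and `send_batch_to_loki` are placeholders.
async fn background_task(rx: flume::Receiver<LokiEvent>, backoff: Duration) {
    loop {
        // Drain everything currently queued (at most DEFAULT_CHANNEL_CAP entries,
        // since the sender side uses a bounded channel and drops on overflow).
        let mut batch = Vec::new();
        while let Ok(event) = rx.try_recv() {
            batch.push(event);
        }
        if !batch.is_empty() {
            // One HTTP push to Loki per interval.
            send_batch_to_loki(batch).await;
        }
        // Sleep for DEFAULT_BACKGROUD_TASK_BACKOFF (500 ms by default) before polling again.
        tokio::time::sleep(backoff).await;
    }
}
```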

Previously, the background task sent off logs immediately when they were produced, if nothing else was currently being sent, or immediately after the current send. This means that the data was stored in the `

The PR (currently) doesn't do error checking of the HTTPS request, drops outstanding messages when the quit message is received, removes the exponential backoff, all of which can be fixed once we figure out the correct architecture.

I'm going to try to figure out why you see the increased memory usage with the old design…

hrxi (Owner) commented Dec 12, 2024

(Sorry, I seem to have forgotten to press the send button.)

gabrik (Author) commented Dec 12, 2024

> So from what I gather from a high-level architecture point of view, this polls the queue at a configured interval and sends all available data to Loki. Can you explain why you chose this design? To my understanding, this also means that if more than 512 log messages (DEFAULT_CHANNEL_CAP) are sent in 500 ms (DEFAULT_BACKGROUD_TASK_BACKOFF), then all further log messages in that interval are (silently) dropped.

Yes, we can log that they are dropped, but not much more than that. Both the channel size and the sleep period can be configured, so it's up to the user to tune them for their use case.

> Previously, the background task sent off logs immediately when they were produced, if nothing else was currently being sent, or immediately after the current send. This means that the data was stored in the `
> The PR (currently) doesn't do error checking of the HTTPS request, drops outstanding messages when the quit message is received, removes the exponential backoff, all of which can be fixed once we figure out the correct architecture.

If there is an HTTP error and we keep the messages, memory will grow, so sooner or later they need to be dropped. Better to drop them immediately; an error log can be emitted.

> I'm going to try to figure out why you see the increased memory usage with the old design…

Well, it seems that messages were never dropped even after they were sent; not sure why, to be honest.

If you are able to fix the memory issues with the current (main) architecture, I'm happy to test it.

hrxi (Owner) commented Dec 12, 2024

> […] Can you explain why you chose this design? […]

> […]

Basically, I'm interested in why you chose this particular design, to understand whether it's worthwhile changing to it.

E.g. I wouldn't change the design solely to fix a bug with the old implementation, I'd prefer to fix the implementation instead.

gabrik (Author) commented Dec 12, 2024

> […] Can you explain why you chose this design? […]
>
> […]
>
> Basically, I'm interested in why you chose this particular design, to understand whether it's worthwhile changing to it.

It's more straightforward compared to the current (main) design; that's the only reason.
