Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RUMM-3338: Introduce known file cache and cleanup throttling in BatchFileOrchestrator in order to reduce the number of syscalls #1506

Merged

Conversation

0xnm
Copy link
Member

@0xnm 0xnm commented Jun 28, 2023

What does this PR do?

The goal of this PR is to reduce CPU consumption during the scenarios like described in #1469.

The issue comes from the fact that when we have many batch files on the disk calling listSortedBatchFiles as-is (which is called several times during getWritableFile, for example) becomes quite expensive because:

  • syscalls (like isFile) are expensive
  • pattern matching becomes expensive as well.

In order to optimize that, the following is done:

  • Introduced the cache for the files already known to be batch files, so that we don't need to call isFile and use regex.
  • Introduce throttling for the cleanup (only for the getWritableFile path, because a) it can be called much more often than getReadableFile, for every event; b) cleanup actions are a bit different between getReadableFile and getWritableFile, so having the same timestamp is not quite correct).

Test scenario: 300 batch files on the disk, disabled network, writing 1 log entry every 100 ms.

CPU usage after the change (same scenario and duration is used):

image

It gives a very good bump if the total number files on the disk is below the cache max size. If cache is not enough, the problem comes back (but maybe in a bit better state, because we have cleanup throttling)

CPU usage before the change:

image

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • Make sure you discussed the feature or bugfix with the maintaining team in an Issue
  • Make sure each commit and the PR mention the Issue number (cf the CONTRIBUTING doc)

…FileOrchestrator in order to reduce the number of syscalls
@0xnm 0xnm force-pushed the nogorodnikov/rumm-3338/optimize-batch-file-orchestrator branch from b203a62 to 9538a69 Compare June 28, 2023 15:44
Copy link
Member

@mariusc83 mariusc83 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job ;)

@codecov-commenter
Copy link

codecov-commenter commented Jun 29, 2023

Codecov Report

Merging #1506 (9538a69) into develop (a982b35) will increase coverage by 0.07%.
The diff coverage is 90.00%.

@@             Coverage Diff             @@
##           develop    #1506      +/-   ##
===========================================
+ Coverage    82.00%   82.07%   +0.07%     
===========================================
  Files          363      363              
  Lines        12981    12995      +14     
  Branches      2162     2165       +3     
===========================================
+ Hits         10645    10665      +20     
+ Misses        1679     1676       -3     
+ Partials       657      654       -3     
Impacted Files Coverage Δ
...al/persistence/file/batch/BatchFileOrchestrator.kt 95.00% <88.89%> (+1.30%) ⬆️
...internal/persistence/file/FilePersistenceConfig.kt 88.89% <100.00%> (+1.39%) ⬆️

... and 9 files with indirect coverage changes

@0xnm 0xnm marked this pull request as ready for review June 29, 2023 07:48
@0xnm 0xnm requested a review from a team as a code owner June 29, 2023 07:48
@@ -67,6 +77,7 @@ internal class BatchFileOrchestrator(
}

deleteObsoleteFiles()
lastCleanupTimestamp = System.currentTimeMillis()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe the update of lastCleanupTimestamp can be part of the deleteObsoleteFiles() method?

@0xnm 0xnm merged commit ee9fed5 into develop Jun 30, 2023
@0xnm 0xnm deleted the nogorodnikov/rumm-3338/optimize-batch-file-orchestrator branch June 30, 2023 09:04
@xgouchet xgouchet added this to the 2.0.0 milestone Dec 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants