Add doc for node-agent memory preserve #8167

Lyndon-Li
Contributor

Partially fixes issue #8138 by adding documentation for node-agent memory preservation.

blackpiglet
blackpiglet previously approved these changes Aug 30, 2024

codecov bot commented Aug 30, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 59.10%. Comparing base (3408ffe) to head (43de32a).
Report is 21 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8167      +/-   ##
==========================================
+ Coverage   59.05%   59.10%   +0.04%     
==========================================
  Files         364      365       +1     
  Lines       30324    30336      +12     
==========================================
+ Hits        17909    17931      +22     
+ Misses      10972    10962      -10     
  Partials     1443     1443              

@@ -641,6 +641,16 @@ Both the uploader and repository consume remarkable CPU/memory during the backup
Velero uses [BestEffort as the QoS][14] for node-agent pods (so no CPU/memory request/limit is set), so that backups/restores won't fail due to resource throttling in any case.
If you want to constrain the CPU/memory usage, you need to [customize the resource limits][15]. The CPU/memory consumption is always related to the scale of data to be backed up/restored, refer to [Performance Guidance][16] for more details, so it is highly recommended that you perform your own testing to find the best resource limits for your data.

For the Kopia path, the node-agent preserves some memory to avoid frequent memory allocations. Therefore, after you run a file-system backup/restore, you won't see the node-agent release all of the memory. The preserved memory is capped, so memory usage won't grow indefinitely. The limit varies with the number of CPU cores in the cluster nodes, as calculated below:
Contributor

Can we clarify how much, if any, is released? Should there be a timeout for preserved memory? If you only back up once every 6 months, you may prefer to spend the time reallocating memory at the next backup.

Contributor Author

@Lyndon-Li Lyndon-Li Sep 2, 2024

Added the clarification that there is no timeout for the preserved memory, so you won't see the node-agent release all of the memory until it restarts.

Contributor Author

@Lyndon-Li Lyndon-Li Sep 2, 2024

Can we clarify how much, if any, is released?

The released memory is actually unknown, because released memory = total allocated memory - preserved memory. As for the total allocated memory, we've already clarified it as below:
The CPU/memory consumption is always related to the scale of data to be backed up/restored, refer to [Performance Guidance][16] for more details, so it is highly recommended that you perform your own testing to find the best resource limits for your data.
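
The relationship above can be written down as a small Go helper. This is only an illustrative sketch: `releasedBytes` and its parameters are hypothetical names, and the inputs would have to come from your own measurements rather than from anything Velero exposes.

```go
package main

// releasedBytes estimates how much memory is given back after a backup or
// restore: the total that was allocated minus what stays preserved.
// Both inputs are hypothetical measurements, not values taken from Velero.
func releasedBytes(totalAllocated, preserved uint64) uint64 {
	if preserved >= totalAllocated {
		// Nothing beyond the preserved amount was allocated, so nothing is released.
		return 0
	}
	return totalAllocated - preserved
}
```

For example, with 6 GiB allocated in total and 2 GiB preserved, `releasedBytes(6<<30, 2<<30)` yields 4 GiB. Because the total depends on the scale of the data, the released amount cannot be stated in advance.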

Contributor Author

Should there be a timeout for preserved memory?

Yes, it would be more sensible to have a smarter mechanism instead of preserving the memory forever. But whether a timeout is the right approach needs further consideration.
At present, we just document the behavior and leave it as is. We don't think it is a high-priority task, for these reasons:

  1. From 1.15 on, it affects fs-backup only, because data movers will not run in the long-running node-agent pods.
  2. The preserved memory doesn't reach the limit easily; normally it stays below the limit.
  3. Backups are usually scheduled tasks, e.g., one or several per day, so the preserved memory is normally put to use.

blackpiglet
blackpiglet previously approved these changes Sep 2, 2024
Signed-off-by: Lyndon-Li <lyonghui@vmware.com>
@shubham-pampattiwar shubham-pampattiwar merged commit a19cf56 into vmware-tanzu:main Sep 9, 2024
45 checks passed
4 participants