Improve metrics for map and cache sizes #2291

Merged: 8 commits merged into cilium:main from pr/lambdanis/map-metrics on Apr 11, 2024


@lambdanis lambdanis commented Apr 3, 2024

See commits for details.

Fixes #1774

Metrics for map and cache sizes are improved:
* The tetragon_map_in_use_gauge metric is renamed to tetragon_map_entries and no longer has the total label
* New tetragon_map_capacity metric exposes the capacity of BPF maps
* New tetragon_event_cache_entries metric measures the event cache size
* New tetragon_process_cache_size metric measures the process cache size
* New tetragon_process_cache_capacity metric exposes the process cache capacity

Use a more intuitive name.

Signed-off-by: Anna Kapuscinska <anna@isovalent.com>
tetragon_map_capacity is a constant gauge exposing the capacity of BPF maps.
Combined with tetragon_map_entries, it can be used to monitor map pressure.

Signed-off-by: Anna Kapuscinska <anna@isovalent.com>
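
As a rough illustration of how the two gauges combine into a map pressure figure, here is a minimal Go sketch that reads Prometheus text exposition output and divides entries by capacity per map. It assumes both metrics carry a single label named map; that label name, the map names, and the sample values are hypothetical and not taken from this PR. The parser handles only this simple shape and is not a general exposition-format parser.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseSeries collects "label value -> sample value" for one metric name.
// It only understands lines of the form  name{label="x"} 123  and skips
// everything else (comments, blank lines, other metrics).
func parseSeries(text, metric, label string) map[string]float64 {
	out := make(map[string]float64)
	prefix := metric + "{" + label + `="`
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if !strings.HasPrefix(line, prefix) {
			continue
		}
		rest := line[len(prefix):]
		end := strings.Index(rest, `"} `)
		if end < 0 {
			continue
		}
		v, err := strconv.ParseFloat(strings.TrimSpace(rest[end+3:]), 64)
		if err != nil {
			continue
		}
		out[rest[:end]] = v
	}
	return out
}

// mapPressure returns entries/capacity for every map that reports both
// metrics. Maps with a missing or zero capacity are skipped.
func mapPressure(text string) map[string]float64 {
	entries := parseSeries(text, "tetragon_map_entries", "map")
	capacity := parseSeries(text, "tetragon_map_capacity", "map")
	out := make(map[string]float64)
	for name, n := range entries {
		if c, ok := capacity[name]; ok && c > 0 {
			out[name] = n / c
		}
	}
	return out
}

// sample is hypothetical scrape output; the label name, map names, and
// values are invented for illustration.
const sample = `# TYPE tetragon_map_entries gauge
tetragon_map_entries{map="map_a"} 512
tetragon_map_entries{map="map_b"} 0
# TYPE tetragon_map_capacity gauge
tetragon_map_capacity{map="map_a"} 2048
tetragon_map_capacity{map="map_b"} 1024
`

func main() {
	for name, p := range mapPressure(sample) {
		fmt.Printf("%s: %.1f%% full\n", name, p*100)
	}
}
```

In a real deployment the same ratio would more likely be computed in a Prometheus query or alert rule; the sketch only shows the arithmetic that having a separate capacity gauge makes possible.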
The map capacity is now exposed by the tetragon_map_capacity metric. A separate
metric is much easier to use in queries than a label on the tetragon_map_entries
metric.

Signed-off-by: Anna Kapuscinska <anna@isovalent.com>
@lambdanis added the area/metrics (Related to prometheus metrics) and release-note/minor (This PR introduces a minor user-visible change) labels on Apr 3, 2024
@lambdanis lambdanis requested review from mtardy and a team as code owners April 3, 2024 22:08

netlify bot commented Apr 3, 2024

Deploy Preview for tetragon ready!

Name Link
🔨 Latest commit 9cc7fd1
🔍 Latest deploy log https://app.netlify.com/sites/tetragon/deploys/660dd3673778400008df7965
😎 Deploy Preview https://deploy-preview-2291--tetragon.netlify.app

@lambdanis changed the title from "Improve metric for map and cache sizes" to "Improve metrics for map and cache sizes" on Apr 3, 2024
Signed-off-by: Anna Kapuscinska <anna@isovalent.com>
There used to be a wrapper collector defined in the mapmetrics package. It was
needed because there were a few packages (mis)using metrics defined there. Now
it's cleaned up and only observer uses map metrics, so the wrapper collector is
no longer needed.

Signed-off-by: Anna Kapuscinska <anna@isovalent.com>
@lambdanis lambdanis force-pushed the pr/lambdanis/map-metrics branch from 9cc7fd1 to 0b99985 Compare April 3, 2024 22:15
tetragon_event_cache_entries is a gauge exposing the event cache size.

Signed-off-by: Anna Kapuscinska <anna@isovalent.com>
There is a metric (tetragon_process_cache_size) defined in the process package,
but it wasn't registered in the Prometheus registry, so it wasn't exposed. Fix
this and register the metric.

Signed-off-by: Anna Kapuscinska <anna@isovalent.com>
tetragon_process_cache_capacity exposes the capacity of the process cache.
Combined with tetragon_process_cache_size, it can be used to monitor process
cache utilization.

Signed-off-by: Anna Kapuscinska <anna@isovalent.com>
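
To make the utilization idea concrete, the following Go sketch derives utilization, remaining headroom, and a near-full flag from one reading of each gauge. The function name, the struct, the 0.9 threshold, and the sample readings are all hypothetical; nothing here is Tetragon code.

```go
package main

import "fmt"

// cacheStatus summarizes process cache usage from two gauge readings.
type cacheStatus struct {
	Utilization float64 // size / capacity
	Headroom    float64 // entries left before the cache is full
	NearFull    bool    // true when Utilization >= threshold
}

// processCacheStatus combines a tetragon_process_cache_size reading with a
// tetragon_process_cache_capacity reading. A non-positive capacity is
// rejected so the division is always well defined.
func processCacheStatus(size, capacity, threshold float64) (cacheStatus, error) {
	if capacity <= 0 {
		return cacheStatus{}, fmt.Errorf("invalid capacity %v", capacity)
	}
	u := size / capacity
	return cacheStatus{
		Utilization: u,
		Headroom:    capacity - size,
		NearFull:    u >= threshold,
	}, nil
}

func main() {
	// Hypothetical readings: 49152 entries in a cache sized for 65536.
	st, err := processCacheStatus(49152, 65536, 0.9)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("utilization=%.2f headroom=%.0f nearFull=%v\n",
		st.Utilization, st.Headroom, st.NearFull)
}
```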
@lambdanis lambdanis force-pushed the pr/lambdanis/map-metrics branch from 0b99985 to e02d845 Compare April 3, 2024 22:21

@mtardy mtardy left a comment

Thanks, that is a nice package of fixes!

@lambdanis lambdanis merged commit 2762272 into cilium:main Apr 11, 2024
36 checks passed
Development

Successfully merging this pull request may close these issues.

Fix monitoring BPF maps
2 participants