Expose BPF map kernel memory use by tracing policy #2984
Conversation
Force-pushed 30c69ba to 07ae544
Allow specifying a path other than the base path, for example to only look at a specific policy, sensor, or program, thanks to the hierarchical organization of the Tetragon BPF fs. Signed-off-by: Mahe Tardy <mahe.tardy@gmail.com>
Force-pushed 07ae544 to 46ef8f4
Also add _bytes suffix to memlock field for precision. This is mostly for consistency and ease of use. Signed-off-by: Mahe Tardy <mahe.tardy@gmail.com>
Introduce a new field that will be used in the next commit to inform the user how much kernel memory is roughly used by the policy. Signed-off-by: Mahe Tardy <mahe.tardy@gmail.com>
Signed-off-by: Mahe Tardy <mahe.tardy@gmail.com>
Force-pushed 4a92d64 to 582b2f8
Force-pushed 2ba13bb to e84d69a
Thanks, this looks great!
Please find some comments below.
For improved readability, adapt the output to the nearest sensible unit; use the JSON output in both commands for accurate results. The output now looks like:

[...] NAMESPACE SENSORS        KERNELMEMORY [...]
[...] (global)  generic_kprobe 4.46 MB      [...]
[...] (global)  generic_kprobe 5.20 MB      [...]
[...] (global)  generic_kprobe 3.54 MB      [...]
[...] (global)  generic_kprobe 5.87 MB      [...]

Signed-off-by: Mahe Tardy <mahe.tardy@gmail.com>
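The unit adaptation above can be sketched with a small helper that divides the byte count down to the nearest binary unit. This is an illustrative sketch, not the exact helper used by tetra; the function name is hypothetical.

```go
package main

import "fmt"

// humanizeBytes renders a byte count using the nearest sensible unit,
// producing output similar to "4.46 MB" shown above. Sketch only; the
// real formatting code in the PR may differ.
func humanizeBytes(b uint64) string {
	const unit = 1024
	if b < unit {
		return fmt.Sprintf("%d B", b)
	}
	div, exp := uint64(unit), 0
	for n := b / unit; n >= unit; n /= unit {
		div *= unit
		exp++
	}
	return fmt.Sprintf("%.2f %cB", float64(b)/float64(div), "KMGTPE"[exp])
}

func main() {
	fmt.Println(humanizeBytes(4677034)) // → 4.46 MB
	fmt.Println(humanizeBytes(512))     // → 512 B
}
```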
These operations might eventually be taken care of by cilium/ebpf directly and we can remove this, see 21be59580a51 ("map: add map_extra, memlock, frozen to MapInfo"). I moved this to the bpf package as it made little sense to import pkg/bugtool to just perform those operations on maps. Signed-off-by: Mahe Tardy <mahe.tardy@gmail.com>
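Before cilium/ebpf exposed memlock through MapInfo, the usual way to read it was to parse the `memlock:` field out of `/proc/self/fdinfo/<fd>` for the map's file descriptor. A minimal sketch of that parsing, assuming the standard fdinfo text layout (the function name is hypothetical):

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseMemlock extracts the "memlock:" value from the textual contents
// of /proc/self/fdinfo/<fd> for a BPF map fd. Sketch only.
func parseMemlock(fdinfo string) (uint64, error) {
	s := bufio.NewScanner(strings.NewReader(fdinfo))
	for s.Scan() {
		if v, ok := strings.CutPrefix(s.Text(), "memlock:"); ok {
			return strconv.ParseUint(strings.TrimSpace(v), 10, 64)
		}
	}
	return 0, fmt.Errorf("memlock field not found")
}

func main() {
	// Example fdinfo contents for a BPF map (abridged, illustrative).
	sample := "pos:\t0\nflags:\t02000002\nmap_type:\t1\nmemlock:\t4096\nmap_id:\t13\n"
	n, err := parseMemlock(sample)
	fmt.Println(n, err) // → 4096 <nil>
}
```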
We keep the set of maps that are loaded for a program so that we can do accounting on the memory use of those maps. Not doing the accounting immediately allows for more flexibility in computing which part of the memory is actually shared between multiple programs/sensors. Note that we do this at the lower level of the loader because by reading Program.Maps we might miss some of them (heap maps, for example). Indeed, the maps listed via loadMaps and the like are only the ones that we might pin, either to share them amongst programs or to use them from userspace. Signed-off-by: Mahe Tardy <mahe.tardy@gmail.com>
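The bookkeeping described above can be sketched as the loader recording every map it creates for a program, so accounting can run later over the full set. All type and field names here are invented for illustration; the actual structures in the PR differ.

```go
package main

import "fmt"

// progLoadState records, at load time, which maps ended up attached to
// a program, so that memory accounting can be done afterwards.
// Hypothetical sketch; names are not from the Tetragon codebase.
type progLoadState struct {
	Name       string
	LoadedMaps []string
}

// recordMap is called by the loader for every map it creates, including
// internal ones (e.g. heap maps) that listing Program.Maps would miss.
func (p *progLoadState) recordMap(name string) {
	p.LoadedMaps = append(p.LoadedMaps, name)
}

func main() {
	p := progLoadState{Name: "generic_kprobe"}
	p.recordMap("process_call_heap")
	p.recordMap("execve_map")
	fmt.Println(len(p.LoadedMaps)) // → 2
}
```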
Retrieve memlock from each map loaded by a sensor. Signed-off-by: Mahe Tardy <mahe.tardy@gmail.com>
Signed-off-by: Mahe Tardy <mahe.tardy@gmail.com>
Force-pushed 0cbc81a to 08f6962
Avoid double counting shared maps, like the execve_map for example. I decided to keep a record of global maps for accounting. At first I implemented this by reading the global BPF fs directory, but that might use too much CPU; since we load the global maps ourselves, we already know which ones are current, so instead I hooked into the loading of global maps. Signed-off-by: Mahe Tardy <mahe.tardy@gmail.com>
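The dedup logic can be sketched by keying on the kernel map ID, so a map shared by several programs contributes its memlock only once to the total. Names here are hypothetical illustrations, not the PR's actual types.

```go
package main

import "fmt"

// mapStat is a minimal stand-in for per-map info: the kernel map ID and
// its memlock byte count. Hypothetical sketch.
type mapStat struct {
	ID      uint32
	Memlock uint64
}

// sumMemlock totals map memory while counting each map ID once, so a
// shared map (e.g. one pinned and reused across programs) is not
// double counted.
func sumMemlock(maps []mapStat) uint64 {
	seen := make(map[uint32]struct{})
	var total uint64
	for _, m := range maps {
		if _, ok := seen[m.ID]; ok {
			continue // already accounted: shared map
		}
		seen[m.ID] = struct{}{}
		total += m.Memlock
	}
	return total
}

func main() {
	// Map ID 1 appears twice (shared between two programs) but is
	// counted once.
	maps := []mapStat{{ID: 1, Memlock: 4096}, {ID: 2, Memlock: 8192}, {ID: 1, Memlock: 4096}}
	fmt.Println(sumMemlock(maps)) // → 12288
}
```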
This will now be used not only for the state but also for the kernel memory, so I'm making the name more generic. Signed-off-by: Mahe Tardy <mahe.tardy@gmail.com>
Similarly to TracingPolicy state, you can retrieve this information through the gRPC API by listing the policies or using the metrics interface. Signed-off-by: Mahe Tardy <mahe.tardy@gmail.com>
Fix an unexpected panic along the way. Signed-off-by: Mahe Tardy <mahe.tardy@gmail.com>
So that this code is triggered via tests and doesn't fail silently if cilium/ebpf changes its behavior, as suggested in cilium/ebpf#1566. Signed-off-by: Mahe Tardy <mahe.tardy@gmail.com>
Force-pushed 08f6962 to cbb1b35
@@ -63,6 +70,7 @@ func collect(ch chan<- prometheus.Metric) {
 	for _, policy := range list.Policies {
 		state := policy.State
 		counters[state]++
+		ch <- policyKernelMemory.MustMetric(float64(policy.KernelMemoryBytes), policy.Name)
Does it make sense to report this metric for policies that are disabled or in error state?
mmh I haven't tried, hopefully it should show up as zero if it's disabled since progs are unloaded and similarly for error 🤔
Oh I tried and the memory state is actually not refreshed properly on disabled, it needs a fix, thanks for the remark.
This allows users to understand how much kernel memory is consumed by the BPF maps loaded by the programs of their TracingPolicy. This information can be retrieved from the gRPC API (using tetra, for example) or from the metrics server:
- gRPC API with tetra
- Metrics server using curl

Note: this PR looks a little bit like #2090.