eBPF profiling
- [atr] code examples: https://github.com/animeshtrivedi/ebpf-example
- https://github.com/iovisor/bcc/tree/master/examples
- https://www.brendangregg.com/ebpf.html
- Perf with eBPF, https://www.brendangregg.com/perf.html#eBPF
- BCC events reference guide: https://github.com/iovisor/bcc/blob/master/docs/reference_guide.md
- Python tools example: https://github.com/iovisor/bcc/tree/master/examples/tracing
- [ntehrany] setup bpftrace, https://github.com/nicktehrany/notes/wiki/bpftrace
Ubuntu: sudo apt-get install bpfcc-tools
(to get all the *-bpfcc tools, so the tools from https://github.com/iovisor/bcc/tree/master/examples/tracing become XXX-bpfcc in home).
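If the build stops with: fatal error: readline/readline.h
install the readline headers:
sudo apt-get install libreadline-dev
# Then the build gets further, but fails at runqslower: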
DESCEND runqslower
Couldn't find kernel BTF; set VMLINUX_BTF to specify its location.
make[1]: *** [Makefile:77: /home/animesh.trivedi/src/linux/tools/bpf/runqslower/.output//vmlinux.h] Error 1
make: *** [Makefile:122: runqslower] Error 2
Remove the pre-installed packages: https://github.com/iovisor/bcc/issues/3993#issuecomment-1228217609
apt purge bpfcc-tools libbpfcc python3-bpfcc
wget https://github.com/iovisor/bcc/releases/download/v0.25.0/bcc-src-with-submodule.tar.gz
tar xf bcc-src-with-submodule.tar.gz
cd bcc/
apt install -y python-is-python3
apt install -y bison build-essential cmake flex git libedit-dev libllvm11 llvm-11-dev libclang-11-dev zlib1g-dev libelf-dev libfl-dev python3-distutils
apt install -y checkinstall
# From here you can follow the instructions linked below
mkdir build
cd build/
cmake -DCMAKE_INSTALL_PREFIX=/usr -DPYTHON_CMD=python3 ..
make
checkinstall
https://github.com/iovisor/bcc/blob/master/INSTALL.md#ubuntu---source
On Ubuntu 24 (make sure to use LLVM 18 for all the LLVM packages):
sudo apt install -y zip bison build-essential cmake flex git libedit-dev \
libllvm18 llvm-18-dev libclang-18-dev python3 zlib1g-dev libelf-dev libfl-dev python3-setuptools \
liblzma-dev libdebuginfod-dev arping netperf iperf libpolly-18-dev python-is-python3
Then clone and install
git clone https://github.com/iovisor/bcc.git
mkdir bcc/build; cd bcc/build
cmake ..
make
sudo make install
https://blogs.oracle.com/linux/post/taming-tracepoints-in-the-linux-kernel
# show available events
sudo cat /sys/kernel/debug/tracing/available_events
atr@cordova:~$ sudo ls -l /sys/kernel/debug/tracing/events/ | wc -l
151
atr@cordova:~$ sudo cat /sys/kernel/debug/tracing/available_events | wc -l
2618
# The two counts differ because events/ has one directory per subsystem (group), and each
# subsystem directory holds one directory per event, which contains the format file.
# Showing the format of a single event:
sudo cat /sys/kernel/debug/tracing/events/xhci-hcd/xhci_setup_device/format
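To see why the directory count and the event count differ, the events can be grouped by subsystem. This is a minimal sketch, assuming each line of available_events has the form subsystem:event. The function name group_events and the sample lines are made up for illustration. Note that ls -l also counts its "total" line and a few control files in events/, so the numbers will not match exactly.

```python
from collections import Counter


def group_events(available_events_text):
    """Count events per subsystem from 'subsystem:event' lines."""
    counts = Counter()
    for line in available_events_text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue  # skip blank or unexpected lines
        subsystem, _event = line.split(":", 1)
        counts[subsystem] += 1
    return counts


# Illustrative sample only, not a real dump of available_events.
SAMPLE = """\
xhci-hcd:xhci_setup_device
xhci-hcd:xhci_address_ctx
sched:sched_switch
sched:sched_wakeup
syscalls:sys_enter_openat
"""

if __name__ == "__main__":
    counts = group_events(SAMPLE)
    print(len(counts), "subsystems,", sum(counts.values()), "events")
    for name, n in counts.most_common():
        print(f"{name:12s} {n}")
```

On a real machine the input would be the content of /sys/kernel/debug/tracing/available_events, read as root.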
https://github.com/anakryiko/bpf-ringbuf-examples/tree/main
I am taking the size as an example: see the bitehist.py file in the bcc GitHub repository: https://github.com/iovisor/bcc/blob/master/examples/tracing/bitehist.py
atr@f20u24:~/src/ebpf-probes-traces$ sudo /usr/share/bcc/tools//funclatency -d 10 memset_probe2
Tracing 1 functions for "memset_probe2"... Hit Ctrl-C to end.
nsecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 0 | |
8 -> 15 : 0 | |
16 -> 31 : 0 | |
32 -> 63 : 0 | |
64 -> 127 : 0 | |
128 -> 255 : 0 | |
256 -> 511 : 849777 |**** |
512 -> 1023 : 7156535 |****************************************|
1024 -> 2047 : 12235 | |
2048 -> 4095 : 69 | |
4096 -> 8191 : 1394 | |
8192 -> 16383 : 801 | |
16384 -> 32767 : 74 | |
32768 -> 65535 : 20 | |
65536 -> 131071 : 12 | |
131072 -> 262143 : 1 | |
262144 -> 524287 : 2 | |
524288 -> 1048575 : 3 | |
avg = 572 nsecs, total: 4590811412 nsecs, count: 8021354
Detaching...
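The histogram above uses power-of-two buckets (0 -> 1, 2 -> 3, 4 -> 7, ...). This is a small userspace sketch of that bucketing, assuming non-negative integer samples. BCC does the bucketing inside the eBPF program, so this only illustrates the idea and is not the tool's code. The names log2_bucket and log2_histogram are made up for this note.

```python
def log2_bucket(value):
    """Return (low, high) of the power-of-two bucket that holds value."""
    if value < 2:
        return (0, 1)
    low = 1 << (value.bit_length() - 1)  # largest power of two <= value
    return (low, 2 * low - 1)


def log2_histogram(samples):
    """Count samples per power-of-two bucket, sorted by bucket."""
    hist = {}
    for v in samples:
        bucket = log2_bucket(v)
        hist[bucket] = hist.get(bucket, 0) + 1
    return dict(sorted(hist.items()))


if __name__ == "__main__":
    # The reported average is simply total / count, rounded down.
    total_ns, count = 4590811412, 8021354
    avg = total_ns // count
    print("avg =", avg, "nsecs, bucket:", log2_bucket(avg))
```

With the numbers from the run above, the average lands in the 512 -> 1023 bucket, which is also the bucket with the highest count.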
list the kernel functions that can be probed:
less /proc/kallsyms
Some function names carry a .constprop suffix or a __pfx prefix. What do these symbols mean?
https://people.redhat.com/~jolawren/klp-compiler-notes/livepatch/compiler-considerations.html
What to do about them? https://github.com/iovisor/bcc/issues/4261
If that does not help, changing static inline void to void should resolve this.
On my own out-of-tree (OOT) nullblk module, this did work.
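To separate plain function names from the compiler-generated variants, the kallsyms output can be filtered. This is a minimal sketch, assuming the usual "address type name [module]" line layout and treating any text symbol with a dot in its name or a __pfx_ prefix as a variant. The function name probe_candidates and the sample symbols are hypothetical. Being listed in kallsyms does not guarantee that a function can actually be probed.

```python
def probe_candidates(kallsyms_text):
    """Split text-section symbols into plain names and compiler-generated variants."""
    plain, variants = [], []
    for line in kallsyms_text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue
        sym_type, name = parts[1], parts[2]
        if sym_type not in ("t", "T"):
            continue  # keep only text (code) symbols
        if name.startswith("__pfx_") or "." in name:
            variants.append(name)  # e.g. foo.constprop.0, __pfx_foo
        else:
            plain.append(name)
    return plain, variants


# Illustrative sample only; addresses read as zero for non-root users.
SAMPLE = """\
0000000000000000 T vfs_read
0000000000000000 T __pfx_vfs_read
0000000000000000 t do_work.constprop.0
0000000000000000 D jiffies
0000000000000000 t null_handle_cmd\t[null_blk]
"""

if __name__ == "__main__":
    plain, variants = probe_candidates(SAMPLE)
    print("plain:", plain)
    print("variants:", variants)
```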
dump CPU profiles with fio
sudo profile-bpfcc -p `pidof -d, fio` -F 99 10 &> fast_stack
Histogram:
atr@u24clean:~/tmp$ sudo cpudist-bpfcc -O -p 6271 10 1 2>/dev/null
Tracing off-CPU time... Hit Ctrl-C to end.
usecs : count distribution
0 -> 1 : 100 | |
2 -> 3 : 112 | |
4 -> 7 : 20752 | |
8 -> 15 : 1342784 |****************************************|
16 -> 31 : 12664 | |
32 -> 63 : 454 | |
64 -> 127 : 143 | |
128 -> 255 : 83 | |
256 -> 511 : 3 | |
512 -> 1023 : 1 | |
atr@u24clean:~/tmp$ sudo cpudist-bpfcc -O -p 6290 10 1 2>/dev/null
Tracing off-CPU time... Hit Ctrl-C to end.
usecs : count distribution
0 -> 1 : 1298518 |**************** |
2 -> 3 : 3098098 |****************************************|
4 -> 7 : 34802 | |
8 -> 15 : 7021 | |
16 -> 31 : 564 | |
32 -> 63 : 36 | |
64 -> 127 : 8 | |
128 -> 255 : 6 | |
256 -> 511 : 11 | |
512 -> 1023 : 1 | |
Here is an example with the fio process. The -d, option uses ',' as the delimiter of the pidof output.
sudo profile-bpfcc -p `pidof -d, fio` -F 99 10 &> fast_stack
/usr/src/linux-6.9.0-atr-2024-07-05/tools/workqueue$ ./wq_dump.py
It seems that when perf is compiled from source it does not include eBPF tracepoint events (on node2, 5.17.59). Also, sudo gives a different list than a normal user.
zebin@node2:~$ sudo perf list sched:*
List of pre-defined events (to be used in -e):
sched:sched_kthread_stop [Tracepoint event]
sched:sched_kthread_stop_ret [Tracepoint event]
sched:sched_kthread_work_execute_end [Tracepoint event]
sched:sched_kthread_work_execute_start [Tracepoint event]
sched:sched_kthread_work_queue_work [Tracepoint event]
sched:sched_migrate_task [Tracepoint event]
sched:sched_move_numa [Tracepoint event]
sched:sched_pi_setprio [Tracepoint event]
sched:sched_process_exec [Tracepoint event]
sched:sched_process_exit [Tracepoint event]
sched:sched_process_fork [Tracepoint event]
sched:sched_process_free [Tracepoint event]
sched:sched_process_hang [Tracepoint event]
sched:sched_process_wait [Tracepoint event]
sched:sched_stat_blocked [Tracepoint event]
sched:sched_stat_iowait [Tracepoint event]
sched:sched_stat_runtime [Tracepoint event]
sched:sched_stat_sleep [Tracepoint event]
sched:sched_stat_wait [Tracepoint event]
sched:sched_stick_numa [Tracepoint event]
sched:sched_swap_numa [Tracepoint event]
sched:sched_switch [Tracepoint event]
sched:sched_wait_task [Tracepoint event]
sched:sched_wake_idle_without_ipi [Tracepoint event]
sched:sched_wakeup [Tracepoint event]
sched:sched_wakeup_new [Tracepoint event]
sched:sched_waking [Tracepoint event]
zebin@node2:~$ sudo perf list syscalls:*
List of pre-defined events (to be used in -e):
syscalls:sys_enter_accept [Tracepoint event]
syscalls:sys_enter_accept4 [Tracepoint event]
syscalls:sys_enter_access [Tracepoint event]
syscalls:sys_enter_acct [Tracepoint event]
syscalls:sys_enter_add_key [Tracepoint event]
syscalls:sys_enter_adjtimex [Tracepoint event]
syscalls:sys_enter_alarm [Tracepoint event]
syscalls:sys_enter_arch_prctl [Tracepoint event]
syscalls:sys_enter_bind [Tracepoint event]
syscalls:sys_enter_bpf [Tracepoint event]
syscalls:sys_enter_brk [Tracepoint event]
syscalls:sys_enter_capget [Tracepoint event]
...
https://www.brendangregg.com/blog/2014-07-03/perf-counting.html
zebin@node2:~$ sudo perf stat -e 'syscalls:sys_enter_*' -a sleep 5 | awk '{sum+=$1}; END {print sum}'
Performance counter stats for 'system wide':
3 syscalls:sys_enter_socket
0 syscalls:sys_enter_socketpair
0 syscalls:sys_enter_bind
0 syscalls:sys_enter_listen
1 syscalls:sys_enter_accept4
0 syscalls:sys_enter_accept
3 syscalls:sys_enter_connect
0 syscalls:sys_enter_getsockname
0 syscalls:sys_enter_getpeername
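It generates the output but does not summarize it: perf stat prints its counters to stderr, so nothing reaches awk through the pipe. Adding 2>&1 before the pipe lets awk see the counts.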
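The summing can also be done in a small script on captured perf stat output. This is a sketch, assuming each counter line looks like "count event:name" as in the output above, and that counts may contain thousands separators, which a plain awk sum would not handle. The function name sum_perf_stat is made up for this note.

```python
import re

# A counter line: optional spaces, a count (maybe with commas), then group:event.
COUNTER_LINE = re.compile(r"\s*([\d,]+)\s+(\S+:\S+)")


def sum_perf_stat(output):
    """Sum the counts of all 'count group:event' lines in perf stat output."""
    total = 0
    for line in output.splitlines():
        m = COUNTER_LINE.match(line)
        if m:
            total += int(m.group(1).replace(",", ""))
    return total


# The lines shown above, as captured from stderr.
SAMPLE = """\
Performance counter stats for 'system wide':
3 syscalls:sys_enter_socket
0 syscalls:sys_enter_socketpair
0 syscalls:sys_enter_bind
0 syscalls:sys_enter_listen
1 syscalls:sys_enter_accept4
0 syscalls:sys_enter_accept
3 syscalls:sys_enter_connect
0 syscalls:sys_enter_getsockname
0 syscalls:sys_enter_getpeername
"""

if __name__ == "__main__":
    print(sum_perf_stat(SAMPLE))
```

When running perf from Python, the text to parse is in the stderr of the process, not in stdout.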
https://kubernetes.io/blog/2017/12/using-ebpf-in-kubernetes/
https://lwn.net/Articles/740157/
While reading up on bpftrace:
- Reference guide: https://github.com/iovisor/bpftrace/blob/master/docs/reference_guide.md
- Manual: https://github.com/iovisor/bpftrace/blob/master/man/adoc/bpftrace.adoc
- Make sure the kernel headers are installed. The 5.12 kernel I compiled is missing its headers.
atr@node1:~$ sudo bpftrace --version
bpftrace v0.9.4
atr@node1:~$ which bpftrace
/usr/bin/bpftrace
atr@node1:~$
Example small run:
sudo bpftrace -e 'tracepoint:syscalls:sys_enter_nanosleep { printf("%s is sleeping.\n", comm); }'
The -e flag gives the program to execute. Programs follow the same awk-style structure: probe /filter/ { action }.
bpftrace -l '*sleep*'
atr@node1:~$ sudo bpftrace -lv tracepoint:syscalls:sys_enter_nanosleep
tracepoint:syscalls:sys_enter_nanosleep
int __syscall_nr;
struct __kernel_timespec * rqtp;
struct __kernel_timespec * rmtp;
atr@node1:~$
Question: where does comm come from? It is one of the builtins, see: https://github.com/iovisor/bpftrace/blob/master/man/adoc/bpftrace.adoc#builtins
Tracepoints have a clear signature and are well maintained; kprobes are not. For a kprobe you need to look up the kernel function's signature and use that.
bpftrace --include ./header.h
bpftrace -I ./folder/
Filter file reads on size, e.g. only reads with an "X"-byte buffer (here X = 512):
bpftrace -e 'kprobe:vfs_read /arg2 == 512/ { printf("%s small read: %d byte buffer\n", comm, arg2); }'
vfs_read signature for the v5.12 kernel: https://elixir.bootlin.com/linux/v5.12.19/source/fs/read_write.c#L476
count is the third parameter, so it is arg2 (argument numbering starts from 0); this is what the filter matches on.
Now I want to filter on the process name; use the builtin comm:
bpftrace -e 'kprobe:vfs_read /comm == "my_name"/ { printf("%s small read: %d byte buffer\n", comm, arg2); }'
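The two filters can be combined, and the argN index follows from the parameter order of the probed function. This is a small sketch that builds the bpftrace program text, assuming the v5.12 vfs_read parameter order (file, buf, count, pos). The helper names arg_name and vfs_read_probe are made up for this note, and the generated string is only passed to bpftrace -e by hand.

```python
# Parameter order of vfs_read in v5.12 (assumption, check your kernel version).
VFS_READ_PARAMS = ["file", "buf", "count", "pos"]


def arg_name(params, wanted):
    """Map a parameter name to its kprobe argN name (numbering starts at 0)."""
    return f"arg{params.index(wanted)}"


def vfs_read_probe(comm=None, count=None):
    """Build a kprobe:vfs_read program with optional comm and count filters."""
    conditions = []
    if comm is not None:
        conditions.append(f'comm == "{comm}"')
    if count is not None:
        conditions.append(f"{arg_name(VFS_READ_PARAMS, 'count')} == {count}")
    flt = " /" + " && ".join(conditions) + "/" if conditions else ""
    action = '{ printf("%s read: %d byte buffer\\n", comm, arg2); }'
    return "kprobe:vfs_read" + flt + " " + action


if __name__ == "__main__":
    # Print a program that could be run as: sudo bpftrace -e '<program>'
    print(vfs_read_probe(comm="my_name", count=512))
```

Generating the string keeps the filter and the signature in one place, which helps when the kernel function's parameter order changes between versions.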
For tracepoints, use the args-> construct to access the arguments listed by bpftrace -lv:
root@node1:/home/atr# bpftrace -lv tracepoint:syscalls:sys_enter_openat
tracepoint:syscalls:sys_enter_openat
int __syscall_nr;
int dfd;
const char * filename;
int flags;
umode_t mode;
root@node1:/home/atr#
bpftrace -e 'tracepoint:syscalls:sys_enter_openat { printf("%s %s\n", comm, str(args->filename)); }'
Attaching 1 probe...
snmpd /proc/diskstats
snmpd /proc/stat
snmpd /proc/vmstat
Include the header file
# cat path.bt
#include <linux/path.h>
#include <linux/dcache.h>
kprobe:vfs_open
{
printf("open path: %s\n", str(((struct path *)arg0)->dentry->d_name.name));
}
# bpftrace path.bt
Attaching 1 probe...
open path: dev
open path: if_inet6
open path: retrans_time_ms
[...]
- Kprobe kernel documentation: https://www.kernel.org/doc/Documentation/kprobes.txt
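- What is the difference between bpftrace and bpftool? As far as I understand, bpftrace is a tracing language and front end, while bpftool inspects and manages loaded BPF programs and maps. bpftool is missing on node1, I don't know why.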