kretprobe on tcp_v4_connect has non-zero nmissed field #24
Comments
How did you test this? I don't see it "every time I start tcptracer-bpf". I do see miss-hits, but they seem to happen only once after kernel boot (as far as I see
^ shows 24 misses (seemingly reliable).
It does get reset. The high number of hits then stems from the offset guess phase.
I printed how nmissed increases during the initialization process: It seems to increase whenever I was fearing it could cause weaveworks/scope#2379, but it seems unrelated since weaveworks/scope#2379 fails before the "network namespace" step.
@schu added some
In this example, For each kprobe, there should be a kretprobe. For At the same time, I got logs from
tcptracer-bpf's guessing code does 178 connections before it fails. I notice that there are 4 successive kprobes with only 1 kretprobe. This is the problem. We reproduced it only on the GCE instance with 4.4.0-66-generic and only one "possible" cpu (cpu#0):
In this case, we should only have
At first glance, I don't know whether it is possible to configure a kprobe with a higher maxactive. @iaguis is looking into that.
This explains why I cannot reproduce the issue on my laptop: I have 8 "possible" cpus on my laptop.
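The number of "possible" cpus can be read from sysfs. A minimal sketch, assuming the standard `/sys/devices/system/cpu/possible` file and its CPU-list format (for example `0` on a single-core VM or `0-7` on an 8-cpu laptop); the helper names are made up for illustration:

```python
from pathlib import Path

# Standard sysfs location; assumed to be readable on the machine under test.
POSSIBLE_CPUS_PATH = "/sys/devices/system/cpu/possible"


def count_cpus(cpu_list: str) -> int:
    """Count CPUs in a kernel CPU-list string such as "0", "0-7" or "0,2-3"."""
    total = 0
    for part in cpu_list.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            total += int(last) - int(first) + 1
        else:
            int(part)  # validate that the entry is a CPU number
            total += 1
    return total


def possible_cpus(path: str = POSSIBLE_CPUS_PATH) -> int:
    """Read and count the "possible" CPUs of the running kernel."""
    return count_cpus(Path(path).read_text())
```

Calling `possible_cpus()` on the affected GCE instance and on a development machine is a quick way to compare the two environments.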
From what I see in the kernel, there's no way to configure maxactive unless you set it explicitly in This is the output of searching for
I only see the place where it's set (see previous comment), an example of kernel module where it's set directly in
We just asked about this in iovisor-dev: https://lists.iovisor.org/pipermail/iovisor-dev/2017-March/000694.html
If the kretprobe for tcp_v4_connect() is configured with a too-low maxactive, some kretprobes might be missed. In this case, we detect it and try again. This is more likely to happen on a single-core VM with a non-preemptive kernel (CONFIG_PREEMPT not set) because maxactive would be 1 in that case. See weaveworks#24. Based on work from @iaguis.
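The "detect it and try again" idea can be sketched generically. This is not the code from the actual workaround; `retry_on_missed_probe`, the `attempt` callback and `max_attempts` are hypothetical names, and the sketch only assumes that a missed kretprobe shows up as an attempt that produces no result:

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_on_missed_probe(attempt: Callable[[], Optional[T]],
                          max_attempts: int = 5) -> T:
    """Call attempt() until it yields a result or attempts run out.

    attempt() is a hypothetical callback that triggers one connection and
    returns None when the matching kretprobe event never arrived.
    """
    for _ in range(max_attempts):
        result = attempt()
        if result is not None:
            return result
    raise RuntimeError(f"kretprobe missed in all {max_attempts} attempts")
```

Bounding the number of attempts keeps a persistently missing kretprobe from turning into an endless loop.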
We found out we were losing kretprobes sometimes because maxactive was set too low[1]. This problem was more apparent in our GCE test environment because the kernel was configured with `CONFIG_PREEMPT` not set and we're running single-core VMs (see weaveworks/tcptracer-bpf#24 for more details). Unfortunately, we can't set the maxactive explicitly from userspace. Alban submitted a kernel patch to allow this[2]. This bumps tcptracer-bpf to include a workaround[3] for this issue in the guess-offsets phase. [1]: weaveworks/tcptracer-bpf#24 [2]: https://lkml.org/lkml/2017/3/28/629 [3]: weaveworks/tcptracer-bpf#33
Add UDP tests as well as fix bug calculating bytes received
When a kretprobe is installed on a kernel function, there is a limit on how many parallel calls it can catch (aka "maxactive"). In the case of an eBPF kretprobe, maxactive is left at the default defined in kernel/kprobes.c:
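A sketch of that default in Python, based on how `register_kretprobe()` behaves in 4.x kernels when maxactive is not set (with `CONFIG_PREEMPT`, the larger of 10 and twice the number of possible cpus; otherwise the number of possible cpus). The function name and parameters are made up for illustration:

```python
def default_maxactive(possible_cpus: int, preempt: bool) -> int:
    """Mirror the kernel's fallback when a kretprobe's maxactive is <= 0."""
    if preempt:
        # CONFIG_PREEMPT kernels: at least 10, or twice the possible CPUs.
        return max(10, 2 * possible_cpus)
    # Non-preemptive kernels: one instance per possible CPU.
    return possible_cpus
```

With one possible cpu and `CONFIG_PREEMPT` not set, this gives a maxactive of 1, so a second concurrent call to the probed function cannot be tracked.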
We can check if some kretprobes are ever missed:
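One way to automate this check is to parse `kprobe_profile`, which lists one event per line as three columns: event name, number of hits, number of miss-hits. A minimal sketch, assuming debugfs is mounted at `/sys/kernel/debug`; the function names and the sample event names are invented for illustration:

```python
from typing import Dict, Tuple

# Usual debugfs location; assumes debugfs is mounted and readable (root).
KPROBE_PROFILE_PATH = "/sys/kernel/debug/tracing/kprobe_profile"


def parse_kprobe_profile(text: str) -> Dict[str, Tuple[int, int]]:
    """Map each event name to (nhit, nmissed)."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        name, nhit, nmissed = fields
        stats[name] = (int(nhit), int(nmissed))
    return stats


def missed_events(text: str) -> Dict[str, int]:
    """Return only the events whose nmissed counter is non-zero."""
    return {name: missed
            for name, (_, missed) in parse_kprobe_profile(text).items()
            if missed > 0}


# Made-up sample in the three-column format; not real output.
SAMPLE = """\
  p_tcp_v4_connect_example    100    0
  r_tcp_v4_connect_example     72   28
"""
```

On a live system the input would come from reading `KPROBE_PROFILE_PATH` instead of `SAMPLE`.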
Every time I start tcptracer-bpf, it misses 28 kretprobes on tcp_v4_connect. I have not noticed any missing "connect" events but this is surprising. I don't know if it is a real bug or if `kprobe_profile` reports a false positive for some reason.