Skip to content
This repository has been archived by the owner on Oct 24, 2023. It is now read-only.

kretprobe on tcp_v4_connect has non-zero nmissed field #24

Open
alban opened this issue Feb 23, 2017 · 7 comments
Open

kretprobe on tcp_v4_connect has non-zero nmissed field #24

alban opened this issue Feb 23, 2017 · 7 comments

Comments

@alban
Copy link
Contributor

alban commented Feb 23, 2017

When a kretprobe is installed on a kernel function, there is a maximum limit of how many calls in parallel it can catch (aka "maxactive"). In the case of a eBPF kretprobe, the maxactive is let to the default as defined in kernel/kprobes.c:

        /* Pre-allocate memory for max kretprobe instances */
        if (rp->maxactive <= 0) {
#ifdef CONFIG_PREEMPT
                rp->maxactive = max_t(unsigned int, 10, 2*num_possible_cpus());
#else
                rp->maxactive = num_possible_cpus();
#endif
        }

We can check if some kretprobes are ever missed:

# cat /sys/kernel/debug/tracing/kprobe_profile | grep -vw '0$'
  rtcp_v4_connect                                         1650              28

Every time I start tcptracer-bpf, it misses 28 kretprobes on tcp_v4_connect. I have not noticed any missing "connect" events but this is surprising. I don't know if it is a real bug or if kprobe_profile reports a false positive for some reason.

@schu
Copy link
Contributor

schu commented Feb 24, 2017

Every time I start tcptracer-bpf, it misses 28 kretprobes on tcp_v4_connect.

How did you tests this? I don't see "every time I start tcptracer-bpf". I do see miss-hits, but they seem to happen only once after kernel boot (as far as I see kprobe_profile doesn't get reset)?

diff --git a/semaphore.sh b/semaphore.sh
index db07459..09488bb 100755
--- a/semaphore.sh
+++ b/semaphore.sh
@@ -69,7 +69,8 @@ for kernel_version in "${kernel_versions[@]}"; do
         mount -t tmpfs tmpfs /tmp &&
         mount -t debugfs debugfs /sys/kernel/debug/ &&
         make &&
-        ./run'
+        ./run &&
+        cat /sys/kernel/debug/tracing/kprobe_profile | grep -vw "0$"'
 
     # Determine exit code from pod status due to
     # https://github.com/coreos/rkt/issues/2777

^ shows 24 misses (seemingly reliable).

@schu
Copy link
Contributor

schu commented Feb 24, 2017

as far as I see kprobe_profile doesn't get reset

I does get reset. The high number of hits then stems from the offset guess phase.

@alban
Copy link
Contributor Author

alban commented Mar 23, 2017

I printed how the nmissed increases during the initialization process:

https://gist.githubusercontent.com/alban/3509b952b69f0ed608ccd6a2f44caa65/raw/2348460451bf00ed1245548d4134e019af96a06e/gistfile1.txt

It seems to increase whenever bpf_probe_read() returns EFAULT (when guessing the network namespace offset).

I was fearing it could cause weaveworks/scope#2379 but it seems unrelated since weaveworks/scope#2379 fails before the "network namespace" step.

@alban
Copy link
Contributor Author

alban commented Mar 23, 2017

@schu added some bpf_trace_printk() in kprobe__tcp_v4_connect() and kretprobe__tcp_v4_connect() to see how they are called during the offset work:

scope-19720 [000] d...   497.511920: : kprobe__tcp_v4_connect - pid_tgid 84662395358472
scope-19720 [000] d...   497.512012: : kretprobe__tcp_v4_connect - pid_tgid 84662395358472
scope-19720 [000] d...   497.512459: : kprobe__tcp_v4_connect - pid_tgid 84662395358472
scope-19720 [000] d...   497.512555: : kretprobe__tcp_v4_connect - pid_tgid 84662395358472
scope-19720 [000] d...   497.513016: : kprobe__tcp_v4_connect - pid_tgid 84662395358472
scope-19720 [000] d...   497.513100: : kretprobe__tcp_v4_connect - pid_tgid 84662395358472
scope-19720 [000] d...   497.513538: : kprobe__tcp_v4_connect - pid_tgid 84662395358472
scope-19720 [000] d...   497.513703: : kretprobe__tcp_v4_connect - pid_tgid 84662395358472
 curl-19834 [000] d...   497.516090: : kprobe__tcp_v4_connect - pid_tgid 85186381368698
scope-19720 [000] d...   497.516345: : kprobe__tcp_v4_connect - pid_tgid 84662395358472
scope-19821 [000] d...   497.519377: : kprobe__tcp_v4_connect - pid_tgid 84662395358573
scope-19821 [000] d...   497.521792: : kprobe__tcp_v4_connect - pid_tgid 84662395358573
 curl-19834 [000] d...   497.529933: : kretprobe__tcp_v4_connect - pid_tgid 85186381368698
scope-19790 [000] d...   497.624384: : kprobe__tcp_v4_connect - pid_tgid 84658100391246
scope-19790 [000] dN..   497.624592: : kretprobe__tcp_v4_connect - pid_tgid 84658100391246

In this example, pid_tgid=84662395358472 is a thread of Scope that does the guessing work. There are other threads of Scope that perform connections at the same time and also other processes (e.g. curl above).

For each kprobe, there should be a kretprobe. For pid_tgid=84662395358472, this is the case until the ~178th connection.

At the same time, I got logs from offsetguess.go

rtcp_v4_connect   174               0
source port
rtcp_v4_connect   175               0
source port
rtcp_v4_connect   176               0
source port
rtcp_v4_connect   177               0
source port
rtcp_v4_connect   178               0
<probe> WARN: 2017/03/23 13:05:00.367957 Error setting up the eBPF tracker, falling back to proc scanning: failed to init perf map for IPv4 events: error guessing offsets: invalid guessing state while guessing source port, got checking expected checked, status.offset_saddr: 4

tcptracer-bpf's guessing code does 178 connections before it fails.

I notice that there are 4 successive kprobes with only 1 kretprobe. This is the problem.

We reproduced it only on the GCE instance with 4.4.0-66-generic and only one "possible" cpu (cpu#0):

# cat /sys/devices/system/cpu/possible
0
# cat /boot/config-4.4.0-66-generic | grep CONFIG_PREEMPT
# CONFIG_PREEMPT is not set

In this case, we should only have maxactive == 1:

        /* Pre-allocate memory for max kretprobe instances */
        if (rp->maxactive <= 0) {
#ifdef CONFIG_PREEMPT
                rp->maxactive = max_t(unsigned int, 10, 2*num_possible_cpus());
#else
                rp->maxactive = num_possible_cpus();
#endif
        }

At a first glance, tcp_v4_connect() is not sleeping (no schedule() or semaphore in this function..) so I don't understand how we can have several calls on an non-preemptive kernel before the function returning.

I don't see know if it is possible to configure a kprobe with a higher maxactive. @iaguis is looking into that.

@alban
Copy link
Contributor Author

alban commented Mar 23, 2017

This explains why I cannot reproduce the issue on my laptop: I have 8 "possible" cpus on my laptop (cat /sys/devices/system/cpu/possible) so it's much more difficult to have so many tcp_v4_connect at the same time.

@iaguis
Copy link
Contributor

iaguis commented Mar 23, 2017

I don't see know if it is possible to configure a kprobe with a higher maxactive. @iaguis is looking into that.

From what I see in the kernel, there's no way to configure maxactive unless you set it explicitly in struct kretprobe.

This is the output of searching for maxactive in the kernel tree:

iaguis@locke-xps13: ~/git/linux [(v4.10)] 
> rg maxactive
Documentation/kprobes.txt
52:a post_handler, and how to use the maxactive and nmissed fields of
135:register_kretprobe(), the user sets the maxactive field of the
141:spinlock held, maxactive = 1 should be enough.  If the function is
143:or preemption), NR_CPUS should be enough.  If maxactive <= 0, it is
147:It's not a disaster if you set maxactive too low; you'll just miss
433:You must set rp->maxactive appropriately before you call

kernel/kprobes.c
1861:	if (rp->maxactive <= 0) {
1863:		rp->maxactive = max_t(unsigned int, 10, 2*num_possible_cpus());
1865:		rp->maxactive = num_possible_cpus();
1870:	for (i = 0; i < rp->maxactive; i++) {

include/linux/kprobes.h
177: * User needs to provide a handler function, and initialize maxactive.
178: * maxactive - The maximum number of instances of the probed function that
181: * ignored, due to maxactive being too low.
188:	int maxactive;

samples/kprobes/kretprobe_example.c
75:	.maxactive		= 20,
98:	/* nmissed > 0 suggests that maxactive was set too low. */

I only see the place where it's set (see previous comment), an example of kernel module where it's set directly in struct kretprobe, documentation, and the definition of struct kretprobe.

@iaguis
Copy link
Contributor

iaguis commented Mar 23, 2017

We just asked about this in iovisor-dev: https://lists.iovisor.org/pipermail/iovisor-dev/2017-March/000694.html

alban added a commit to kinvolk-archives/tcptracer-bpf that referenced this issue Mar 25, 2017
If the kretprobe for tcp_v4_connect() is configured with a too-low
maxactive, some kretprobe might be missing. In this case, we detect it
and try again.

This is more likely to happen on a single-core VM with a non-preemptive
kernel (CONFIG_PREEMPT not set) because maxactive would be 1 in that
case.

See weaveworks#24

Based on work from @iaguis.
iaguis added a commit to kinvolk-archives/scope that referenced this issue Mar 29, 2017
We found out we were losing kretprobes sometimes because maxactive was set too
low[1][1]. This problem was more apparent in our GCE test environment because
the kernel was configured with `CONFIG_PREEMPT` not set and we're running
single-core VMs (see weaveworks/tcptracer-bpf#24 for
more details).

Unfortunately, we can't set the maxactive explicitly from userspace. Alban
submitted a kernel patch to allow this[2][2].

This bumps tcptracer-bpf to include a workaround[3][3] for this issue in the
guess-offsets phase.

[1]: weaveworks/tcptracer-bpf#24
[2]: https://lkml.org/lkml/2017/3/28/629
[3]: weaveworks/tcptracer-bpf#33
iaguis added a commit to kinvolk-archives/scope that referenced this issue Mar 29, 2017
We found out we were losing kretprobes sometimes because maxactive was set too
low[1]. This problem was more apparent in our GCE test environment because
the kernel was configured with `CONFIG_PREEMPT` not set and we're running
single-core VMs (see weaveworks/tcptracer-bpf#24 for
more details).

Unfortunately, we can't set the maxactive explicitly from userspace. Alban
submitted a kernel patch to allow this[2].

This bumps tcptracer-bpf to include a workaround[3] for this issue in the
guess-offsets phase.

[1]: weaveworks/tcptracer-bpf#24
[2]: https://lkml.org/lkml/2017/3/28/629
[3]: weaveworks/tcptracer-bpf#33
sunhay added a commit to DataDog/tcptracer-bpf that referenced this issue Feb 21, 2019
Add UDP tests as well as fix bug calculating bytes received
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants