Execsnoop can miss some events in parallel environments #1250

anisse · 2017-07-10T11:36:21Z

It turns out even execsnoop can miss some short-lived processes, when those run in a highly-parallel environment. I've ran into this while building a tree version of execsnoop (for flamegraphs). I have a build system that has quite a few short-lived processes, and sometimes I'd miss a few process, completely breaking the tree chain (which I've rewritten to get the ppid in-kernel).

I've built a simple program to illustrate the issue. It will run a given number of execve, over a given number of processors (it does this by forking).

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>


#define min(a, b) ( (a) < (b) ? (a) : (b))
#define max(a, b) ( (a) > (b) ? (a) : (b))

void execbomb(int children, int start, int end, const char *filename) {
        int span = end - start;
        int per_child = 0, cstart, cend;
        char s_start[32], s_end[32];
        int parent = getpid();
        if (span <= 0)
                return;

        //printf("PID %d Got c=%d s=%d e=%d %s\n", parent, children, start, end, filename);
        if (children > 0) {
                // split the range
                children = min(span, children);
                per_child = max(span/children, 1);
                cstart = start;

                while (children > 0 && cstart <= end) {
                        cend = cstart + per_child;
                        if(!fork()) {
                                // child only works on its slice
                                start = cstart;
                                end = cend;
                                break; //stop forking
                        }
                        children--;
                        cstart += per_child;
                }
                if (parent == getpid()) {
                        if(cend < end)
                                start = cend; // parent gets to finish the job no child had
                        else
                                return;
                }
        }
        sprintf(s_start, "%d", start + 1);
        sprintf(s_end, "%d", end);

        // Fork to generate a new pid every time
        if (!fork())
                execl(filename, filename, s_start, s_end, NULL);
}

int main(int argc, char *argv[]) {
        int children = 0;
        if(argc < 3 || argc > 4) {
                printf("Usage: %s <start> <end> [forks]\n", argv[0]);
                return -1;
        }

        if (argc == 4)
                children = atoi(argv[3]);

        execbomb(children, atoi(argv[1]), atoi(argv[2]), argv[0]);

        return 0;
}

Say you want to run this in a single CPU, and generate 4000 execve():
./execbomb 0 4000 0

In this case, it seems execsnoop.py -n execbomb catches all event. Now, run 17 parallel processes at a given time:
./execbomb 0 4000 17

And then I'm losing about ~274 out of 4000 events. This might need tuning on your hardware, and you can verify easily that the program works properly, like this:

$ strace -e trace=execve -ff ./execbomb 0 4000 17 |& grep 'execve('  |wc -l
4001

(first execve() is the program itself).

Running it through strace does slow it enough so that execsnoop catches all events.

Important point: in these examples, I'm never seeing the perf message "Possibly lost X samples". You could use this program to trigger it though. Also, I always wait for execsnoop.py to finish its processing (through casual top watching). I have disabled the get_ppid() userland search, as well as tried using a bigger perf buffer (page_cnt=512).

If you analyse the log, you might also see events where the arg submission logic fails, with either an empty command line, or multiple command lines concatenated.

The text was updated successfully, but these errors were encountered:

anisse · 2017-07-10T12:45:35Z

I've built a much simpler version of the program, but now you have to do some maths:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>


void execbomb(int count, const char *filename) {
        char s_count[12];

        if (count <= 0)
                return;

        snprintf(s_count, sizeof(s_count) - 1, "%d", count -1);

        execl(filename, filename, s_count, "-1", NULL);
}

int main(int argc, char *argv[]) {
        int children;
        if(argc != 3 ) {
                printf("Usage: %s <count> <forks>\n", argv[0]);
                return -1;
        }
        children = atoi(argv[2]);
        while(children-- > 0) {
                if(!fork()) {
                        children = -1;
                        break;
                }
        }
        if(children < 0)
                execbomb(atoi(argv[1]), argv[0]);

        wait(NULL);

        return 0;
}

Use it like this: ./execbomb X Y, it will run Y processes, with X execve per child.

yonghong-song · 2017-07-10T23:14:06Z

The execsnoop basically kprobes on sys_execve.
It made two records on entry point and exit return of the function. It considers a successful
of sys_execve only both entry and exit points are hit. This does make sense. In additional, the
tool is able to print out the overhead of the syscall.

I did a quick experiment on this. I added an array of two elements and one element is incremented at the entry point and another is incremented at the exit point. I found that the entry point is exactly 4001 (one is the program itself), but the exit point is around 3500. 500 is lost.

So the issue is not the bpf or perf output missing samples, but the kretprobe sometimes not triggered. Not sure what exactly happening here.

yonghong-song · 2017-07-10T23:48:07Z

This appears to be a known issue.
weaveworks/tcptracer-bpf#24
It has happened to tcp_v4_connect kretprobe as well.

There are no simple solution as the controlling kernel variable/field is not configurable.

        /* Pre-allocate memory for max kretprobe instances */
        if (rp->maxactive <= 0) {
#ifdef CONFIG_PREEMPT
                rp->maxactive = max_t(unsigned int, 10, 2*num_possible_cpus());
#else
                rp->maxactive = num_possible_cpus();
#endif
        }

yonghong-song · 2017-07-11T05:32:21Z

Actually, maxactive is tunable.
Looking at the kprobe documentation (linux:Documentation/trace/kprobetrace.txt):
...
r[MAXACTIVE][:[GRP/]EVENT] [MOD:]SYM[+0] [FETCHARGS] : Set a return probe
...

basically, in bcc, we can generate event like "r50:...". This will set maxactive to 50
inside the kernel.

I tried this example, twice of online cpu number is not enough, still some samples
are missing from kretprobe. Randomly tried 50 (online cpu number is 4), no sample
missing.

We could package something like kretprobe_maxactive=# in a parameter like
tunable_options into BPF constructor. This can serve use cases where special
workaround/tuning is needed.

anisse · 2017-07-11T07:06:19Z

Apparently the maxactive tunable feature is new in 4.12, so this would need to be documented as well.

iaguis · 2017-07-11T09:23:00Z

This was also discussed in #1072 and we added support for maxactive in gobpf in iovisor/gobpf#39.

goldshtn · 2017-07-11T18:16:50Z

I suggest we close this issue, and open another one for adding the MAXACTIVE feature to the docs, and another one for adding MAXACTIVE support to the bcc API.

pchaigno · 2019-02-21T18:12:52Z

#2224 fixes this.

sanmai-NL · 2023-03-29T14:40:49Z

@anisse Is this solved?

anisse · 2023-04-01T12:10:27Z

It still happens with the python version of execsnoop (even when modifying to raise maxactive to its maximum(4096) in the kretprobe) , but I can't reproduce with the C version in libbpf-tools. I think we can close this and advise people to use the libbpf-tools C version instead.

raph431 · 2024-09-07T09:00:20Z

Is it possible, that I ran into this issue here?
Since this merge I should be able to increase maxactive.
How can I do that in execsnoop?

pchaigno mentioned this issue Feb 21, 2019

Expose kprobe's maxactive parameter in bcc #2224

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Execsnoop can miss some events in parallel environments #1250

Execsnoop can miss some events in parallel environments #1250

anisse commented Jul 10, 2017

anisse commented Jul 10, 2017

yonghong-song commented Jul 10, 2017

yonghong-song commented Jul 10, 2017 •

edited

Loading

yonghong-song commented Jul 11, 2017

anisse commented Jul 11, 2017

iaguis commented Jul 11, 2017

goldshtn commented Jul 11, 2017

pchaigno commented Feb 21, 2019

sanmai-NL commented Mar 29, 2023

anisse commented Apr 1, 2023

raph431 commented Sep 7, 2024

Execsnoop can miss some events in parallel environments #1250

Execsnoop can miss some events in parallel environments #1250

Comments

anisse commented Jul 10, 2017

anisse commented Jul 10, 2017

yonghong-song commented Jul 10, 2017

yonghong-song commented Jul 10, 2017 • edited Loading

yonghong-song commented Jul 11, 2017

anisse commented Jul 11, 2017

iaguis commented Jul 11, 2017

goldshtn commented Jul 11, 2017

pchaigno commented Feb 21, 2019

sanmai-NL commented Mar 29, 2023

anisse commented Apr 1, 2023

raph431 commented Sep 7, 2024

yonghong-song commented Jul 10, 2017 •

edited

Loading