Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Execsnoop can miss some events in parallel environments #1250

Open
anisse opened this issue Jul 10, 2017 · 11 comments
Open

Execsnoop can miss some events in parallel environments #1250

anisse opened this issue Jul 10, 2017 · 11 comments

Comments

@anisse
Copy link

anisse commented Jul 10, 2017

It turns out even execsnoop can miss some short-lived processes, when those run in a highly-parallel environment. I've ran into this while building a tree version of execsnoop (for flamegraphs). I have a build system that has quite a few short-lived processes, and sometimes I'd miss a few process, completely breaking the tree chain (which I've rewritten to get the ppid in-kernel).

I've built a simple program to illustrate the issue. It will run a given number of execve, over a given number of processors (it does this by forking).

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>


#define min(a, b) ( (a) < (b) ? (a) : (b))
#define max(a, b) ( (a) > (b) ? (a) : (b))

void execbomb(int children, int start, int end, const char *filename) {
        int span = end - start;
        int per_child = 0, cstart, cend;
        char s_start[32], s_end[32];
        int parent = getpid();
        if (span <= 0)
                return;

        //printf("PID %d Got c=%d s=%d e=%d %s\n", parent, children, start, end, filename);
        if (children > 0) {
                // split the range
                children = min(span, children);
                per_child = max(span/children, 1);
                cstart = start;

                while (children > 0 && cstart <= end) {
                        cend = cstart + per_child;
                        if(!fork()) {
                                // child only works on its slice
                                start = cstart;
                                end = cend;
                                break; //stop forking
                        }
                        children--;
                        cstart += per_child;
                }
                if (parent == getpid()) {
                        if(cend < end)
                                start = cend; // parent gets to finish the job no child had
                        else
                                return;
                }
        }
        sprintf(s_start, "%d", start + 1);
        sprintf(s_end, "%d", end);

        // Fork to generate a new pid every time
        if (!fork())
                execl(filename, filename, s_start, s_end, NULL);
}

int main(int argc, char *argv[]) {
        int children = 0;
        if(argc < 3 || argc > 4) {
                printf("Usage: %s <start> <end> [forks]\n", argv[0]);
                return -1;
        }

        if (argc == 4)
                children = atoi(argv[3]);

        execbomb(children, atoi(argv[1]), atoi(argv[2]), argv[0]);

        return 0;
}

Say you want to run this in a single CPU, and generate 4000 execve():
./execbomb 0 4000 0

In this case, it seems execsnoop.py -n execbomb catches all event. Now, run 17 parallel processes at a given time:
./execbomb 0 4000 17

And then I'm losing about ~274 out of 4000 events. This might need tuning on your hardware, and you can verify easily that the program works properly, like this:

$ strace -e trace=execve -ff ./execbomb 0 4000 17 |& grep 'execve('  |wc -l
4001

(first execve() is the program itself).

Running it through strace does slow it enough so that execsnoop catches all events.

Important point: in these examples, I'm never seeing the perf message "Possibly lost X samples". You could use this program to trigger it though. Also, I always wait for execsnoop.py to finish its processing (through casual top watching). I have disabled the get_ppid() userland search, as well as tried using a bigger perf buffer (page_cnt=512).

If you analyse the log, you might also see events where the arg submission logic fails, with either an empty command line, or multiple command lines concatenated.

@anisse
Copy link
Author

anisse commented Jul 10, 2017

I've built a much simpler version of the program, but now you have to do some maths:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>


void execbomb(int count, const char *filename) {
        char s_count[12];

        if (count <= 0)
                return;

        snprintf(s_count, sizeof(s_count) - 1, "%d", count -1);

        execl(filename, filename, s_count, "-1", NULL);
}

int main(int argc, char *argv[]) {
        int children;
        if(argc != 3 ) {
                printf("Usage: %s <count> <forks>\n", argv[0]);
                return -1;
        }
        children = atoi(argv[2]);
        while(children-- > 0) {
                if(!fork()) {
                        children = -1;
                        break;
                }
        }
        if(children < 0)
                execbomb(atoi(argv[1]), argv[0]);

        wait(NULL);

        return 0;
}

Use it like this: ./execbomb X Y, it will run Y processes, with X execve per child.

@yonghong-song
Copy link
Collaborator

The execsnoop basically kprobes on sys_execve.
It made two records on entry point and exit return of the function. It considers a successful
of sys_execve only both entry and exit points are hit. This does make sense. In additional, the
tool is able to print out the overhead of the syscall.

I did a quick experiment on this. I added an array of two elements and one element is incremented at the entry point and another is incremented at the exit point. I found that the entry point is exactly 4001 (one is the program itself), but the exit point is around 3500. 500 is lost.

So the issue is not the bpf or perf output missing samples, but the kretprobe sometimes not triggered. Not sure what exactly happening here.

@yonghong-song
Copy link
Collaborator

yonghong-song commented Jul 10, 2017

This appears to be a known issue.
weaveworks/tcptracer-bpf#24
It has happened to tcp_v4_connect kretprobe as well.

There are no simple solution as the controlling kernel variable/field is not configurable.

        /* Pre-allocate memory for max kretprobe instances */
        if (rp->maxactive <= 0) {
#ifdef CONFIG_PREEMPT
                rp->maxactive = max_t(unsigned int, 10, 2*num_possible_cpus());
#else
                rp->maxactive = num_possible_cpus();
#endif
        }

@yonghong-song
Copy link
Collaborator

Actually, maxactive is tunable.
Looking at the kprobe documentation (linux:Documentation/trace/kprobetrace.txt):
...
r[MAXACTIVE][:[GRP/]EVENT] [MOD:]SYM[+0] [FETCHARGS] : Set a return probe
...

basically, in bcc, we can generate event like "r50:...". This will set maxactive to 50
inside the kernel.

I tried this example, twice of online cpu number is not enough, still some samples
are missing from kretprobe. Randomly tried 50 (online cpu number is 4), no sample
missing.

We could package something like kretprobe_maxactive=# in a parameter like
tunable_options into BPF constructor. This can serve use cases where special
workaround/tuning is needed.

@anisse
Copy link
Author

anisse commented Jul 11, 2017

Apparently the maxactive tunable feature is new in 4.12, so this would need to be documented as well.

@iaguis
Copy link
Contributor

iaguis commented Jul 11, 2017

This was also discussed in #1072 and we added support for maxactive in gobpf in iovisor/gobpf#39.

@goldshtn
Copy link
Collaborator

I suggest we close this issue, and open another one for adding the MAXACTIVE feature to the docs, and another one for adding MAXACTIVE support to the bcc API.

@pchaigno
Copy link
Contributor

#2224 fixes this.

@sanmai-NL
Copy link

@anisse Is this solved?

@anisse
Copy link
Author

anisse commented Apr 1, 2023

It still happens with the python version of execsnoop (even when modifying to raise maxactive to its maximum(4096) in the kretprobe) , but I can't reproduce with the C version in libbpf-tools. I think we can close this and advise people to use the libbpf-tools C version instead.

@raph431
Copy link

raph431 commented Sep 7, 2024

Is it possible, that I ran into this issue here?
Since this merge I should be able to increase maxactive.
How can I do that in execsnoop?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants