Pixie Misses Events from Short Lived Processes/Pods #1638

tewaro · 2023-07-26T22:07:08Z

Describe the bug
In many cases, Pixie fails to report HTTP events for short-lived processes, including for short-lived Kubernetes Pods and for short-lived child-processes within longer-living Pods. This is because Pixie's data collector Stirling fails to collect the necessary process metadata. This metadata is used to reconcile communication traces with their source/destination pod(s).

Currently, running processes have their process id (pid) communicated to Stirling by each data capture ebpf probe. In order to properly associate this pid with a kubernetes pod, a lookup procedure in the Stirling's userspace periodically probes Linux's sysfs in order to collect all pids in each cgroup, along with that cgroup's metadata. The procedure to associate a pid with a kubernetes pod does a lookup using the pid to find the cgroup containing the process. The cgroup path name is related to the pod unique id (pod uid) and they are able to match them using regex (code here). However pids of short-lived processes will leave the cgroup once the process terminates. Thus if the process terminates before a lookup occurs, then the data for that pid cannot be exported from stirling.

To Reproduce
Steps to reproduce the behavior:

Deploy pixie
Start a container and ssh into it
Run curl example.com
Check Pixie Dashboard and you'll see that the behavior was not captured in the http_data script.

Expected behavior
The Behavior of Short Running processes should be captured. There are several ways to achieve this but this is the desired high level behavior.

App information (please complete the following information):
This is an issue discussed with the Pixie team, and impacts pixie in all its current versions.

Additional Information
This is a known issue to the Pixie Team, and I am working with them to fix it.
Here is a link to the evolving design document.

The text was updated successfully, but these errors were encountered:

tewaro · 2023-08-01T22:36:28Z

Please view this pixie branch that has a mechanism to deploy a bug triggering test.

Summary: Added Capturing of Cgroup ids to the ebpf probes in the socket tracer. This includes modifications to data structures for communicating that information to userspace and other data tables. It also includes some minimal testing for ensuring that cgroup ids are captured correctly. The reason for this change is given in the design document for the solution to this [issue](#1638): in short it is the first step in using cgids rather than pids to associate a particular piece of communication data with the correct pod. Test Plan: Added Cgroup id capture checks to existing `socket_tracer` tests. Type of change: /kind Feature expansion Signed-off-by: AdityaAtulTewari <adityaatewari@gmail.com>

Summary: Revert cgid capture This change reverts capturing Cgroup ids from 2a2349f. Many of the 4.14 kernel tests have [BPF compilation errors](https://app.buildbuddy.io/invocation/075de8d0-d161-4ff6-8b74-123c6da66778#): `HINT: bpf_get_current_cgroup_id missing (added in Linux 4.18)`. Perplexingly, GitHub seems to show these tests as [passing despite the failures](https://github.com/pixie-io/pixie/actions/runs/6366478258/job/17320729511). Relevant Issues: #1638 #1739 Type of change: /kind bug Test Plan: Running CI Signed-off-by: Benjamin Kilimnik <bkilimnik@pixielabs.ai>

Summary: Populate client side trace's local address via tcp kprobes This change populates client side trace's `local_addr` and `local_port` columns for the following use cases: 1. To provide more consistency for the protocol data tables. Having columns that are empty make it difficult for end users to understand what is being traced and make them less useful 2. To facilitate addressing a portion of the short lived process problems (#1638) For 2, the root of the issue is that `df.ctx["pod"]` syntax relies on the [px.upid_to_pod_name](https://docs.px.dev/reference/pxl/udf/upid_to_pod_name/) function. If a PEM misses the short lived process during its metadata update, this function fails to resolve the pod name. For client side traces where the pod is making an outbound connection (non localhost), the `local_addr` column provides an alternative pod name lookup for short lived processes when the pod is long lived. This means the following would be equivalent to the `df.ctx["pod"]` lookup: `px.pod_id_to_pod_name(px.ip_to_pod_id(df.local_addr))`. I intend to follow this PR with a compiler change that will make `df.ctx["pod"]` try both methods should `px.upid_to_pod_name` fail to resolve. This will allow the existing pxl scripts to display the previously missed short lived processes. **Alternatives** Another approach I considered was expanding our use of the `sock_alloc` kprobe. I used ftrace on a simple curl command to see what other options could be used (`sudo trace-cmd record -F -p function_graph http://google.com`). The `socket` syscall calls `sock_alloc`, which would be another mechanism for accessing the `struct sock`. I decided against this approach because I don't think its viable to assume that the same thread/process that calls `socket` will be the one that does the later syscalls (how our BPF maps are set up). It's common to have a forking web server model, which means a different process/thread can call `socket` than the ones that later read/write to it. **Probe stability** These probes appear to be stable from our oldest and newest supported kernel. These functions exist in the [tcp_prot](https://elixir.bootlin.com/linux/v4.14.336/source/net/ipv4/tcp_ipv4.c#L2422), [tcpv6_prot](https://elixir.bootlin.com/linux/v4.14.336/source/net/ipv6/tcp_ipv6.c#L1941) structs and I've seen that other projects and bcc tools use these probes. This makes me believe that these functions have a pretty well defined interface. Relevant Issues: #1829, #1638 Type of change: /kind feature Test Plan: New tests verify that ipv4 and ipv6 cases work - [x] Ran `for i in $(seq 0 1000); do curl http://google.com/$i; sleep 2; done` within a pod and verified that `local_addr` is populated with this change and `px.pod_id_to_pod_name(px.ip_to_pod_id(df.local_addr))` works for pod name resolution. - [x] Verified the above curl test results in traces without `local_addr` without this change ![local-addr-testing](https://github.com/user-attachments/assets/344be022-97a0-4096-8af7-8de20d741e40) - Tested on the following k8s offerings and machine images - [x] GKE COS and Ubuntu - [x] EKS Amazon Linux 2 Changelog Message: Populate socket tracer data table `local_addr` and `local_port` column for client side traces. --------- Signed-off-by: Dom Del Nano <ddelnano@gmail.com>

…1989) Summary: Populate client side trace's local address via tcp kprobes This change populates client side trace's `local_addr` and `local_port` columns for the following use cases: 1. To provide more consistency for the protocol data tables. Having columns that are empty make it difficult for end users to understand what is being traced and make them less useful 2. To facilitate addressing a portion of the short lived process problems (pixie-io#1638) For 2, the root of the issue is that `df.ctx["pod"]` syntax relies on the [px.upid_to_pod_name](https://docs.px.dev/reference/pxl/udf/upid_to_pod_name/) function. If a PEM misses the short lived process during its metadata update, this function fails to resolve the pod name. For client side traces where the pod is making an outbound connection (non localhost), the `local_addr` column provides an alternative pod name lookup for short lived processes when the pod is long lived. This means the following would be equivalent to the `df.ctx["pod"]` lookup: `px.pod_id_to_pod_name(px.ip_to_pod_id(df.local_addr))`. I intend to follow this PR with a compiler change that will make `df.ctx["pod"]` try both methods should `px.upid_to_pod_name` fail to resolve. This will allow the existing pxl scripts to display the previously missed short lived processes. **Alternatives** Another approach I considered was expanding our use of the `sock_alloc` kprobe. I used ftrace on a simple curl command to see what other options could be used (`sudo trace-cmd record -F -p function_graph http://google.com`). The `socket` syscall calls `sock_alloc`, which would be another mechanism for accessing the `struct sock`. I decided against this approach because I don't think its viable to assume that the same thread/process that calls `socket` will be the one that does the later syscalls (how our BPF maps are set up). It's common to have a forking web server model, which means a different process/thread can call `socket` than the ones that later read/write to it. **Probe stability** These probes appear to be stable from our oldest and newest supported kernel. These functions exist in the [tcp_prot](https://elixir.bootlin.com/linux/v4.14.336/source/net/ipv4/tcp_ipv4.c#L2422), [tcpv6_prot](https://elixir.bootlin.com/linux/v4.14.336/source/net/ipv6/tcp_ipv6.c#L1941) structs and I've seen that other projects and bcc tools use these probes. This makes me believe that these functions have a pretty well defined interface. Relevant Issues: pixie-io#1829, pixie-io#1638 Type of change: /kind feature Test Plan: New tests verify that ipv4 and ipv6 cases work - [x] Ran `for i in $(seq 0 1000); do curl http://google.com/$i; sleep 2; done` within a pod and verified that `local_addr` is populated with this change and `px.pod_id_to_pod_name(px.ip_to_pod_id(df.local_addr))` works for pod name resolution. - [x] Verified the above curl test results in traces without `local_addr` without this change ![local-addr-testing](https://github.com/user-attachments/assets/344be022-97a0-4096-8af7-8de20d741e40) - Tested on the following k8s offerings and machine images - [x] GKE COS and Ubuntu - [x] EKS Amazon Linux 2 Changelog Message: Populate socket tracer data table `local_addr` and `local_port` column for client side traces. --------- Signed-off-by: Dom Del Nano <ddelnano@gmail.com> GitOrigin-RevId: 0c1fdd2

) Summary: Make metadata pod lookups more resilient to short lived processes This is a continuation of the work started from #1989. Since the `local_addr` column is populated for client side traces, it can be used as a fallback lookup for these traces. This doesn't solve all of the permutations of missing short lived processes (#1638), but provides more coverage than before. Relevant Issues: #1638 Type of change: /kind bugfix Test Plan: Verified the following - [x] Compared the performance with and without this change with `src/e2e_test/vizier/exectime:exectime`. This change has a minor performance impact, but it closes the gap on certain situations that previously caused users to distrust Pixie's instrumentation ``` # Performance baseline $ ./exectime benchmark -a testing.getcosmic.ai:443 -c <cluster_id> 2>&1 | tee baseline_for_simple_udf_swap_e20880ffd.txt # Performance of this change ./exectime benchmark -a testing.getcosmic.ai:443 -c <cluster_id> 2>&1 | tee simple_udf_swap_cd217c05c.txt ``` [simple_udf_swap_cd217c05c.txt](https://github.com/user-attachments/files/18497709/simple_udf_swap_cd217c05c.txt) [baseline_for_simple_udf_swap_e20880ffd.txt](https://github.com/user-attachments/files/18497710/baseline_for_simple_udf_swap_e20880ffd.txt) - [x] Ran `for i in $(seq 0 1000); do curl http://google.com/$i; sleep 2; done` within a pod and verified that with this change all traces are shown, without this change a significant number of traces are missed. See before and after screenshots below: ![vizier-0 14 14-curl-with-missing-data](https://github.com/user-attachments/assets/035b5dcf-d87a-4134-84c1-9e478594927b) ![traces-with-new-fallback](https://github.com/user-attachments/assets/2a84ecbb-83cb-45ae-af85-77b1773efb59) Changelog Message: Fix a certain class of cases where Pixie previously missed protocol traces from short lived connections --------- Signed-off-by: Dom Del Nano <ddelnano@gmail.com>

tewaro changed the title ~~Capturing Short Running Processes~~ Pixie Misses Events from Short Lived Processes/Pods Jul 26, 2023

tewaro mentioned this issue Aug 18, 2023

Added cgid capture to stirling with testing #1681

Merged

benkilimnik mentioned this issue Oct 16, 2023

Revert "Added cgid capture to stirling with testing (#1681)" #1738

Merged

tewaro mentioned this issue Oct 20, 2023

Add cgroup bcc flag for compilation #1748

Open

evanrolfe mentioned this issue Aug 4, 2024

Add multi-architecture build, add layers to dockerfile evanrolfe/trayce_agent#7

Closed

ddelnano mentioned this issue Aug 23, 2024

Populate client side trace's local address via tcp kprobes #1989

Merged

4 tasks

This was referenced Jan 16, 2025

Make metadata pod lookups more resilient to short lived processes #2078

Closed

Make metadata pod lookups more resilient to short lived processes #2094

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Pixie Misses Events from Short Lived Processes/Pods #1638

Pixie Misses Events from Short Lived Processes/Pods #1638

tewaro commented Jul 26, 2023

tewaro commented Aug 1, 2023

Pixie Misses Events from Short Lived Processes/Pods #1638

Pixie Misses Events from Short Lived Processes/Pods #1638

Comments

tewaro commented Jul 26, 2023

tewaro commented Aug 1, 2023