-
Notifications
You must be signed in to change notification settings - Fork 454
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pixie Misses Events from Short Lived Processes/Pods #1638
Comments
Please view this pixie branch that has a mechanism to deploy a bug triggering test. |
vihangm
pushed a commit
that referenced
this issue
Oct 3, 2023
Summary: Added Capturing of Cgroup ids to the ebpf probes in the socket tracer. This includes modifications to data structures for communicating that information to userspace and other data tables. It also includes some minimal testing for ensuring that cgroup ids are captured correctly. The reason for this change is given in the design document for the solution to this [issue](#1638): in short it is the first step in using cgids rather than pids to associate a particular piece of communication data with the correct pod. Test Plan: Added Cgroup id capture checks to existing `socket_tracer` tests. Type of change: /kind Feature expansion Signed-off-by: AdityaAtulTewari <adityaatewari@gmail.com>
aimichelle
pushed a commit
that referenced
this issue
Oct 17, 2023
Summary: Revert cgid capture This change reverts capturing Cgroup ids from 2a2349f. Many of the 4.14 kernel tests have [BPF compilation errors](https://app.buildbuddy.io/invocation/075de8d0-d161-4ff6-8b74-123c6da66778#): `HINT: bpf_get_current_cgroup_id missing (added in Linux 4.18)`. Perplexingly, GitHub seems to show these tests as [passing despite the failures](https://github.com/pixie-io/pixie/actions/runs/6366478258/job/17320729511). Relevant Issues: #1638 #1739 Type of change: /kind bug Test Plan: Running CI Signed-off-by: Benjamin Kilimnik <bkilimnik@pixielabs.ai>
4 tasks
ddelnano
added a commit
that referenced
this issue
Sep 5, 2024
Summary: Populate client side trace's local address via tcp kprobes This change populates client side trace's `local_addr` and `local_port` columns for the following use cases: 1. To provide more consistency for the protocol data tables. Having columns that are empty make it difficult for end users to understand what is being traced and make them less useful 2. To facilitate addressing a portion of the short lived process problems (#1638) For 2, the root of the issue is that `df.ctx["pod"]` syntax relies on the [px.upid_to_pod_name](https://docs.px.dev/reference/pxl/udf/upid_to_pod_name/) function. If a PEM misses the short lived process during its metadata update, this function fails to resolve the pod name. For client side traces where the pod is making an outbound connection (non localhost), the `local_addr` column provides an alternative pod name lookup for short lived processes when the pod is long lived. This means the following would be equivalent to the `df.ctx["pod"]` lookup: `px.pod_id_to_pod_name(px.ip_to_pod_id(df.local_addr))`. I intend to follow this PR with a compiler change that will make `df.ctx["pod"]` try both methods should `px.upid_to_pod_name` fail to resolve. This will allow the existing pxl scripts to display the previously missed short lived processes. **Alternatives** Another approach I considered was expanding our use of the `sock_alloc` kprobe. I used ftrace on a simple curl command to see what other options could be used (`sudo trace-cmd record -F -p function_graph http://google.com`). The `socket` syscall calls `sock_alloc`, which would be another mechanism for accessing the `struct sock`. I decided against this approach because I don't think its viable to assume that the same thread/process that calls `socket` will be the one that does the later syscalls (how our BPF maps are set up). It's common to have a forking web server model, which means a different process/thread can call `socket` than the ones that later read/write to it. **Probe stability** These probes appear to be stable from our oldest and newest supported kernel. These functions exist in the [tcp_prot](https://elixir.bootlin.com/linux/v4.14.336/source/net/ipv4/tcp_ipv4.c#L2422), [tcpv6_prot](https://elixir.bootlin.com/linux/v4.14.336/source/net/ipv6/tcp_ipv6.c#L1941) structs and I've seen that other projects and bcc tools use these probes. This makes me believe that these functions have a pretty well defined interface. Relevant Issues: #1829, #1638 Type of change: /kind feature Test Plan: New tests verify that ipv4 and ipv6 cases work - [x] Ran `for i in $(seq 0 1000); do curl http://google.com/$i; sleep 2; done` within a pod and verified that `local_addr` is populated with this change and `px.pod_id_to_pod_name(px.ip_to_pod_id(df.local_addr))` works for pod name resolution. - [x] Verified the above curl test results in traces without `local_addr` without this change  - Tested on the following k8s offerings and machine images - [x] GKE COS and Ubuntu - [x] EKS Amazon Linux 2 Changelog Message: Populate socket tracer data table `local_addr` and `local_port` column for client side traces. --------- Signed-off-by: Dom Del Nano <ddelnano@gmail.com>
ddelnano
added a commit
to ddelnano/pixie
that referenced
this issue
Sep 23, 2024
…1989) Summary: Populate client side trace's local address via tcp kprobes This change populates client side trace's `local_addr` and `local_port` columns for the following use cases: 1. To provide more consistency for the protocol data tables. Having columns that are empty make it difficult for end users to understand what is being traced and make them less useful 2. To facilitate addressing a portion of the short lived process problems (pixie-io#1638) For 2, the root of the issue is that `df.ctx["pod"]` syntax relies on the [px.upid_to_pod_name](https://docs.px.dev/reference/pxl/udf/upid_to_pod_name/) function. If a PEM misses the short lived process during its metadata update, this function fails to resolve the pod name. For client side traces where the pod is making an outbound connection (non localhost), the `local_addr` column provides an alternative pod name lookup for short lived processes when the pod is long lived. This means the following would be equivalent to the `df.ctx["pod"]` lookup: `px.pod_id_to_pod_name(px.ip_to_pod_id(df.local_addr))`. I intend to follow this PR with a compiler change that will make `df.ctx["pod"]` try both methods should `px.upid_to_pod_name` fail to resolve. This will allow the existing pxl scripts to display the previously missed short lived processes. **Alternatives** Another approach I considered was expanding our use of the `sock_alloc` kprobe. I used ftrace on a simple curl command to see what other options could be used (`sudo trace-cmd record -F -p function_graph http://google.com`). The `socket` syscall calls `sock_alloc`, which would be another mechanism for accessing the `struct sock`. I decided against this approach because I don't think its viable to assume that the same thread/process that calls `socket` will be the one that does the later syscalls (how our BPF maps are set up). It's common to have a forking web server model, which means a different process/thread can call `socket` than the ones that later read/write to it. **Probe stability** These probes appear to be stable from our oldest and newest supported kernel. These functions exist in the [tcp_prot](https://elixir.bootlin.com/linux/v4.14.336/source/net/ipv4/tcp_ipv4.c#L2422), [tcpv6_prot](https://elixir.bootlin.com/linux/v4.14.336/source/net/ipv6/tcp_ipv6.c#L1941) structs and I've seen that other projects and bcc tools use these probes. This makes me believe that these functions have a pretty well defined interface. Relevant Issues: pixie-io#1829, pixie-io#1638 Type of change: /kind feature Test Plan: New tests verify that ipv4 and ipv6 cases work - [x] Ran `for i in $(seq 0 1000); do curl http://google.com/$i; sleep 2; done` within a pod and verified that `local_addr` is populated with this change and `px.pod_id_to_pod_name(px.ip_to_pod_id(df.local_addr))` works for pod name resolution. - [x] Verified the above curl test results in traces without `local_addr` without this change  - Tested on the following k8s offerings and machine images - [x] GKE COS and Ubuntu - [x] EKS Amazon Linux 2 Changelog Message: Populate socket tracer data table `local_addr` and `local_port` column for client side traces. --------- Signed-off-by: Dom Del Nano <ddelnano@gmail.com> GitOrigin-RevId: 0c1fdd2
This was referenced Jan 16, 2025
ddelnano
added a commit
that referenced
this issue
Jan 24, 2025
) Summary: Make metadata pod lookups more resilient to short lived processes This is a continuation of the work started from #1989. Since the `local_addr` column is populated for client side traces, it can be used as a fallback lookup for these traces. This doesn't solve all of the permutations of missing short lived processes (#1638), but provides more coverage than before. Relevant Issues: #1638 Type of change: /kind bugfix Test Plan: Verified the following - [x] Compared the performance with and without this change with `src/e2e_test/vizier/exectime:exectime`. This change has a minor performance impact, but it closes the gap on certain situations that previously caused users to distrust Pixie's instrumentation ``` # Performance baseline $ ./exectime benchmark -a testing.getcosmic.ai:443 -c <cluster_id> 2>&1 | tee baseline_for_simple_udf_swap_e20880ffd.txt # Performance of this change ./exectime benchmark -a testing.getcosmic.ai:443 -c <cluster_id> 2>&1 | tee simple_udf_swap_cd217c05c.txt ``` [simple_udf_swap_cd217c05c.txt](https://github.com/user-attachments/files/18497709/simple_udf_swap_cd217c05c.txt) [baseline_for_simple_udf_swap_e20880ffd.txt](https://github.com/user-attachments/files/18497710/baseline_for_simple_udf_swap_e20880ffd.txt) - [x] Ran `for i in $(seq 0 1000); do curl http://google.com/$i; sleep 2; done` within a pod and verified that with this change all traces are shown, without this change a significant number of traces are missed. See before and after screenshots below:   Changelog Message: Fix a certain class of cases where Pixie previously missed protocol traces from short lived connections --------- Signed-off-by: Dom Del Nano <ddelnano@gmail.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Describe the bug
In many cases, Pixie fails to report HTTP events for short-lived processes, including for short-lived Kubernetes Pods and for short-lived child-processes within longer-living Pods. This is because Pixie's data collector Stirling fails to collect the necessary process metadata. This metadata is used to reconcile communication traces with their source/destination pod(s).
Currently, running processes have their process id (pid) communicated to Stirling by each data capture ebpf probe. In order to properly associate this pid with a kubernetes pod, a lookup procedure in the Stirling's userspace periodically probes Linux's sysfs in order to collect all pids in each cgroup, along with that cgroup's metadata. The procedure to associate a pid with a kubernetes pod does a lookup using the pid to find the cgroup containing the process. The cgroup path name is related to the pod unique id (pod uid) and they are able to match them using regex (code here). However pids of short-lived processes will leave the cgroup once the process terminates. Thus if the process terminates before a lookup occurs, then the data for that pid cannot be exported from stirling.
To Reproduce
Steps to reproduce the behavior:
curl example.com
Expected behavior
The Behavior of Short Running processes should be captured. There are several ways to achieve this but this is the desired high level behavior.
App information (please complete the following information):
This is an issue discussed with the Pixie team, and impacts pixie in all its current versions.
Additional Information
This is a known issue to the Pixie Team, and I am working with them to fix it.
Here is a link to the evolving design document.
The text was updated successfully, but these errors were encountered: