Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pixie Misses Events from Short Lived Processes/Pods #1638

Open
tewaro opened this issue Jul 26, 2023 · 1 comment
Open

Pixie Misses Events from Short Lived Processes/Pods #1638

tewaro opened this issue Jul 26, 2023 · 1 comment

Comments

@tewaro
Copy link
Contributor

tewaro commented Jul 26, 2023

Describe the bug
In many cases, Pixie fails to report HTTP events for short-lived processes, including for short-lived Kubernetes Pods and for short-lived child-processes within longer-living Pods. This is because Pixie's data collector Stirling fails to collect the necessary process metadata. This metadata is used to reconcile communication traces with their source/destination pod(s).

Currently, running processes have their process id (pid) communicated to Stirling by each data capture ebpf probe. In order to properly associate this pid with a kubernetes pod, a lookup procedure in the Stirling's userspace periodically probes Linux's sysfs in order to collect all pids in each cgroup, along with that cgroup's metadata. The procedure to associate a pid with a kubernetes pod does a lookup using the pid to find the cgroup containing the process. The cgroup path name is related to the pod unique id (pod uid) and they are able to match them using regex (code here). However pids of short-lived processes will leave the cgroup once the process terminates. Thus if the process terminates before a lookup occurs, then the data for that pid cannot be exported from stirling.

To Reproduce
Steps to reproduce the behavior:

  1. Deploy pixie
  2. Start a container and ssh into it
  3. Run curl example.com
  4. Check Pixie Dashboard and you'll see that the behavior was not captured in the http_data script.

Expected behavior
The Behavior of Short Running processes should be captured. There are several ways to achieve this but this is the desired high level behavior.

App information (please complete the following information):
This is an issue discussed with the Pixie team, and impacts pixie in all its current versions.

Additional Information
This is a known issue to the Pixie Team, and I am working with them to fix it.
Here is a link to the evolving design document.

@tewaro tewaro changed the title Capturing Short Running Processes Pixie Misses Events from Short Lived Processes/Pods Jul 26, 2023
@tewaro
Copy link
Contributor Author

tewaro commented Aug 1, 2023

Please view this pixie branch that has a mechanism to deploy a bug triggering test.

vihangm pushed a commit that referenced this issue Oct 3, 2023
Summary: Added Capturing of Cgroup ids to the ebpf probes in the socket
tracer. This includes modifications to data structures for communicating
that information to userspace and other data tables. It also includes
some minimal testing for ensuring that cgroup ids are captured
correctly. The reason for this change is given in the design document
for the solution to this
[issue](#1638): in short it is
the first step in using cgids rather than pids to associate a particular
piece of communication data with the correct pod.

Test Plan: Added Cgroup id capture checks to existing `socket_tracer`
tests.

Type of change: /kind Feature expansion

Signed-off-by: AdityaAtulTewari <adityaatewari@gmail.com>
aimichelle pushed a commit that referenced this issue Oct 17, 2023
Summary: Revert cgid capture

This change reverts capturing Cgroup ids from
2a2349f.
Many of the 4.14 kernel tests have [BPF compilation
errors](https://app.buildbuddy.io/invocation/075de8d0-d161-4ff6-8b74-123c6da66778#):
`HINT: bpf_get_current_cgroup_id missing (added in Linux 4.18)`.
Perplexingly, GitHub seems to show these tests as [passing despite the
failures](https://github.com/pixie-io/pixie/actions/runs/6366478258/job/17320729511).

Relevant Issues: #1638 #1739

Type of change: /kind bug

Test Plan: Running CI

Signed-off-by: Benjamin Kilimnik <bkilimnik@pixielabs.ai>
ddelnano added a commit that referenced this issue Sep 5, 2024
Summary: Populate client side trace's local address via tcp kprobes

This change populates client side trace's `local_addr` and `local_port`
columns for the following use cases:
1. To provide more consistency for the protocol data tables. Having
columns that are empty make it difficult for end users to understand
what is being traced and make them less useful
2. To facilitate addressing a portion of the short lived process
problems (#1638)

For 2, the root of the issue is that `df.ctx["pod"]` syntax relies on
the
[px.upid_to_pod_name](https://docs.px.dev/reference/pxl/udf/upid_to_pod_name/)
function. If a PEM misses the short lived process during its metadata
update, this function fails to resolve the pod name. For client side
traces where the pod is making an outbound connection (non localhost),
the `local_addr` column provides an alternative pod name lookup for
short lived processes when the pod is long lived. This means the
following would be equivalent to the `df.ctx["pod"]` lookup:
`px.pod_id_to_pod_name(px.ip_to_pod_id(df.local_addr))`.

I intend to follow this PR with a compiler change that will make
`df.ctx["pod"]` try both methods should `px.upid_to_pod_name` fail to
resolve. This will allow the existing pxl scripts to display the
previously missed short lived processes.

**Alternatives**

Another approach I considered was expanding our use of the `sock_alloc`
kprobe. I used ftrace on a simple curl command to see what other options
could be used (`sudo trace-cmd record -F -p function_graph
http://google.com`). The `socket` syscall calls `sock_alloc`, which
would be another mechanism for accessing the `struct sock`. I decided
against this approach because I don't think its viable to assume that
the same thread/process that calls `socket` will be the one that does
the later syscalls (how our BPF maps are set up). It's common to have a
forking web server model, which means a different process/thread can
call `socket` than the ones that later read/write to it.

**Probe stability**

These probes appear to be stable from our oldest and newest supported
kernel. These functions exist in the
[tcp_prot](https://elixir.bootlin.com/linux/v4.14.336/source/net/ipv4/tcp_ipv4.c#L2422),
[tcpv6_prot](https://elixir.bootlin.com/linux/v4.14.336/source/net/ipv6/tcp_ipv6.c#L1941)
structs and I've seen that other projects and bcc tools use these
probes. This makes me believe that these functions have a pretty well
defined interface.

Relevant Issues: #1829, #1638

Type of change: /kind feature

Test Plan: New tests verify that ipv4 and ipv6 cases work
- [x] Ran `for i in $(seq 0 1000); do curl http://google.com/$i; sleep
2; done` within a pod and verified that `local_addr` is populated with
this change and `px.pod_id_to_pod_name(px.ip_to_pod_id(df.local_addr))`
works for pod name resolution.

- [x] Verified the above curl test results in traces without
`local_addr` without this change

![local-addr-testing](https://github.com/user-attachments/assets/344be022-97a0-4096-8af7-8de20d741e40)
- Tested on the following k8s offerings and machine images
- [x] GKE COS and Ubuntu
- [x] EKS Amazon Linux 2

Changelog Message: Populate socket tracer data table `local_addr` and
`local_port` column for client side traces.

---------

Signed-off-by: Dom Del Nano <ddelnano@gmail.com>
ddelnano added a commit to ddelnano/pixie that referenced this issue Sep 23, 2024
…1989)

Summary: Populate client side trace's local address via tcp kprobes

This change populates client side trace's `local_addr` and `local_port`
columns for the following use cases:
1. To provide more consistency for the protocol data tables. Having
columns that are empty make it difficult for end users to understand
what is being traced and make them less useful
2. To facilitate addressing a portion of the short lived process
problems (pixie-io#1638)

For 2, the root of the issue is that `df.ctx["pod"]` syntax relies on
the
[px.upid_to_pod_name](https://docs.px.dev/reference/pxl/udf/upid_to_pod_name/)
function. If a PEM misses the short lived process during its metadata
update, this function fails to resolve the pod name. For client side
traces where the pod is making an outbound connection (non localhost),
the `local_addr` column provides an alternative pod name lookup for
short lived processes when the pod is long lived. This means the
following would be equivalent to the `df.ctx["pod"]` lookup:
`px.pod_id_to_pod_name(px.ip_to_pod_id(df.local_addr))`.

I intend to follow this PR with a compiler change that will make
`df.ctx["pod"]` try both methods should `px.upid_to_pod_name` fail to
resolve. This will allow the existing pxl scripts to display the
previously missed short lived processes.

**Alternatives**

Another approach I considered was expanding our use of the `sock_alloc`
kprobe. I used ftrace on a simple curl command to see what other options
could be used (`sudo trace-cmd record -F -p function_graph
http://google.com`). The `socket` syscall calls `sock_alloc`, which
would be another mechanism for accessing the `struct sock`. I decided
against this approach because I don't think its viable to assume that
the same thread/process that calls `socket` will be the one that does
the later syscalls (how our BPF maps are set up). It's common to have a
forking web server model, which means a different process/thread can
call `socket` than the ones that later read/write to it.

**Probe stability**

These probes appear to be stable from our oldest and newest supported
kernel. These functions exist in the
[tcp_prot](https://elixir.bootlin.com/linux/v4.14.336/source/net/ipv4/tcp_ipv4.c#L2422),
[tcpv6_prot](https://elixir.bootlin.com/linux/v4.14.336/source/net/ipv6/tcp_ipv6.c#L1941)
structs and I've seen that other projects and bcc tools use these
probes. This makes me believe that these functions have a pretty well
defined interface.

Relevant Issues: pixie-io#1829, pixie-io#1638

Type of change: /kind feature

Test Plan: New tests verify that ipv4 and ipv6 cases work
- [x] Ran `for i in $(seq 0 1000); do curl http://google.com/$i; sleep
2; done` within a pod and verified that `local_addr` is populated with
this change and `px.pod_id_to_pod_name(px.ip_to_pod_id(df.local_addr))`
works for pod name resolution.

- [x] Verified the above curl test results in traces without
`local_addr` without this change

![local-addr-testing](https://github.com/user-attachments/assets/344be022-97a0-4096-8af7-8de20d741e40)
- Tested on the following k8s offerings and machine images
- [x] GKE COS and Ubuntu
- [x] EKS Amazon Linux 2

Changelog Message: Populate socket tracer data table `local_addr` and
`local_port` column for client side traces.

---------

Signed-off-by: Dom Del Nano <ddelnano@gmail.com>
GitOrigin-RevId: 0c1fdd2
ddelnano added a commit that referenced this issue Jan 24, 2025
)

Summary: Make metadata pod lookups more resilient to short lived
processes

This is a continuation of the work started from #1989. Since the
`local_addr` column is populated for client side traces, it can be used
as a fallback lookup for these traces. This doesn't solve all of the
permutations of missing short lived processes (#1638), but provides more
coverage than before.

Relevant Issues: #1638

Type of change: /kind bugfix

Test Plan: Verified the following
- [x] Compared the performance with and without this change with
`src/e2e_test/vizier/exectime:exectime`. This change has a minor
performance impact, but it closes the gap on certain situations that
previously caused users to distrust Pixie's instrumentation
```
# Performance baseline
$ ./exectime benchmark -a testing.getcosmic.ai:443 -c <cluster_id> 2>&1 | tee baseline_for_simple_udf_swap_e20880ffd.txt
# Performance of this change
./exectime benchmark -a testing.getcosmic.ai:443 -c <cluster_id> 2>&1 | tee simple_udf_swap_cd217c05c.txt
```

[simple_udf_swap_cd217c05c.txt](https://github.com/user-attachments/files/18497709/simple_udf_swap_cd217c05c.txt)

[baseline_for_simple_udf_swap_e20880ffd.txt](https://github.com/user-attachments/files/18497710/baseline_for_simple_udf_swap_e20880ffd.txt)
- [x] Ran `for i in $(seq 0 1000); do curl http://google.com/$i; sleep
2; done` within a pod and verified that with this change all traces are
shown, without this change a significant number of traces are missed.
See before and after screenshots below:

![vizier-0 14
14-curl-with-missing-data](https://github.com/user-attachments/assets/035b5dcf-d87a-4134-84c1-9e478594927b)

![traces-with-new-fallback](https://github.com/user-attachments/assets/2a84ecbb-83cb-45ae-af85-77b1773efb59)

Changelog Message: Fix a certain class of cases where Pixie previously
missed protocol traces from short lived connections

---------

Signed-off-by: Dom Del Nano <ddelnano@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant