docs: fixes on new persistent enforcement and cgroup rate pages #2760

Merged
merged 1 commit into from
Aug 5, 2024
177 changes: 96 additions & 81 deletions docs/content/en/docs/concepts/cgroup-rate.md
@@ -1,6 +1,6 @@
---
title: "Cgroup rate throtling"
weight: 2
title: "Event throttling"
weight: 5
description: "Monitor and throttle cgroup events rate"
---

@@ -21,14 +21,17 @@ The throttle action generates following events:
- `THROTTLE` start event is sent when the cgroup rate limit is crossed
- `THROTTLE` stop event is sent when the cgroup rate has stayed below the limit for 5 seconds

**NOTE** The threshold for given cgroup is monitored *per CPU*.
{{< note >}}
The threshold for given cgroup is monitored *per CPU*.
When the events are spread around on multiple CPUs we will throttle
them per CPU only if they cross the threshold on that CPU.
{{< /note >}}
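
The per-CPU behaviour described in the note can be illustrated with a small model. This is only a sketch under stated assumptions: the fixed-window counter and the function name `throttled_cpus` are hypothetical and are not taken from Tetragon's BPF implementation, which may count events differently. It shows why 16 events per second spread over two CPUs stay under a limit of 10 per second, while the same 16 events on one CPU cross it.

```python
from collections import defaultdict


def throttled_cpus(events, limit, interval_ns):
    """Return the set of CPUs that exceed `limit` events in any window.

    `events` is an iterable of (timestamp_ns, cpu) pairs. Counting uses
    fixed windows of `interval_ns`; this windowing scheme is an assumption
    made for illustration only.
    """
    counts = defaultdict(int)  # (cpu, window index) -> events seen
    over = set()
    for timestamp_ns, cpu in events:
        window = timestamp_ns // interval_ns
        counts[(cpu, window)] += 1
        # The limit is checked per CPU, never summed across CPUs.
        if counts[(cpu, window)] > limit:
            over.add(cpu)
    return over


SECOND = 1_000_000_000

# 16 events within one second, alternating between CPU 0 and CPU 1.
spread = [(i * 1000, i % 2) for i in range(16)]
# The same 16 events, all on CPU 0.
single = [(i * 1000, 0) for i in range(16)]

print(throttled_cpus(spread, limit=10, interval_ns=SECOND))
print(throttled_cpus(single, limit=10, interval_ns=SECOND))
```

In this model the spread workload produces no throttling even though the cgroup as a whole generates more than 10 events per second, which matches the limitation stated in the note.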

**NOTE** At the moment we monitor and limit base sensor events:
{{< note >}}
At the moment we monitor and limit base sensor events:
- `PROCESS_EXEC`
- `PROCESS_EXIT`

{{< /note >}}

## Setup

@@ -63,95 +66,107 @@ The throttle events contains fields as follows.

- `THROTTLE_START`

```json
{
"process_throttle": {
"type": "THROTTLE_START",
"cgroup": "session-429.scope"
},
"node_name": "ubuntu-22",
"time": "2024-07-26T13:07:43.178407128Z"
}
```
```json
{
"process_throttle": {
"type": "THROTTLE_START",
"cgroup": "session-429.scope"
},
"node_name": "ubuntu-22",
"time": "2024-07-26T13:07:43.178407128Z"
}
```

- `THROTTLE_STOP`

```json
"process_throttle": {
"type": "THROTTLE_STOP",
"cgroup": "session-429.scope"
},
"node_name": "ubuntu-22",
"time": "2024-07-26T13:07:55.501718877Z"
```
```json
{
"process_throttle": {
"type": "THROTTLE_STOP",
"cgroup": "session-429.scope"
},
"node_name": "ubuntu-22",
"time": "2024-07-26T13:07:55.501718877Z"
}
```
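
As a rough illustration of how these events might be consumed, the following sketch pairs `THROTTLE_START` and `THROTTLE_STOP` events per cgroup and reports how long each throttle period lasted. It assumes one JSON object per line, with the fields shown above; the helper names `parse_ns` and `throttle_durations` are hypothetical and not part of any Tetragon tooling.

```python
import json
from datetime import datetime, timezone


def parse_ns(ts: str) -> int:
    """Convert a UTC timestamp such as 2024-07-26T13:07:43.178407128Z
    to nanoseconds since the epoch, keeping all nine fractional digits."""
    base, _, frac = ts.rstrip("Z").partition(".")
    whole = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S")
    seconds = int(whole.replace(tzinfo=timezone.utc).timestamp())
    return seconds * 1_000_000_000 + int(frac.ljust(9, "0")[:9])


def throttle_durations(lines):
    """Return {cgroup: [duration in seconds, ...]} for each START/STOP pair."""
    started = {}
    durations = {}
    for line in lines:
        event = json.loads(line)
        throttle = event.get("process_throttle")
        if throttle is None:
            continue  # not a throttle event
        cgroup = throttle["cgroup"]
        now = parse_ns(event["time"])
        if throttle["type"] == "THROTTLE_START":
            started[cgroup] = now
        elif throttle["type"] == "THROTTLE_STOP" and cgroup in started:
            elapsed = (now - started.pop(cgroup)) / 1e9
            durations.setdefault(cgroup, []).append(elapsed)
    return durations


# The two example events from this page, one JSON object per line.
SAMPLE = [
    '{"process_throttle": {"type": "THROTTLE_START", "cgroup": "session-429.scope"},'
    ' "node_name": "ubuntu-22", "time": "2024-07-26T13:07:43.178407128Z"}',
    '{"process_throttle": {"type": "THROTTLE_STOP", "cgroup": "session-429.scope"},'
    ' "node_name": "ubuntu-22", "time": "2024-07-26T13:07:55.501718877Z"}',
]

print(throttle_durations(SAMPLE))
```

For the sample events the reported period is a little over 12 seconds, which includes the 5 seconds the rate has to stay below the limit before the stop event is sent.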


## Example

This example shows how to generate throttle events when cgroup rate monitoring is enabled.


- Start tetragon with cgroup rate monitoring 10 events per second, the successfull configuration will show in tetragon log

```
# tetragon --bpf-lib ./bpf/objs/ --cgroup-rate=10,1s
...
time="2024-07-26T13:33:19Z" level=info msg="Cgroup rate started (10/1s)"
...
```

- Spawn more than 10 events per second

```
$ while :; do sleep 0.001s; done
```

- Monitor events shows throttling


```
$ tetra getevents -o compact
🚀 process ubuntu-22 /usr/bin/sleep 0.001s
💥 exit ubuntu-22 /usr/bin/sleep 0.001s 0
🚀 process ubuntu-22 /usr/bin/sleep 0.001s
💥 exit ubuntu-22 /usr/bin/sleep 0.001s 0
🚀 process ubuntu-22 /usr/bin/sleep 0.001s
💥 exit ubuntu-22 /usr/bin/sleep 0.001s 0
🚀 process ubuntu-22 /usr/bin/sleep 0.001s
💥 exit ubuntu-22 /usr/bin/sleep 0.001s 0
🚀 process ubuntu-22 /usr/bin/sleep 0.001s
💥 exit ubuntu-22 /usr/bin/sleep 0.001s 0
🚀 process ubuntu-22 /usr/bin/sleep 0.001s
💥 exit ubuntu-22 /usr/bin/sleep 0.001s 0
🚀 process ubuntu-22 /usr/bin/sleep 0.001s
🧬 throttle START session-429.scope
🚀 process ubuntu-22 /usr/bin/sleep 0.001s
💥 exit ubuntu-22 /usr/bin/sleep 0.001s 0
🚀 process ubuntu-22 /usr/bin/sleep 0.001s
💥 exit ubuntu-22 /usr/bin/sleep 0.001s 0
🚀 process ubuntu-22 /usr/bin/sleep 0.001s
💥 exit ubuntu-22 /usr/bin/sleep 0.001s 0
🚀 process ubuntu-22 /usr/bin/sleep 0.001s
💥 exit ubuntu-22 /usr/bin/sleep 0.001s 0
🚀 process ubuntu-22 /usr/bin/sleep 0.001s
🚀 process ubuntu-22 /usr/bin/sleep 0.001s
💥 exit ubuntu-22 /usr/bin/sleep 0.001s 0
🚀 process ubuntu-22 /usr/bin/sleep 0.001s
💥 exit ubuntu-22 /usr/bin/sleep 0.001s 0
🚀 process ubuntu-22 /usr/bin/sleep 0.001s
💥 exit ubuntu-22 /usr/bin/sleep 0.001s 0
🚀 process ubuntu-22 /usr/bin/sleep 0.001s

🧬 throttle STOP session-429.scope
```

When you stop the while loop from thr other terminal you will get above `throttle STOP` event after 5 seconds.
1. Start tetragon with cgroup rate monitoring 10 events per second.

```shell
tetragon --bpf-lib ./bpf/objs/ --cgroup-rate=10,1s
```

The successful configuration will show in tetragon log.

```
...
time="2024-07-26T13:33:19Z" level=info msg="Cgroup rate started (10/1s)"
...
```

1. Spawn more than 10 events per second.

```shell
while :; do sleep 0.001s; done
```

1. Monitor the events to observe the throttling.


```shell
tetra getevents -o compact
```

The output should be similar to:

```
🚀 process ubuntu-22 /usr/bin/sleep 0.001s
💥 exit ubuntu-22 /usr/bin/sleep 0.001s 0
🚀 process ubuntu-22 /usr/bin/sleep 0.001s
💥 exit ubuntu-22 /usr/bin/sleep 0.001s 0
🚀 process ubuntu-22 /usr/bin/sleep 0.001s
💥 exit ubuntu-22 /usr/bin/sleep 0.001s 0
🚀 process ubuntu-22 /usr/bin/sleep 0.001s
💥 exit ubuntu-22 /usr/bin/sleep 0.001s 0
🚀 process ubuntu-22 /usr/bin/sleep 0.001s
💥 exit ubuntu-22 /usr/bin/sleep 0.001s 0
🚀 process ubuntu-22 /usr/bin/sleep 0.001s
💥 exit ubuntu-22 /usr/bin/sleep 0.001s 0
🚀 process ubuntu-22 /usr/bin/sleep 0.001s
🧬 throttle START session-429.scope
🚀 process ubuntu-22 /usr/bin/sleep 0.001s
💥 exit ubuntu-22 /usr/bin/sleep 0.001s 0
🚀 process ubuntu-22 /usr/bin/sleep 0.001s
💥 exit ubuntu-22 /usr/bin/sleep 0.001s 0
🚀 process ubuntu-22 /usr/bin/sleep 0.001s
💥 exit ubuntu-22 /usr/bin/sleep 0.001s 0
🚀 process ubuntu-22 /usr/bin/sleep 0.001s
💥 exit ubuntu-22 /usr/bin/sleep 0.001s 0
🚀 process ubuntu-22 /usr/bin/sleep 0.001s
🚀 process ubuntu-22 /usr/bin/sleep 0.001s
💥 exit ubuntu-22 /usr/bin/sleep 0.001s 0
🚀 process ubuntu-22 /usr/bin/sleep 0.001s
💥 exit ubuntu-22 /usr/bin/sleep 0.001s 0
🚀 process ubuntu-22 /usr/bin/sleep 0.001s
💥 exit ubuntu-22 /usr/bin/sleep 0.001s 0
🚀 process ubuntu-22 /usr/bin/sleep 0.001s

🧬 throttle STOP session-429.scope
```

When you stop the while loop from the other terminal you will get above
`throttle STOP` event after 5 seconds.


## Limitations

- The cgroup rate is monitored per CPU

- At the moment we monitor and limit base sensor and kprobe events:
- At the moment we only monitor and limit base sensor and kprobe events:
- `PROCESS_EXEC`
- `PROCESS_EXIT`

100 changes: 100 additions & 0 deletions docs/content/en/docs/concepts/enforcement/persistent-enforcement.md
@@ -0,0 +1,100 @@
---
title: "Persistent enforcement"
weight: 1
description: "How to configure persistent enforcement"
---

This page shows you how to configure persistent enforcement.

## Concept

The idea of persistent enforcement is to allow the enforcement policy to continue
running even when its tetragon process is gone.

This is configured with the `--keep-sensors-on-exit` option.

When the tetragon process exits, the policy stays active because it's pinned
in the BPF filesystem under the `/sys/fs/bpf/tetragon` directory.

When a new tetragon process is started, it performs the following actions:

- checks whether `/sys/fs/bpf/tetragon` already exists and, if so, moves it to
the `/sys/fs/bpf/tetragon_old` directory;
- sets up configured policy;
- removes `/sys/fs/bpf/tetragon_old` directory.
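
The restart sequence above can be sketched as follows. This is a simplified model that runs on an ordinary temporary directory rather than on a real BPF filesystem, where the pinned objects are BPF programs, maps and links rather than regular files. The function `restart_rotation` and the `load_policy` callback are hypothetical names used only for illustration.

```python
import shutil
import tempfile
from pathlib import Path


def restart_rotation(bpf_root: Path, load_policy) -> None:
    """Model the three steps a new tetragon process performs on startup."""
    current = bpf_root / "tetragon"
    old = bpf_root / "tetragon_old"

    # Step 1: move an existing pin directory out of the way. The old
    # policy stays pinned (and so stays enforced) while the new one loads.
    if current.exists():
        current.rename(old)

    # Step 2: set up the configured policy in a fresh directory.
    current.mkdir()
    load_policy(current)

    # Step 3: drop the previous pins once the new policy is in place.
    if old.exists():
        shutil.rmtree(old)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Simulate pins left behind by a previous run with --keep-sensors-on-exit.
    (root / "tetragon").mkdir()
    (root / "tetragon" / "old_policy").touch()

    restart_rotation(root, lambda d: (d / "new_policy").touch())

    print(sorted(p.name for p in (root / "tetragon").iterdir()))
    print((root / "tetragon_old").exists())
```

The ordering matters in this model: the old directory is removed only after the new policy is set up, so there is no window in which neither policy is pinned.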

## Example

This example shows how persistent enforcement works on a simple tracing policy.

1. Consider the following enforcement tracing policy that kills any process that touches the `/tmp/tetragon` file.

```yaml
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: "enforcement"
spec:
  kprobes:
  - call: "fd_install"
    syscall: false
    args:
    - index: 0
      type: int
    - index: 1
      type: "file"
    selectors:
    - matchArgs:
      - index: 1
        operator: "Equal"
        values:
        - "/tmp/tetragon"
      matchActions:
      - action: Sigkill
```

1. Start tetragon with the above policy and the `--keep-sensors-on-exit` option.

```shell
tetragon --bpf-lib bpf/objs/ --keep-sensors-on-exit --tracing-policy enforcement.yaml
```

1. Verify that the enforcement policy is in place.

```shell
cat /tmp/tetragon
```

The output should be similar to:

```
Killed
```

1. Kill tetragon with <kbd>CTRL+C</kbd>.

```
time="2024-07-26T14:47:45Z" level=info msg="Perf ring buffer size (bytes)" percpu=68K total=272K
time="2024-07-26T14:47:45Z" level=info msg="Perf ring buffer events queue size (events)" size=63K
time="2024-07-26T14:47:45Z" level=info msg="Listening for events..."
^C
time="2024-07-26T14:50:50Z" level=info msg="Received signal interrupt, shutting down..."
time="2024-07-26T14:50:50Z" level=info msg="Listening for events completed." error="context canceled"
```

1. Verify that the enforcement policy is **STILL** in place.

```shell
cat /tmp/tetragon
```

The output should still be similar to:

```
Killed
```

## Limitations

At the moment we are not able to receive any events while tetragon is down;
only the enforcement remains in place.