Commit b26356d

README / markdown updates (#621)

refresh a few outdated things in our readme and descriptions

jotak authored Apr 18, 2024
1 parent e6d3c52 commit b26356d
Showing 5 changed files with 81 additions and 50 deletions.
6 changes: 3 additions & 3 deletions FAQ.md
@@ -38,11 +38,11 @@ What matters is the version of the Linux kernel: 4.18 or more is supported. Earl

Other than that, there are no known restrictions on the Kubernetes version.

### To use IPFIX exports
### To use CNI's IPFIX exports

OpenShift 4.10 or above, or upstream OVN-Kubernetes, are recommended, as the operator will configure OVS for you.
This feature has been deprecated and is not available anymore. Flows are now always generated by the eBPF agent.

For other CNIs, you need to find out if they can export IPFIX, and configure them accordingly.
Note that NetObserv itself is still able to export its enriched flows as IPFIX: that can be done by configuring `spec.exporters`.
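
For instance, here is a minimal sketch of such an export configuration (the collector address is hypothetical, and field names follow the `FlowCollector` API documentation, so double-check them against your version):

```yaml
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  exporters:
    - type: IPFIX
      ipfix:
        targetHost: ipfix-collector.example.svc  # hypothetical collector service
        targetPort: 4739                         # IANA-registered IPFIX port
        transport: TCP                           # or UDP
```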

### To get the OpenShift Console plugin

47 changes: 20 additions & 27 deletions README.md
@@ -4,9 +4,15 @@
[![Go Report Card](https://goreportcard.com/badge/github.com/netobserv/network-observability-operator)](https://goreportcard.com/report/github.com/netobserv/network-observability-operator)

NetObserv Operator is a Kubernetes / OpenShift operator for network observability. It deploys a monitoring pipeline that consists of:
- the NetObserv eBPF agent, that generates network flows from captured packets
- Flowlogs-pipeline, a component that collects, enriches and exports these flows.
- When used in OpenShift, a Console plugin for flows visualization with powerful filtering options, a topology representation and more.
- an eBPF agent, which generates network flows from captured packets
- flowlogs-pipeline, a component that collects, enriches, and exports these flows
- when used in OpenShift, a Console plugin for flow visualization with powerful filtering options, a topology representation, and more

Flow data is then available in multiple ways, each optional:

- As Prometheus metrics
- As raw flow logs stored in Loki
- As raw flow logs exported to a collector

## Getting Started

@@ -18,7 +24,7 @@ NetObserv Operator is available in [OperatorHub](https://operatorhub.io/operator

![OpenShift OperatorHub search](./docs/assets/operatorhub-search.png)

Please read the [operator description in OLM](./config/descriptions/upstream.md). You will need to install Loki, some instructions are provided there.
Please read the [operator description in OLM](./config/descriptions/upstream.md).

After the operator is installed, create a `FlowCollector` resource:
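
A minimal sketch of such a resource, assuming the v1beta2 API (omitted fields take their defaults):

```yaml
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster        # a single instance, named "cluster", is required
spec:
  namespace: netobserv # namespace hosting the NetObserv components
  agent:
    ebpf:
      sampling: 50     # default 1:50 sampling rate
```

Apply it with `kubectl apply -f <file>`, and the operator reconciles the whole pipeline from it.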

@@ -35,7 +41,7 @@ git clone https://github.com/netobserv/network-observability-operator.git && cd
USER=netobserv make deploy deploy-loki deploy-grafana
```

It will deploy the operator in its latest version, with port-forwarded Loki and Grafana.
It will deploy the latest version of the operator, with port-forwarded Loki and Grafana (both optional).

> Note: the `loki-deploy` script is provided as a quick install path and is not suitable for production. It deploys a single pod and configures a 10GB storage PVC with 24 hours of retention. For a scalable deployment, please refer to [our distributed Loki guide](https://github.com/netobserv/documents/blob/main/loki_distributed.md) or [Grafana's official documentation](https://grafana.com/docs/loki/latest/).
@@ -55,17 +61,9 @@ kubectl edit flowcollector cluster

Refer to the [Configuration section](#configuration) of this document.

#### Install older versions

To deploy a specific version of the operator, you need to switch to the related git branch, then add a `VERSION` env to the above make command, e.g:

```bash
git checkout 1.0.4
USER=netobserv VERSION=1.0.4 make deploy deploy-loki deploy-grafana
make deploy-sample-cr
```
### With or without Loki?

Beware that the version of the underlying components, such as flowlogs-pipeline, may be tied to the version of the operator (this is why we recommend switching the git branch). Breaking this correlation may result in crashes. The versions of the underlying components are defined in the `FlowCollector` resource as image tags.
Historically, Grafana Loki was a strict dependency, but it isn't anymore. If you don't want to install it, you can still get the Prometheus metrics and/or export raw flows to a custom collector. Be aware, however, that some Console plugin features will be disabled: you will not be able to view raw flows there, and the metrics and topology views will offer a more limited level of detail, missing information such as pods or IPs.
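
A sketch of the corresponding setting, using the `spec.loki.enable` field referenced later in this document:

```yaml
spec:
  loki:
    enable: false  # skip Loki storage; Prometheus metrics and exporters still work
```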

### OpenShift Console

@@ -100,15 +98,6 @@ These views are accessible directly from the main menu, and also as contextual t

![Contextual topology](./docs/assets/topology-pod.png)


### Grafana

Grafana can be used to retrieve and show the collected flows from Loki. If you used the `make` commands provided above to install NetObserv from the repository, you should already have Grafana installed and configured with Loki data source. Otherwise, you can install Grafana by following the instructions [here](https://github.com/netobserv/documents/blob/main/hack_loki.md#grafana), and add a new Loki data source that matches your setup. If you used the provided quick install path for Loki, its access URL is `http://loki:3100`.

To get dashboards, import [this file](./config/samples/dashboards/Network%20Observability.json) into Grafana. It includes a table of the flows and some graphs showing the volumetry per source or destination namespaces or workload:

![Grafana dashboard](./docs/assets/netobserv-grafana-dashboard.png)

## Configuration

The `FlowCollector` resource is used to configure the operator and its managed components. Comprehensive documentation is [available here](./docs/FlowCollector.md), and a full sample file [there](./config/samples/flows_v1beta2_flowcollector.yaml).
@@ -119,15 +108,15 @@ To edit configuration in cluster, run:
kubectl edit flowcollector cluster
```

As it operates cluster-wide, only a single `FlowCollector` is allowed, and it has to be named `cluster`.
As it operates cluster-wide on every node, only a single `FlowCollector` is allowed, and it has to be named `cluster`.

A couple of settings deserve special attention:

- Agent features (`spec.agent.ebpf.features`): enable additional capabilities such as tracking packet drops, TCP latency (RTT), and DNS requests and responses (see the configuration sketch after this list).

- Sampling (`spec.agent.ebpf.sampling`): a value of `100` means that one flow out of every 100 is sampled, while `1` means all flows are sampled. The lower the value, the more flows you get and the more accurate derived metrics are, but the more resources are consumed. By default, sampling is set to 50 (i.e. 1:50). Note that more sampled flows also means more storage is needed. We recommend starting with the default value and refining empirically to figure out which setting your cluster can manage.

- Loki (`spec.loki`): configure here how to reach Loki. The default URL values match the Loki quick install paths mentioned in the _Getting Started_ section, but you may have to configure differently if you used another installation method. You will find more information in our guides for deploying Loki: [with Loki Operator](https://github.com/netobserv/documents/blob/main/loki_operator.md), or an alternative ["distributed Loki" guide](https://github.com/netobserv/documents/blob/main/loki_distributed.md). You should set `spec.loki.mode` according to the chosen installation method, for instance use `LokiStack` if you use the Loki Operator.
- Loki (`spec.loki`): configure here how to reach Loki. The default URL values match the Loki quick install paths mentioned in the _Getting Started_ section, but you may have to configure it differently if you used another installation method. You will find more information in our guides for deploying Loki: [with Loki Operator](https://github.com/netobserv/documents/blob/main/loki_operator.md), or an alternative ["distributed Loki" guide](https://github.com/netobserv/documents/blob/main/loki_distributed.md). Set `spec.loki.mode` according to the chosen installation method, for instance use `LokiStack` if you use the Loki Operator. Make sure to disable Loki (`spec.loki.enable`) if you don't want to use it.

- Quick filters (`spec.consolePlugin.quickFilters`): configure preset filters to be displayed in the Console plugin. They offer a way to quickly switch from one set of filters to another, such as showing or hiding the pods network, the infrastructure network, or the application network. They can be tuned to reflect the different workloads running on your cluster. For a list of available filters, [check this page](./docs/QuickFilters.md).

@@ -137,6 +126,10 @@ A couple of settings deserve special attention:

- To enable availability zones awareness, set `spec.processor.addZone` to `true`.
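
Putting several of these settings together, a hedged sketch (v1beta2 field names; the feature names and values are illustrative, so check the `FlowCollector` documentation for your version):

```yaml
spec:
  agent:
    ebpf:
      sampling: 50        # 1:50 sampling (the default)
      features:
        - PacketDrop      # track packet drops
        - FlowRTT         # TCP latency (RTT)
        - DNSTracking     # DNS requests and responses
  loki:
    mode: LokiStack       # when Loki is installed via the Loki Operator
  processor:
    addZone: true         # availability zones awareness
```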

### Metrics

More information on Prometheus metrics is available on a dedicated page: [Metrics.md](./docs/Metrics.md).
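
As a hedged example, assuming your operator version exposes `spec.processor.metrics.includeList` as described in that page (the metric names below are illustrative):

```yaml
spec:
  processor:
    metrics:
      includeList:                      # restrict which Prometheus metrics are produced
        - namespace_flows_total
        - workload_ingress_bytes_total
```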

### Performance fine-tuning

In addition to sampling and the choice of whether to use Kafka, other settings can help you get an optimal setup without compromising observability.
@@ -145,7 +138,7 @@ Here is what you should pay attention to:

- Resource requirements and limits (`spec.agent.ebpf.resources`, `spec.processor.resources`): adapt the resource requirements and limits to the load and memory usage you expect on your cluster. The default limits (800MB) should be sufficient for most medium-sized clusters. You can read more about requests and limits [here](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/).

- eBPF agent's cache max flows (`spec.agent.ebpf.cacheMaxFlows`) and timeout (`spec.agent.ebpf.cacheActiveTimeout`) control how often flows are reported by the agents. The higher are `cacheMaxFlows` and `cacheActiveTimeout`, the less traffic will be generated by the agents themselves, which also ties with less CPU load. But on the flip side, it leads to a slightly higher memory consumption, and might generate more latency in the flow collection.
- eBPF agent's cache max flows (`spec.agent.ebpf.cacheMaxFlows`) and timeout (`spec.agent.ebpf.cacheActiveTimeout`) control how often flows are reported by the agents. The higher `cacheMaxFlows` and `cacheActiveTimeout` are, the less traffic the agents themselves generate, which also means less CPU load. On the flip side, this leads to slightly higher memory consumption and may add latency to the flow collection; a tuning sketch follows this list. There is [a blog entry](https://github.com/netobserv/documents/blob/main/blogs/agent_metrics_perf/index.md) dedicated to this fine-tuning.

- It is possible to reduce the overall observed traffic by restricting or excluding interfaces via `spec.agent.ebpf.interfaces` and `spec.agent.ebpf.excludeInterfaces`. Note that the interface names may vary according to the CNI used.
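
A sketch of these cache and interface settings (the values are illustrative starting points, not recommendations):

```yaml
spec:
  agent:
    ebpf:
      cacheMaxFlows: 100000      # larger cache: fewer, bigger reports, more memory
      cacheActiveTimeout: 5s     # flush the cache at least every 5 seconds
      excludeInterfaces:
        - lo                     # ignore loopback traffic
```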

26 changes: 19 additions & 7 deletions bundle/manifests/netobserv-operator.clusterserviceversion.yaml
@@ -784,15 +784,21 @@ spec:
version: v1alpha1
description: |-
NetObserv Operator is an OpenShift / Kubernetes operator for network observability. It deploys a monitoring pipeline that consists of:
- the NetObserv eBPF agent, that generates network flows from captured packets
- Flowlogs-pipeline, a component that collects, enriches and exports these flows.
- When used in OpenShift, a Console plugin for flows visualization with powerful filtering options, a topology representation and more.
- an eBPF agent, which generates network flows from captured packets
- flowlogs-pipeline, a component that collects, enriches, and exports these flows
- when used in OpenShift, a Console plugin for flow visualization with powerful filtering options, a topology representation, and more
Flow data is then available in multiple ways, each optional:
- As Prometheus metrics
- As raw flow logs stored in Grafana Loki
- As raw flow logs exported to a collector
## Dependencies
### Loki
[Loki](https://grafana.com/oss/loki/), from GrafanaLabs, is the backend that is used to store all collected flows. The NetObserv Operator does not install Loki directly, however we provide some guidance to help you there.
[Loki](https://grafana.com/oss/loki/), from Grafana Labs, can optionally be used as the backend to store all collected flows. The NetObserv Operator does not install Loki directly; however, we provide some guidance to help you there.
For normal usage, we recommend two options:
@@ -808,6 +814,9 @@ spec:
kubectl apply -f <(curl -L https://raw.githubusercontent.com/netobserv/documents/5410e65b8e05aaabd1244a9524cfedd8ac8c56b5/examples/zero-click-loki/2-loki.yaml) -n netobserv
```
If you prefer to not use Loki, you must set `spec.loki.enable` to `false` in `FlowCollector`.
In that case, you can still get the Prometheus metrics or export raw flows to a custom collector. Be aware, however, that some Console plugin features will be disabled: you will not be able to view raw flows there, and the metrics and topology views will offer a more limited level of detail, missing information such as pods or IPs.
### Kafka
[Apache Kafka](https://kafka.apache.org/) can optionally be used for a more resilient and scalable architecture. You can use, for example, [Strimzi](https://strimzi.io/), an operator-based distribution of Kafka for Kubernetes and OpenShift.
@@ -818,35 +827,38 @@ spec:
## Configuration
The `FlowCollector` resource is used to configure the operator and its managed components. A comprehensive documentation is [available here](https://github.com/netobserv/network-observability-operator/blob/1.0.5/docs/FlowCollector.md), and a full sample file [there](https://github.com/netobserv/network-observability-operator/blob/1.0.5/config/samples/flows_v1beta1_flowcollector.yaml).
The `FlowCollector` resource is used to configure the operator and its managed components. Comprehensive documentation is [available here](https://github.com/netobserv/network-observability-operator/blob/1.0.5/docs/FlowCollector.md), and a full sample file [there](https://github.com/netobserv/network-observability-operator/blob/1.0.5/config/samples/flows_v1beta2_flowcollector.yaml).
To edit configuration in cluster, run:
```bash
kubectl edit flowcollector cluster
```
As it operates cluster-wide, only a single `FlowCollector` is allowed, and it has to be named `cluster`.
As it operates cluster-wide on every node, only a single `FlowCollector` is allowed, and it has to be named `cluster`.
A couple of settings deserve special attention:
- Sampling (`spec.agent.ebpf.sampling`): a value of `100` means that one flow out of every 100 is sampled, while `1` means all flows are sampled. The lower the value, the more flows you get and the more accurate derived metrics are, but the more resources are consumed. By default, sampling is set to 50 (i.e. 1:50). Note that more sampled flows also means more storage is needed. We recommend starting with the default value and refining empirically to figure out which setting your cluster can manage.
- Loki (`spec.loki`): configure here how to reach Loki. The default values match the Loki quick install paths mentioned above, but you might have to configure differently if you used another installation method.
- Loki (`spec.loki`): configure here how to reach Loki. The default values match the Loki quick install paths mentioned above, but you might have to configure it differently if you used another installation method. Make sure to disable it (`spec.loki.enable`) if you don't want to use Loki.
- Quick filters (`spec.consolePlugin.quickFilters`): configure preset filters to be displayed in the Console plugin. They offer a way to quickly switch from one set of filters to another, such as showing or hiding the pods network, the infrastructure network, or the application network. They can be tuned to reflect the different workloads running on your cluster. For a list of available filters, [check this page](https://github.com/netobserv/network-observability-operator/blob/1.0.5/docs/QuickFilters.md).
- Kafka (`spec.deploymentModel: KAFKA` and `spec.kafka`): when enabled, integrates the flow collection pipeline with Kafka, splitting ingestion from transformation (kube enrichment, derived metrics, ...). Kafka can provide better scalability, resiliency, and high availability ([view more details](https://www.redhat.com/en/topics/integration/what-is-apache-kafka)). This assumes Kafka is already deployed and a topic is created; see the sketch after this list.
- Exporters (`spec.exporters`): an optional list of exporters to which to send enriched flows. KAFKA and IPFIX exporters are supported. This allows you to define any custom storage or processing that can read from Kafka or use the IPFIX standard.
- To enable availability zones awareness, set `spec.processor.addZone` to `true`.
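
A sketch of the Kafka settings (the broker address assumes a hypothetical Strimzi deployment in the `netobserv` namespace, and the topic must be created beforehand):

```yaml
spec:
  deploymentModel: KAFKA
  kafka:
    address: kafka-cluster-kafka-bootstrap.netobserv:9092  # hypothetical bootstrap address
    topic: network-flows                                   # pre-created topic
```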
## Further reading
Please refer to the documentation on GitHub for more information.
This documentation includes:
- An [overview](https://github.com/netobserv/network-observability-operator#openshift-console) of the features, with screenshots
- More information on [configuring metrics](https://github.com/netobserv/network-observability-operator/blob/1.0.5/docs/Metrics.md)
- A [performance](https://github.com/netobserv/network-observability-operator#performance-fine-tuning) section, for fine-tuning
- A [security](https://github.com/netobserv/network-observability-operator#securing-data-and-communications) section
- An [F.A.Q.](https://github.com/netobserv/network-observability-operator#faq--troubleshooting) section