Redesign telemetry support in Helm chart #6153

Merged
merged 7 commits into from
Feb 22, 2023
96 changes: 89 additions & 7 deletions helm/nessie/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -73,12 +73,6 @@ $ helm uninstall --namespace nessie-ns nessie
| ingress.enabled | bool | `false` | Specifies whether an ingress should be created. |
| ingress.hosts | list | `[{"host":"chart-example.local","paths":[]}]` | A list of host paths used to configure the ingress. |
| ingress.tls | list | `[]` | A list of TLS certificates; each entry has a list of hosts in the certificate, along with the secret name used to terminate TLS traffic on port 443. |
| jaegerTracing.enabled | bool | `false` | Specifies whether jaeger tracing for the nessie server should be enabled. |
| jaegerTracing.endpoint | string | `""` | The traces endpoint, in case the client should connect directly to the Collector, e.g. http://jaeger-collector:14268/api/traces |
| jaegerTracing.publishMetrics | bool | `true` | Whether metrics are published if tracing is enabled. |
| jaegerTracing.samplerParam | int | `1` | The request sampling probability. 1=Sample all requests. Set samplerParam to somewhere between 0 and 1, e.g. 0.50, if you do not wish to sample all requests. |
| jaegerTracing.samplerType | string | `"ratelimiting"` | The sampler type (const, probabilistic, ratelimiting or remote). |
| jaegerTracing.serviceName | string | `"nessie"` | The Jaeger service name. |
| logLevel | string | `"INFO"` | The default logging level for the nessie server. |
| mongodb.connectionString | string | `"mongodb://localhost:27017"` | The MongoDB connection string. |
| mongodb.name | string | `"nessie"` | The MongoDB database name. |
Expand All @@ -92,7 +86,7 @@ $ helm uninstall --namespace nessie-ns nessie
| postgres.secret.name | string | `"postgres-creds"` | The secret name to pull Postgres credentials from. |
| postgres.secret.password | string | `"postgres_password"` | The secret key storing the Postgres password. |
| postgres.secret.username | string | `"postgres_username"` | The secret key storing the Postgres username. |
| replicaCount | int | `1` | The number of replicas to deploy (horizontal scaling). Beware that replicas are stateless; don't set this number > 1 when using ROCKS version store type. |
| replicaCount | int | `1` | The number of replicas to deploy (horizontal scaling). Beware that replicas are stateless; don't set this number > 1 when using INMEMORY or ROCKS version store types. |
| resources | object | `{}` | Configures the resources requests and limits for nessie pods. We usually recommend not to specify default resources and to leave this as a conscious choice for the user. This also increases chances charts run on environments with little resources, such as Minikube. If you do want to specify resources, uncomment the following lines, adjust them as necessary, and remove the curly braces after 'resources:'. |
| rocksdb.selectorLabels | object | `{}` | Labels to add to the persistent volume claim spec selector; a persistent volume with matching labels must exist. Leave empty if using dynamic provisioning. |
| rocksdb.storageClassName | string | `"standard"` | The storage class name of the persistent volume claim to create. |
Expand All @@ -108,6 +102,10 @@ $ helm uninstall --namespace nessie-ns nessie
| serviceMonitor.interval | string | `""` | The scrape interval; leave empty to let Prometheus decide. Must be a valid duration, e.g. 1d, 1h30m, 5m, 10s. |
| serviceMonitor.labels | object | `{}` | Labels for the created ServiceMonitor so that Prometheus operator can properly pick it up. |
| tolerations | list | `[]` | A list of tolerations to apply to nessie pods. See https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/. |
| tracing.attributes | object | `{}` | Resource attributes to identify the nessie service among other tracing sources. See https://opentelemetry.io/docs/reference/specification/resource/semantic_conventions/#service. If left empty, traces will be attached to a service named "Nessie"; to change this, provide a service.name attribute here. |
| tracing.enabled | bool | `false` | Specifies whether tracing for the nessie server should be enabled. |
| tracing.endpoint | string | `"http://otlp-collector:4317"` | The collector endpoint URL to connect to (required). The endpoint URL must have either the http:// or the https:// scheme. The collector must speak the OpenTelemetry protocol (OTLP) and the port must be its gRPC port (by default 4317). See https://quarkus.io/guides/opentelemetry for more information. |
| tracing.sample | string | `"all"` | Which requests should be sampled. Valid values are: "all", "none", or a ratio between 0.0 and 1.0 (inclusive). E.g. 0.5 means that 50% of the requests will be sampled. |
| versionStoreAdvancedConfig | object | `{}` | Advanced version store configuration. The key-value pairs specified here will be passed to the Nessie server as environment variables. See https://projectnessie.org/try/configuration/#version-store-advanced-settings for available properties. Naming convention: to set the property nessie.version.store.advanced.repository-id, use the key: NESSIE_VERSION_STORE_ADVANCED_REPOSITORY_ID. |
| versionStoreType | string | `"INMEMORY"` | Which type of version store to use: INMEMORY, ROCKS, DYNAMO, MONGO, TRANSACTIONAL. |

Expand Down Expand Up @@ -189,6 +187,90 @@ This is broadly following the example from https://kubernetes.io/docs/tasks/acce
* Use the IP from the above output and add it to `/etc/hosts` via `echo "192.168.49.2 chart-example.local" | sudo tee -a /etc/hosts`
* Verify that `curl chart-example.local` works

### OpenTelemetry Collector with Minikube

* Start Minikube cluster: `minikube start`
* Create K8s Namespace: `kubectl create namespace nessie-ns`
* Install cert-manager:

```bash
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.11.0/cert-manager.yaml
```

* Install Jaeger Operator:

```bash
kubectl create namespace observability
kubectl apply -f https://github.com/jaegertracing/jaeger-operator/releases/download/v1.42.0/jaeger-operator.yaml -n observability
```

If the above command fails with "failed to call webhook [...] connection refused", then cert-manager
was not yet ready. Wait a few seconds and try again.

* Create a Jaeger instance in Nessie's namespace:

```bash
kubectl apply -n nessie-ns -f - <<EOF
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: jaeger
EOF
```

If the above command fails with "failed to call webhook [...] connection refused", then the Jaeger
Operator was not yet ready. Wait a few seconds and try again.

* Install Nessie Helm chart with OpenTelemetry Collector enabled:

```bash
helm install nessie -n nessie-ns helm/nessie \
--set tracing.enabled=true \
--set tracing.endpoint=http://jaeger-collector:4317
```

* Forward ports to Jaeger UI and Nessie UI:

```bash
kubectl port-forward -n nessie-ns service/nessie 19120:19120 &
kubectl port-forward -n nessie-ns service/jaeger-query 16686:16686 &
```

* Open the following URLs in your browser:
  * Nessie UI (to generate some traces): http://localhost:19120
  * Jaeger UI (to retrieve the traces): http://localhost:16686/search
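
The two "wait a few seconds and try again" notes in the steps above could be automated with a small retry loop. The following is a minimal sketch and not part of the chart: `retry` is a hypothetical helper name, and the attempt count and delay are arbitrary assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the Nessie chart): re-run a command until it
# succeeds, up to a maximum number of attempts, sleeping between attempts.
# Usage: retry <attempts> <delay-seconds> <command> [args...]
retry() {
  local attempts=$1 delay=$2
  shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    "$@" && return 0
    echo "attempt $i/$attempts failed: $*" >&2
    if ((i < attempts)); then sleep "$delay"; fi
  done
  return 1
}
```

For instance, `retry 10 5 kubectl apply -f https://github.com/jaegertracing/jaeger-operator/releases/download/v1.42.0/jaeger-operator.yaml -n observability` would re-apply the operator manifest every 5 seconds until the cert-manager webhook accepts it, or give up after 10 attempts.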

To kill the port forwarding processes, run:

```bash
killall -9 kubectl
```
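
Note that `killall -9 kubectl` terminates every `kubectl` process owned by the user, including unrelated ones. A narrower alternative is to record the PID of each background job and stop only those. The sketch below illustrates the idea under stated assumptions: `sleep 300` stands in for the two `kubectl port-forward ... &` commands so that the snippet runs without a cluster, and `stop_forwards` is a hypothetical function name.

```bash
#!/usr/bin/env bash
# Sketch: remember the PID of each background job and stop only those jobs.
# "sleep 300" is a stand-in for the "kubectl port-forward ... &" commands.
pids=()
sleep 300 & pids+=("$!")
sleep 300 & pids+=("$!")

stop_forwards() {
  local pid
  for pid in "${pids[@]}"; do
    kill "$pid" 2>/dev/null || true   # plain SIGTERM; -9 should not be needed
    wait "$pid" 2>/dev/null || true   # reap the job so it does not linger
  done
  pids=()
}

stop_forwards
```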

### Custom Docker images for Nessie with Minikube

You can modify Nessie's code and deploy it to Minikube.

Once you're satisfied with your changes, build the project with:

```bash
./gradlew :nessie-quarkus:quarkusBuild
```

Then build the Docker image and deploy it as follows:

```bash
eval $(minikube docker-env)
docker build -f ./tools/dockerbuild/docker/Dockerfile-jvm -t nessie-test:latest ./servers/quarkus-server
Member

Maybe add a ./gradlew :nessie-quarkus:quarkusBuild ?

Member

Nice to see that these Dockerfiles help w/ k8s stuff as well.

```

Then deploy Nessie with the custom Docker image:

```bash
helm install nessie -n nessie-ns helm/nessie \
--set image.repository=nessie-test \
--set image.tag=latest
```

### Stop/Uninstall everything in Dev

```sh
84 changes: 84 additions & 0 deletions helm/nessie/README.md.gotmpl
Original file line number Diff line number Diff line change
Expand Up @@ -129,6 +129,90 @@ This is broadly following the example from https://kubernetes.io/docs/tasks/acce
* Use the IP from the above output and add it to `/etc/hosts` via `echo "192.168.49.2 chart-example.local" | sudo tee -a /etc/hosts`
* Verify that `curl chart-example.local` works

### OpenTelemetry Collector with Minikube

* Start Minikube cluster: `minikube start`
* Create K8s Namespace: `kubectl create namespace nessie-ns`
* Install cert-manager:

```bash
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.11.0/cert-manager.yaml
```

* Install Jaeger Operator:

```bash
kubectl create namespace observability
kubectl apply -f https://github.com/jaegertracing/jaeger-operator/releases/download/v1.42.0/jaeger-operator.yaml -n observability
```

If the above command fails with "failed to call webhook [...] connection refused", then cert-manager
was not yet ready. Wait a few seconds and try again.

* Create a Jaeger instance in Nessie's namespace:

```bash
kubectl apply -n nessie-ns -f - <<EOF
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: jaeger
EOF
```

If the above command fails with "failed to call webhook [...] connection refused", then the Jaeger
Operator was not yet ready. Wait a few seconds and try again.

* Install Nessie Helm chart with OpenTelemetry Collector enabled:

```bash
helm install nessie -n nessie-ns helm/nessie \
--set tracing.enabled=true \
--set tracing.endpoint=http://jaeger-collector:4317
Member

@dimas-b dimas-b Feb 21, 2023

Would it work with the otlp/2 protocol (default port 55680)? Or do we need to configure something else for otlp/2, perhaps?

Contributor Author

I confess I don't know, I am going to run a few tests now to investigate.

Contributor Author

Here is what I found:

According to this issue: open-telemetry/opentelemetry-specification#1148, it seems that in its early stages, OpenTelemetry chose to use ports 55680 and 55681 for OTLP/gRPC and OTLP/HTTP respectively.

However, as the discussion in the issue reveals, these choices weren't good ones, because these ports are in the ephemeral range, and consequently, are not suitable to be used as service ports.

So, they asked IANA to reserve instead two other ports: 4317 and 4318, the common ports that we use today for OTLP/gRPC and OTLP/HTTP respectively.

If you look at the specs in 2020, the default endpoint for OTLP/gRPC was localhost:55680.

But if you look at the specs now, the default endpoint for OTLP/gRPC was changed to http://localhost:4317.

In conclusion, I don't think there is any other port that we should support: only 4317 is used for OTLP/gRPC nowadays.

Member

nice 👍 thanks for the detailed explanation!

```

* Forward ports to Jaeger UI and Nessie UI:

```bash
kubectl port-forward -n nessie-ns service/nessie 19120:19120 &
kubectl port-forward -n nessie-ns service/jaeger-query 16686:16686 &
```

* Open the following URLs in your browser:
  * Nessie UI (to generate some traces): http://localhost:19120
  * Jaeger UI (to retrieve the traces): http://localhost:16686/search

To kill the port forwarding processes, run:

```bash
killall -9 kubectl
```

### Custom Docker images for Nessie with Minikube

You can modify Nessie's code and deploy it to Minikube.

Once you're satisfied with your changes, build the project with:

```bash
./gradlew :nessie-quarkus:quarkusBuild
```

Then build the Docker image and deploy it as follows:

```bash
eval $(minikube docker-env)
docker build -f ./tools/dockerbuild/docker/Dockerfile-jvm -t nessie-test:latest ./servers/quarkus-server
Member

(same here)

```

Then deploy Nessie with the custom Docker image:

```bash
helm install nessie -n nessie-ns helm/nessie \
--set image.repository=nessie-test \
--set image.tag=latest
```

### Stop/Uninstall everything in Dev

```sh
2 changes: 1 addition & 1 deletion helm/nessie/ci/inmemory-values.yaml
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
---
versionStoreType: INMEMORY
jaegerTracing:
tracing:
enabled: true
11 changes: 11 additions & 0 deletions helm/nessie/templates/_helpers.tpl
Original file line number Diff line number Diff line change
Expand Up @@ -58,3 +58,14 @@ Create the name of the service account to use
{{- default "default" .Values.serviceAccount.name }}
{{- end }}
{{- end }}

{{/*
Convert a dict into a string formed by a comma-separated list of key-value pairs: key1=value1,key2=value2, ...
*/}}
{{- define "nessie.dictToString" -}}
{{- $list := list -}}
{{- range $k, $v := . -}}
{{- $list = append $list (printf "%s=%s" $k $v) -}}
{{- end -}}
{{ join "," $list }}
{{- end -}}
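
To make the helper's output format concrete, here is a shell approximation of what `nessie.dictToString` appears to produce. It is only a sketch: `dict_to_string` is a hypothetical name, and the attribute keys and values are made up for illustration. Go templates visit map keys in sorted order, so the arguments are passed in that order here.

```bash
#!/usr/bin/env bash
# Shell approximation of the "nessie.dictToString" template helper.
# dict_to_string is a hypothetical name; it takes KEY=VALUE arguments
# and joins them with commas, preserving the argument order.
dict_to_string() {
  local IFS=','
  printf '%s\n' "$*"   # "$*" joins the arguments with the first character of IFS
}

dict_to_string "deployment.environment=dev" "service.name=my-nessie"
# → deployment.environment=dev,service.name=my-nessie
```

Since the helper formats each value with `%s`, non-string values (for example an unquoted number) would likely not render cleanly; quoting attribute values in `values.yaml` sidesteps the question.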
41 changes: 22 additions & 19 deletions helm/nessie/templates/deployment.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -142,28 +142,31 @@ spec:
{{- end }}
{{- end }}

{{- if .Values.jaegerTracing.enabled }}
{{- if .Values.jaegerTracing.serviceName }}
- name: QUARKUS_JAEGER_SERVICE_NAME
value: {{ .Values.jaegerTracing.serviceName }}
{{- if .Values.tracing.enabled }}
- name: QUARKUS_OPENTELEMETRY_TRACER_ENABLED
value: "true"
- name: QUARKUS_OPENTELEMETRY_TRACER_EXPORTER_OTLP_ENDPOINT
value: {{ .Values.tracing.endpoint | quote }}
{{- if .Values.tracing.attributes }}
- name: QUARKUS_OPENTELEMETRY_TRACER_RESOURCE_ATTRIBUTES
value: "{{- include "nessie.dictToString" .Values.tracing.attributes }}"
{{- end }}
{{- if .Values.tracing.sample }}
{{ if eq .Values.tracing.sample "all" }}
- name: QUARKUS_OPENTELEMETRY_TRACER_SAMPLER
value: "on"
{{- else if eq .Values.tracing.sample "none" }}
- name: QUARKUS_OPENTELEMETRY_TRACER_SAMPLER
value: "off"
{{- else }}
- name: QUARKUS_OPENTELEMETRY_TRACER_SAMPLER
value: "ratio"
- name: QUARKUS_OPENTELEMETRY_TRACER_SAMPLER_RATIO
value: {{ .Values.tracing.sample | quote }}
{{- end }}
{{- if .Values.jaegerTracing.publishMetrics }}
- name: QUARKUS_JAEGER_METRICS_ENABLED
value: {{ .Values.jaegerTracing.publishMetrics | quote }}
{{- end }}
{{- if .Values.jaegerTracing.samplerType }}
- name: QUARKUS_JAEGER_SAMPLER_TYPE
value: {{ .Values.jaegerTracing.samplerType }}
{{- end }}
{{- if .Values.jaegerTracing.samplerParam }}
- name: QUARKUS_JAEGER_SAMPLER_PARAM
value: {{ .Values.jaegerTracing.samplerParam | quote }}
{{- end }}
{{- if .Values.jaegerTracing.endpoint }}
- name: QUARKUS_JAEGER_ENDPOINT
value: {{ .Values.jaegerTracing.endpoint }}
{{- end }}
{{- end }}

ports:
- name: nessie-server
containerPort: 19120
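
The sampler branch of the template above can be read as a three-way mapping from `tracing.sample` to environment variables. The following shell sketch mirrors that mapping for illustration only; `sampler_env` is a hypothetical function and is not used by the chart.

```bash
#!/usr/bin/env bash
# Mirrors the template's sampler logic: print the environment variables that
# would be set for a given tracing.sample value.
sampler_env() {
  case "$1" in
    "")   ;;  # empty or unset: the template emits no sampler variables
    all)  echo 'QUARKUS_OPENTELEMETRY_TRACER_SAMPLER=on' ;;
    none) echo 'QUARKUS_OPENTELEMETRY_TRACER_SAMPLER=off' ;;
    *)    # anything else is passed through as a sampling ratio
          echo 'QUARKUS_OPENTELEMETRY_TRACER_SAMPLER=ratio'
          echo "QUARKUS_OPENTELEMETRY_TRACER_SAMPLER_RATIO=$1" ;;
  esac
}

sampler_env 0.5
```

As in the template, the value is not validated: any string other than `all` or `none` ends up in `QUARKUS_OPENTELEMETRY_TRACER_SAMPLER_RATIO` unchanged.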
30 changes: 17 additions & 13 deletions helm/nessie/values.yaml
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
# -- The number of replicas to deploy (horizontal scaling).
# Beware that replicas are stateless; don't set this number > 1 when using ROCKS version store type.
# Beware that replicas are stateless; don't set this number > 1 when using INMEMORY or ROCKS version store types.
Member

👍

replicaCount: 1

image:
Expand Down Expand Up @@ -97,19 +97,23 @@ authorization:
# allowViewingBranch: op=='VIEW_REFERENCE' && role.startsWith('test_user') && ref.startsWith('allowedBranch')
# allowCommits: op=='COMMIT_CHANGE_AGAINST_REFERENCE' && role.startsWith('test_user') && ref.startsWith('allowedBranch')

jaegerTracing:
# -- Specifies whether jaeger tracing for the nessie server should be enabled.
tracing:
# -- Specifies whether tracing for the nessie server should be enabled.
enabled: false
# -- The traces endpoint, in case the client should connect directly to the Collector, e.g. http://jaeger-collector:14268/api/traces
endpoint: ""
# -- The Jaeger service name.
serviceName: nessie
# -- Whether metrics are published if tracing is enabled.
publishMetrics: true
# -- The sampler type (const, probabilistic, ratelimiting or remote).
samplerType: ratelimiting
# -- The request sampling probability. 1=Sample all requests. Set samplerParam to somewhere between 0 and 1, e.g. 0.50, if you do not wish to sample all requests.
samplerParam: 1
# -- The collector endpoint URL to connect to (required).
# The endpoint URL must have either the http:// or the https:// scheme.
# The collector must speak the OpenTelemetry protocol (OTLP) and the port must be its gRPC port (by default 4317).
# See https://quarkus.io/guides/opentelemetry for more information.
endpoint: "http://otlp-collector:4317"
# -- Which requests should be sampled. Valid values are: "all", "none", or a ratio between 0.0 and
# 1.0 (inclusive). E.g. 0.5 means that 50% of the requests will be sampled.
sample: all
# -- Resource attributes to identify the nessie service among other tracing sources.
# See https://opentelemetry.io/docs/reference/specification/resource/semantic_conventions/#service.
# If left empty, traces will be attached to a service named "Nessie"; to change this, provide a service.name attribute here.
attributes:
{}
# service.name: my-nessie
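
As an illustration of how the new `tracing` block might be overridden, the sketch below writes a values file from the shell. The file name, the sampling ratio, and the attribute value are assumptions chosen for the example. Quoting the ratio is a precaution, since the README lists `tracing.sample` as a string.

```bash
#!/usr/bin/env bash
# Sketch: write a values override file for the new "tracing" block.
# File name, ratio and attribute value are illustrative assumptions.
cat > tracing-values.yaml <<'EOF'
tracing:
  enabled: true
  endpoint: "http://otlp-collector:4317"
  # Quoted so that the ratio reaches the chart as a string.
  sample: "0.25"
  attributes:
    service.name: my-nessie
EOF

# Then, against a cluster (not run here):
# helm install nessie -n nessie-ns helm/nessie -f tracing-values.yaml
```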

serviceMonitor:
# -- Specifies whether a ServiceMonitor for Prometheus operator should be created.
Original file line number Diff line number Diff line change
Expand Up @@ -168,8 +168,12 @@ quarkus.micrometer.enabled=true
quarkus.micrometer.export.prometheus.enabled=true
quarkus.micrometer.binder.jvm=true

# Trace collection settings
quarkus.opentelemetry.enabled=true
quarkus.opentelemetry.tracer.enabled=true
# The trace collector endpoint URL to connect to.
# Required, except in dev mode where it is set to http://localhost:4317 automatically.
# quarkus.opentelemetry.tracer.exporter.otlp.endpoint=http://otlp-collector:4317

# Overrides
## dev overrides - dev is used when running Nessie in dev mode `mvn quarkus:dev`
30 changes: 30 additions & 0 deletions site/docs/try/configuration.md
Original file line number Diff line number Diff line change
Expand Up @@ -128,6 +128,36 @@ The following configurations are advanced configurations to configure how Nessie
Metrics are published using Prometheus and can be collected via standard methods. See:
[Prometheus](https://prometheus.io).

### Traces

Since Nessie 0.46.0, traces are published using OpenTelemetry. See [Using
OpenTelemetry](https://quarkus.io/guides/opentelemetry) in the Quarkus documentation.

In order for the server to publish its traces, the
`quarkus.opentelemetry.tracer.exporter.otlp.endpoint` property _must_ be set. Its value must be a
valid collector endpoint URL, with either the `http://` or the `https://` scheme. The collector must
speak the OpenTelemetry protocol (OTLP) and the port must be its gRPC port (by default 4317), e.g.
"http://otlp-collector:4317".
snazy marked this conversation as resolved.
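
When the server runs in a container, the endpoint property is typically supplied as an environment variable, as the Helm chart in this PR does. The sketch below shows one way to derive the variable name and to check the URL scheme before exporting it. Both helpers are hypothetical and are not part of Nessie or Quarkus; the endpoint value is the example from the text.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, not part of Nessie or Quarkus.

# Convert a property name to the environment-variable form used in the Helm
# chart: upper-case, with dots and dashes replaced by underscores.
prop_to_env() {
  printf '%s\n' "$1" | tr '[:lower:]' '[:upper:]' | tr '.-' '__'
}

# Succeed only if the endpoint starts with http:// or https://.
valid_endpoint() {
  case "$1" in
    http://?*|https://?*) return 0 ;;
    *) return 1 ;;
  esac
}

endpoint="http://otlp-collector:4317"
if valid_endpoint "$endpoint"; then
  export "$(prop_to_env quarkus.opentelemetry.tracer.exporter.otlp.endpoint)=$endpoint"
fi
```

The scheme check only catches the most common mistake (a bare `host:port`); it does not verify that the port is the collector's gRPC port.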

#### Troubleshooting traces

If the server is unable to publish traces, check first for a log warning message like the following:

```
WARN [io.qua.ope.run.exp.otl.LateBoundBatchSpanProcessor] (vert.x-eventloop-thread-5) No BatchSpanProcessor delegate specified, no action taken.
```

This means that the `quarkus.opentelemetry.tracer.exporter.otlp.endpoint` property is not set. Set
it to a valid OTLP collector URL and try again.

If you see a log error message like the following:

```
SEVERE [io.ope.exp.int.grp.OkHttpGrpcExporter] (OkHttp http://localhost:4317/...) Failed to export spans. The request could not be executed. Full error message: Failed to connect to localhost/0:0:0:0:0:0:0:1:4317
```

This means that the server is unable to connect to the collector. Check that the collector is
running and that the URL is correct.
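
The two checks above could be scripted as a quick triage step. This is a minimal sketch under stated assumptions: `diagnose_tracing` is a hypothetical helper, and the hint strings it prints are made up for illustration; only the two log fragments it searches for come from the messages shown above.

```bash
#!/usr/bin/env bash
# Hypothetical triage helper: read server log lines on stdin and print a hint
# for the two messages described above. The hint texts are illustrative.
diagnose_tracing() {
  local log
  log=$(cat)
  if grep -qF 'No BatchSpanProcessor delegate specified' <<<"$log"; then
    echo 'endpoint-not-set'
  elif grep -qF 'Failed to export spans' <<<"$log"; then
    echo 'collector-unreachable'
  else
    echo 'no-known-tracing-problem'
  fi
}
```

On Kubernetes it might be fed with something like `kubectl logs -n nessie-ns deployment/nessie | diagnose_tracing`, assuming the deployment is named `nessie`.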

### Swagger UI
The Swagger UI allows for testing the REST API and reading the API docs. It is available