Skip to content

Commit

Permalink
fix: robust runtime details detection (#1513)
Browse files Browse the repository at this point in the history
This PR introduces a new way to apply runtime detection.

## Motivation

Our current language detection has the following issues:
- triggering the runtime detection is done via setting the
`runtimeDetailsInvalidated` property on the `InstrumentationConfigSpec`
CR. Since this resource is being written to by both the sdk configs
controller in instromentor, and the odiglets once they finish runtime
detection, the ping-pong is venerable to race conditions and is very
hard to reason about.
- we trigger runtime inspection only once, possibly missing future
changes in the following info: `runtimeVersion`, `envVars` which does
not represent the current state of the workload. since odigos itself
touches some env variables, this inconsistent can lead to really hard to
find bugs. having the runtime detection up to date when new generations
of the workload are applied, helps eliminate edge and advanced
use-cases.
- we currently have 2 CRDs to represent instrumented workload in odigos:
`instrumentationconfig` and `instrumentedapplication`. They can be
merged into one CRD, making the maintenance a bit simpler.
- the env override mechanism currently relies on the env values that are
detected from runtime inspection to patch it with odigos content. For
instrumentor to allow manifest changes in the env vars, it required to
know if the runtime detection is up-to-date and avoid updating old
values. Capturing the workload generation against which the runtime
details were calculated, allows it to ignore this data when not up to
date.

## Objectives

- collect runtime details for new generations of the workload
(recalculate when a pod with newer generation starts)
- avoid calculating runtime details if the inspection is already made
for this generation before.
- Do not pull in any new objects into the contoller-runtime cache,
possibly blowing up the memory and cpu.
- apply the change in small steps, only changing one thing at a time for
smaller PRs and safer transit

## Changes

- Write the new runtime inspection info into the status of the
`instrumentationconfig` object. It will live alongside the existing
runtime detection, and is not used anywhere in this step. We can then
use it to compare the two values to gain confident in the process before
start using this info.
- Add 2 new controllers to `odiglet`. one for detecting the language
when a pod becomes running only if the geeneration in the CR requires to
calculate again. the second one is for instrumentationconfig to do the
initial language detection when a new source is added in odigos.
- For generation calculation, we need to get the replicaset for pods
that were generated from deployments. For that we need to add the
relevant RBAC permissions. the reeeplicaset is accessed only from a
clientset, so not to pull all cluster replicasets into the
controller-runtime cache.
- enhanced the e2e tests to assert the values we expect in existing
tests
  • Loading branch information
blumamir authored Sep 22, 2024
1 parent 9dd1fae commit cfac2b5
Show file tree
Hide file tree
Showing 27 changed files with 1,090 additions and 35 deletions.
16 changes: 15 additions & 1 deletion Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -194,12 +194,16 @@ cli-install:
@echo "Installing odigos from source. version: $(ODIGOS_CLI_VERSION)"
cd ./cli ; go run -tags=embed_manifests . install --version $(ODIGOS_CLI_VERSION)


.PHONY: cli-upgrade
cli-upgrade:
@echo "Upgrading odigos from source. version: $(ODIGOS_CLI_VERSION)"
cd ./cli ; go run -tags=embed_manifests . upgrade --version $(ODIGOS_CLI_VERSION) --yes

.PHONY: cli-build
cli-build:
@echo "Building the cli executable for tests"
cd cli && go build -tags=embed_manifests -o odigos .

.PHONY: cli-diagnose
cli-diagnose:
@echo "Diagnosing cluster data for debugging"
Expand All @@ -212,3 +216,13 @@ api-all:
.PHONY: crd-apply
crd-apply: api-all cli-upgrade
@echo "Applying changes to CRDs in api directory"

.PHONY: dev-tests-kind-cluster
dev-tests-kind-cluster:
@echo "Creating a kind cluster for development"
kind delete cluster
kind create cluster

.PHONY: dev-tests-setup
dev-tests-setup: TAG := e2e-test
dev-tests-setup: dev-tests-kind-cluster cli-build build-images load-to-kind
49 changes: 49 additions & 0 deletions api/config/crd/bases/odigos.io_instrumentationconfigs.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -417,6 +417,55 @@ spec:
type: object
type: array
type: object
status:
properties:
observedWorkloadGeneration:
description: |-
Runtime detection is applied on pods.
Pods run a specific workload template spec, so it's important to capture it do avoid
unpredictable behavior when multiple generations co-exist,
and to avoid running the detection when unnecessary.
format: int64
type: integer
runtimeDetailsByContainer:
description: Capture Runtime Details for the workloads that this CR
applies to.
items:
properties:
containerName:
type: string
envVars:
items:
properties:
name:
type: string
value:
type: string
required:
- name
- value
type: object
type: array
language:
enum:
- java
- python
- go
- dotnet
- javascript
- mysql
- nginx
- unknown
- ignored
type: string
runtimeVersion:
type: string
required:
- containerName
- language
type: object
type: array
type: object
type: object
served: true
storage: true
Expand Down

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 2 additions & 0 deletions api/generated/odigos/applyconfiguration/utils.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

14 changes: 13 additions & 1 deletion api/odigos/v1alpha1/instrumentationconfig_types.go
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,19 @@ type InstrumentationConfig struct {
metav1.TypeMeta `json:",inline"`
metav1.ObjectMeta `json:"metadata,omitempty"`

Spec InstrumentationConfigSpec `json:"spec,omitempty"`
Spec InstrumentationConfigSpec `json:"spec,omitempty"`
Status InstrumentationConfigStatus `json:"status,omitempty"`
}

type InstrumentationConfigStatus struct {
// Capture Runtime Details for the workloads that this CR applies to.
RuntimeDetailsByContainer []RuntimeDetailsByContainer `json:"runtimeDetailsByContainer,omitempty"`

// Runtime detection is applied on pods.
// Pods run a specific workload template spec, so it's important to capture it do avoid
// unpredictable behavior when multiple generations co-exist,
// and to avoid running the detection when unnecessary.
ObservedWorkloadGeneration int64 `json:"observedWorkloadGeneration,omitempty"`
}

// Config for the OpenTelemeetry SDKs that should be applied to a workload.
Expand Down
23 changes: 23 additions & 0 deletions api/odigos/v1alpha1/zz_generated.deepcopy.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

21 changes: 21 additions & 0 deletions cli/cmd/resources/odiglet.go
Original file line number Diff line number Diff line change
Expand Up @@ -184,6 +184,15 @@ func NewOdigletClusterRole(psp bool) *rbacv1.ClusterRole {
"daemonsets/finalizers",
},
},
{
Verbs: []string{
"get",
},
APIGroups: []string{"apps"},
Resources: []string{
"replicasets",
},
},
{
Verbs: []string{
"create",
Expand Down Expand Up @@ -267,6 +276,18 @@ func NewOdigletClusterRole(psp bool) *rbacv1.ClusterRole {
"instrumentationconfigs",
},
},
{
Verbs: []string{
"get",
"list",
"watch",
"patch",
},
APIGroups: []string{"odigos.io"},
Resources: []string{
"instrumentationconfigs/status",
},
},
},
}

Expand Down
15 changes: 15 additions & 0 deletions helm/odigos/templates/odiglet/clusterrole.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -56,6 +56,21 @@ rules:
- get
- list
- watch
- apiGroups:
- odigos.io
resources:
- instrumentationconfigs/status
verbs:
- get
- list
- watch
- patch
- apiGroups:
- apps
resources:
- replicasets
verbs:
- get
- apiGroups:
- odigos.io
resources:
Expand Down
2 changes: 1 addition & 1 deletion odiglet/cmd/main.go
Original file line number Diff line number Diff line change
Expand Up @@ -88,7 +88,7 @@ func main() {
os.Exit(-1)
}

err = kube.SetupWithManager(mgr, ebpfDirectors)
err = kube.SetupWithManager(mgr, ebpfDirectors, clientset)
if err != nil {
log.Logger.Error(err, "Failed to setup controller-runtime manager")
os.Exit(-1)
Expand Down
5 changes: 3 additions & 2 deletions odiglet/pkg/kube/manager.go
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,7 @@ import (
"k8s.io/apimachinery/pkg/fields"
"k8s.io/apimachinery/pkg/runtime"
utilruntime "k8s.io/apimachinery/pkg/util/runtime"
"k8s.io/client-go/kubernetes"
clientgoscheme "k8s.io/client-go/kubernetes/scheme"
"sigs.k8s.io/controller-runtime/pkg/cache"
"sigs.k8s.io/controller-runtime/pkg/client"
Expand Down Expand Up @@ -71,8 +72,8 @@ func StartManager(ctx context.Context, mgr ctrl.Manager) error {
return nil
}

func SetupWithManager(mgr ctrl.Manager, ebpfDirectors ebpf.DirectorsMap) error {
err := runtime_details.SetupWithManager(mgr)
func SetupWithManager(mgr ctrl.Manager, ebpfDirectors ebpf.DirectorsMap, clientset *kubernetes.Clientset) error {
err := runtime_details.SetupWithManager(mgr, clientset)
if err != nil {
return err
}
Expand Down
Loading

0 comments on commit cfac2b5

Please sign in to comment.