Add manual instrumentation support #352

pellared · 2023-09-27T07:16:22Z

Is your feature request related to a problem? Please describe.

One of the key aspects of observability is to have the ability to create instruments which are oriented around the application logic. Without this possibility the automatic instrumentation is more like "monitoring" tool.

Describe the solution you'd like

Add the application developer the possibility to augment the instrumentation in the application so that they can:

create custom spans (and tracers)
create custom metrics (and meters)
attach additional instrumentation libraries

For example, let's consider this application which is based on https://opentelemetry.io/docs/instrumentation/go/getting-started/:

package main

import (
	"io"
	"log"
	"math/rand"
	"net/http"
	"strconv"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/metric"
)

func main() {
	http.HandleFunc("/rolldice", rolldice)

	log.Fatal(http.ListenAndServe(":8080", nil))
}

var (
	tracer  = otel.Tracer("rolldice")
	meter   = otel.Meter("rolldice")
	rollCnt metric.Int64Counter
)

func init() {
	var err error
	rollCnt, err = meter.Int64Counter("dice.rolls",
		metric.WithDescription("The number of rolls by roll value"),
		metric.WithUnit("{roll}"))
	if err != nil {
		panic(err)
	}
}

func rolldice(w http.ResponseWriter, r *http.Request) {
	ctx, span := tracer.Start(r.Context(), "roll")
	defer span.End()

	roll := 1 + rand.Intn(6)

	rollValueAttr := attribute.Int("roll.value", roll)
	span.SetAttributes(rollValueAttr)
	rollCnt.Add(ctx, 1, metric.WithAttributes(rollValueAttr))

	resp := strconv.Itoa(roll) + "\n"
	if _, err := io.WriteString(w, resp); err != nil {
		log.Printf("Write failed: %v\n", err)
	}
}

Key requirements:

The telemetry recorded in the application code (roll spans and dice.rolls metrics) is exported recorded and exported via the tracer provider and meter provider that is set up by Go Automatic Instrumentation.
The telemetry recorded in the application code is related (as child) with the spans created by Go Automatic Instrumentation.

So the manual telemetry should be integrated with the telemetry produced by Go Automatic Instrumentation.

As part of this issue it would be also good to update https://github.com/open-telemetry/opentelemetry-go-instrumentation/tree/main/examples/rolldice/main.go with the code above.

Additional context

The need for this feature in .NET Automatic Instrumentation has resulted in complete redesign and code rewrite. Docs for reference: https://github.com/open-telemetry/opentelemetry-dotnet-instrumentation/blob/main/docs/manual-instrumentation.md

Without this feature automatic instrumentation would be only a "poor mans" instrumentation as it would not be able to compete with the manual OTel Go SDK approach.

Here are some docs around manual instrumentation in this repository: https://github.com/open-telemetry/opentelemetry-go-instrumentation/blob/main/docs/design/manual-instrumentation.md

The text was updated successfully, but these errors were encountered:

pellared · 2023-09-27T07:29:28Z

If the following is true:

opentelemetry-go-instrumentation/docs/design/context-propagation.md

Line 22 in d433218

    
           Notice that currently, the automatic instrumentation correlates spans to the same trace if they are being executed by the same goroutine. In the future we plan to implement a more robust tracking of the goroutine tree to support traces from multiple goroutines. As part of this planned change, the current implementation of context propagation will also have to be changed (different key in the current span map).

Then I think this issue would be only about updating the documentation and tests.

MikeGoldsmith · 2023-10-10T10:36:41Z

Hey @pellared - thanks for creating the issue. Yes, we agree that the agent should support mixing automatic and manual instrumentation. As you found with .NET's implementation, without it the usefulness of this project is very limited as many users will want to graduate to a higher level of instrumentation that would require manually creating spans to capture additional business logic.

I see this as hard requirement for this project to move into beta because it could affect some core functionality within the current agent.

MrAlias · 2023-10-10T14:02:23Z

What is the problem here? Just that the instrumentation currently sends telemetry to two different pipelines (TracerProviders)?

pellared · 2023-10-10T15:47:02Z

Just that the instrumentation currently sends telemetry to two different pipelines (TracerProviders)?

Yes. I think that the Go automatic instrumentation should set the global providers to decrease the amount of coding required to add additional custom telemetry.

For reference, this is what https://github.com/open-telemetry/opentelemetry-dotnet-instrumentation does. See https://opentelemetry.io/docs/instrumentation/net/getting-started/

RonFed · 2023-11-12T10:54:33Z

I started implementing some of the required functionality for this issue and would like to hear your thoughts. I'm using the above roll-dice example for my testing.
The main idea is to add eBPF probes to start-span and end-span related methods in the go SDK.

Currently I added probes to the following functions:
go.opentelemetry.io/otel/sdk/trace.(*tracer).Start
go.opentelemetry.io/otel/sdk/trace.(*recordingSpan).End
This means that the current implementation will only work if the application sets up the SDK.

I looked at instrumenting nonRecordingSpan as well but because of the nature of its methods instrumenting them is not useful since:
func (nonRecordingSpan) End(...trace.SpanEndOption) {}
From the eBPF point of view, the span object is not reachable, and we don't have a way of knowing which span we are working on as opposed to func (s *recordingSpan) End(options ...trace.SpanEndOption) .
If the signature of the nonRecordingSpan methods were to change to pass the pointer, then we could construct the eBPF-equilent span for those as well.

Another difference is that SetAttributes does nothing for nonRecordingSpan, so it'll require instrumenting that method as well and saving the attributes, as opposed to the recordingSpan , in which we can extract the attributes from the span struct once End is called.

So I guess the main question is how important it is to support nonRecordingSpan? If I understood correctly, in order to support applications which don't set up the SDK rather just calling start span and end span, we need to support nonRecordingSpan.

@pellared @MrAlias

MrAlias · 2023-11-13T20:48:27Z

I don't think we should be overwriting the otel functionality. In fact, I don't think we should attempt to interact with the OTel setup in the target binary. It is a separate process that likely has a separate setup. By overriding some pipeline in that binary we are almost certainly going to cause confusion.

For example, if a user has instrumented their app directly with a TracerProvider that they intend to be different from the auto-instrumentation, if we "unify" the pipelines for them it would be unexpected behavior.

This would also be an overreach. Observability backends are designed to solve this problem. If we can have the net/http auto-instrumentation correctly inject the trace information into the context.Context returned from Request.Context() then the user should able to correctly create a child span in the trace. From there any observability system should have no problem tying the spans into the same trace. That is the purpose of the backend.

From some ad-hoc testing, it appears we do not currently inject the span context into the context returned by Request.Context. I think to resolve this we need to start there.

Similarly for the grpc instrumentation, a way to get the auto-instrumented span context will need to be made available to the user.

pellared · 2023-11-14T09:43:10Z

I think that the Go Automatic Instrumentation should "by default" set the global providers and use them as well.

On top of it, we should add a possibility to:

Disable bootstrapping of the providers by Go Automatic Instrumentation via option and env var - so that the users can set the global providers on their own. Go Automatic Instrumentation would be then used mainly to get the instrumentation libraries thanks to eBPF instead of manually using e.g. otelhttp and otelgrpc.
Provide With*Provider options so that Go Automatic Instrumentation uses the provided signal providers instead of the globals. I am not sure if users would need it, but it could be useful for our testing.

From my experience based on https://github.com/open-telemetry/opentelemetry-dotnet-instrumentation 99% of use cases under following categories:

full manual instrumentation (some people, mostly devs, do not like automatic instrumentation)
only automatic instrumentation
automatic with manual instrumentation (custom metrics, spans and additional instrumentation libraries)

blumamir · 2023-11-14T09:51:00Z

@MrAlias
go auto instrumentation is effectively an opentelemetry SDK. It starts and end spans, exports them, manages context, etc.
I think that while having another "go native" opentelemetry SDK living alongside the eBPF SDK in the same process is technically feasible (although hard), it has the following major cons:

I think that having 2 SDKs for the same application might be a source of confusion for end users. Each SDK has it's own configuration and mechanism. Having the same application generating signals from 2 different sources might be harder to debug, maintain, and understand. A new user would need to understand more
One of the key concepts of opentelemetry is the ability of any library to instrument it's source using the opentelemetry API. if you require users to install the native SDK to record telemetry from such libraries, it again means that code changes are required which is a big issue for many companies.
It requires more work from users, as they will need setup more code (the native go SDK), and maintain both over time.
As you mentioned, having 2 SDK means they will have to do some coordination and integration (mainly around the context and global providers), which is a source for technical challenges and bugs.
eBPF is not only about doing things more "automatically". It also introduces performance gains which might be appealing to many users and are lost if data is recorded with native SDK.

While both options have pros and cons, I personally think that choosing to process manual spans with eBPF probes is friendlier for users, easier to debug and develop, and is less mentally complex than the alternative.

edeNFed · 2023-11-14T10:33:14Z

I think we can take inspiration from the Java auto instrumentation.
The javaagent automatic instrumentation allows users to add additional spans via the @WithSpan annotation.
This is a similar approach to what @RonFed proposed. In both approaches, extensibility is provided via API (annotation or OTel Go API), and delivery, batching, etc is provided by the SDK of the agent (eBPF or javaagent).

MrAlias · 2023-11-14T15:12:17Z

I think that the Go Automatic Instrumentation should "by default" set the global providers and use them as well.

@pellared how do you set the global provider in another process?

MrAlias · 2023-11-14T15:13:20Z

I think that while having another "go native" opentelemetry SDK living alongside the eBPF SDK in the same process is technically feasible

@blumamir the eBPF instrumentation will never live within the same process as the running code.

blumamir · 2023-11-14T15:19:03Z

I think that while having another "go native" opentelemetry SDK living alongside the eBPF SDK in the same process is technically feasible

@blumamir the eBPF instrumentation will never live within the same process as the running code.

I am not sure about the precise definitions here, but what I meant to say is that a single process will generate telemetry signals (like spans) from 2 different SDKs.

pellared · 2023-11-14T15:19:07Z

@MrAlias @RonFed Now, I see your point. I think that may be worth exploring the support of instrumenting nonRecordingSpan.

MrAlias · 2023-11-14T20:13:54Z

Notes from SIG meeting:

We need to support trace context propagation regardless of the unified or bifurcated pipeline question
We want to support the situation where a user would like their telemetry to be sent to a different pipeline than the auto-instrumentation
- We want to support development environments that want to use different exporters than an platform running the auto-instrumentation.
- We want to support high-cardinality telemetry separation
- We want to support different samplers
- We want to support encapsulation of telemetry with different instrumentation scopes
- We want to support different ID generation
We want to support the situation where a user would like their telemetry to be sent to the same pipeline as the auto-instrumentation

Action items

Create issues for the net/http server, grpc server, and gin instrumentation to correctly propagate the span context in header values. This will enable any auto-instrumented spans to be connected to manually instrumented ones in a middleware or handler.
Provide a TracerProvider implementation from auto-instrumentation that manual instrumentation can explicitly use to say they want their data to go to the same pipeline. This means the auto-instrumentation will control the type that manual instrumentation uses to create, edit, and end spans if they want the data to be sent to the auto-instrumentation pipeline
Add a CLI flag that is an opt-in for all manual instrumentation to be sent to the auto-instrumentation pipeline. This is the idea that @RonFed is exploring above, just that it will not be the default.

MikeGoldsmith · 2023-11-27T13:20:44Z

Catching up on this thread after reading through the proposal in #523.

I'm concerned that setting up uprobes on the Go SDK's Start & End funcs effectively means that we're duplicating instrumentation (once in the SDK, then again in the instrumentation process).

Have we considered or attempted to construct trace & metrics providers and inject them into the process? Maybe we could detect a given provider has not been configured during init. This would happen once and then the auto instrumentation agent wouldn't need to interact with every span that's created in the target process with the Go SDK.

MrAlias · 2023-11-28T18:03:57Z

@pellared the proposed solution to (2) is adding support to capture from the default global implementation. It will not capture telemetry from any SDK registered with the global implementation, nor any default OTel SDK. Does this address (2) in your understanding?

MrAlias · 2023-11-28T18:05:53Z

Have we considered or attempted to construct trace & metrics providers and inject them into the process?

Wouldn't this violate the Go memory management requirements? We would be using types outside of the Go allocated memory space within a Go process.

pellared · 2024-01-30T17:54:05Z

SIG meeting:

Moving it post-beta. There is no reason to delay the beta release so much.

pellared added the enhancement New feature or request label Sep 27, 2023

pellared added this to Go Auto Instrumentation: Beta Sep 27, 2023

github-project-automation bot moved this to Todo in Go Auto Instrumentation: Beta Sep 27, 2023

pellared added the documentation Improvements or additions to documentation label Sep 27, 2023

pellared changed the title ~~Add custom instrumentation support~~ Add manual instrumentation support Sep 27, 2023

RonFed self-assigned this Nov 4, 2023

RonFed mentioned this issue Nov 22, 2023

Manual instrumentation support - Phase 1 #523

Merged

RonFed mentioned this issue Nov 30, 2023

Update roll dice example to include latest features #526

Open

pellared added this to Go Auto Instrumentation: GA Jan 30, 2024

pellared removed this from Go Auto Instrumentation: Beta Jan 30, 2024

pellared moved this to In progress in Go Auto Instrumentation: GA Jan 30, 2024

MrAlias mentioned this issue Jul 16, 2024

Custom SDK implementation #954

Closed

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add manual instrumentation support #352

Add manual instrumentation support #352

pellared commented Sep 27, 2023 •

edited by MikeGoldsmith

Loading

pellared commented Sep 27, 2023

MikeGoldsmith commented Oct 10, 2023

MrAlias commented Oct 10, 2023

pellared commented Oct 10, 2023 •

edited

Loading

RonFed commented Nov 12, 2023 •

edited

Loading

MrAlias commented Nov 13, 2023

pellared commented Nov 14, 2023

blumamir commented Nov 14, 2023

edeNFed commented Nov 14, 2023 •

edited

Loading

MrAlias commented Nov 14, 2023

MrAlias commented Nov 14, 2023

blumamir commented Nov 14, 2023

pellared commented Nov 14, 2023 •

edited

Loading

MrAlias commented Nov 14, 2023

MikeGoldsmith commented Nov 27, 2023 •

edited

Loading

MrAlias commented Nov 28, 2023

MrAlias commented Nov 28, 2023

pellared commented Jan 30, 2024

Add manual instrumentation support #352

Add manual instrumentation support #352

Comments

pellared commented Sep 27, 2023 • edited by MikeGoldsmith Loading

pellared commented Sep 27, 2023

MikeGoldsmith commented Oct 10, 2023

MrAlias commented Oct 10, 2023

pellared commented Oct 10, 2023 • edited Loading

RonFed commented Nov 12, 2023 • edited Loading

MrAlias commented Nov 13, 2023

pellared commented Nov 14, 2023

blumamir commented Nov 14, 2023

edeNFed commented Nov 14, 2023 • edited Loading

MrAlias commented Nov 14, 2023

MrAlias commented Nov 14, 2023

blumamir commented Nov 14, 2023

pellared commented Nov 14, 2023 • edited Loading

MrAlias commented Nov 14, 2023

Notes from SIG meeting:

Action items

MikeGoldsmith commented Nov 27, 2023 • edited Loading

MrAlias commented Nov 28, 2023

MrAlias commented Nov 28, 2023

pellared commented Jan 30, 2024

pellared commented Sep 27, 2023 •

edited by MikeGoldsmith

Loading

pellared commented Oct 10, 2023 •

edited

Loading

RonFed commented Nov 12, 2023 •

edited

Loading

edeNFed commented Nov 14, 2023 •

edited

Loading

pellared commented Nov 14, 2023 •

edited

Loading

MikeGoldsmith commented Nov 27, 2023 •

edited

Loading