Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add manual instrumentation support #352

Open
pellared opened this issue Sep 27, 2023 · 18 comments
Open

Add manual instrumentation support #352

pellared opened this issue Sep 27, 2023 · 18 comments
Assignees
Labels
documentation Improvements or additions to documentation enhancement New feature or request

Comments

@pellared
Copy link
Member

pellared commented Sep 27, 2023

Is your feature request related to a problem? Please describe.

One of the key aspects of observability is to have the ability to create instruments which are oriented around the application logic. Without this possibility the automatic instrumentation is more like "monitoring" tool.

Describe the solution you'd like

Add the application developer the possibility to augment the instrumentation in the application so that they can:

  • create custom spans (and tracers)
  • create custom metrics (and meters)
  • attach additional instrumentation libraries

For example, let's consider this application which is based on https://opentelemetry.io/docs/instrumentation/go/getting-started/:

package main

import (
	"io"
	"log"
	"math/rand"
	"net/http"
	"strconv"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/metric"
)

func main() {
	http.HandleFunc("/rolldice", rolldice)

	log.Fatal(http.ListenAndServe(":8080", nil))
}

var (
	tracer  = otel.Tracer("rolldice")
	meter   = otel.Meter("rolldice")
	rollCnt metric.Int64Counter
)

func init() {
	var err error
	rollCnt, err = meter.Int64Counter("dice.rolls",
		metric.WithDescription("The number of rolls by roll value"),
		metric.WithUnit("{roll}"))
	if err != nil {
		panic(err)
	}
}

func rolldice(w http.ResponseWriter, r *http.Request) {
	ctx, span := tracer.Start(r.Context(), "roll")
	defer span.End()

	roll := 1 + rand.Intn(6)

	rollValueAttr := attribute.Int("roll.value", roll)
	span.SetAttributes(rollValueAttr)
	rollCnt.Add(ctx, 1, metric.WithAttributes(rollValueAttr))

	resp := strconv.Itoa(roll) + "\n"
	if _, err := io.WriteString(w, resp); err != nil {
		log.Printf("Write failed: %v\n", err)
	}
}

Key requirements:

  1. The telemetry recorded in the application code (roll spans and dice.rolls metrics) is exported recorded and exported via the tracer provider and meter provider that is set up by Go Automatic Instrumentation.
  2. The telemetry recorded in the application code is related (as child) with the spans created by Go Automatic Instrumentation.

So the manual telemetry should be integrated with the telemetry produced by Go Automatic Instrumentation.

As part of this issue it would be also good to update https://github.com/open-telemetry/opentelemetry-go-instrumentation/tree/main/examples/rolldice/main.go with the code above.

Additional context

The need for this feature in .NET Automatic Instrumentation has resulted in complete redesign and code rewrite. Docs for reference: https://github.com/open-telemetry/opentelemetry-dotnet-instrumentation/blob/main/docs/manual-instrumentation.md

Without this feature automatic instrumentation would be only a "poor mans" instrumentation as it would not be able to compete with the manual OTel Go SDK approach.

Here are some docs around manual instrumentation in this repository: https://github.com/open-telemetry/opentelemetry-go-instrumentation/blob/main/docs/design/manual-instrumentation.md

@pellared pellared added the enhancement New feature or request label Sep 27, 2023
@pellared
Copy link
Member Author

If the following is true:

Notice that currently, the automatic instrumentation correlates spans to the same trace if they are being executed by the same goroutine. In the future we plan to implement a more robust tracking of the goroutine tree to support traces from multiple goroutines. As part of this planned change, the current implementation of context propagation will also have to be changed (different key in the current span map).

Then I think this issue would be only about updating the documentation and tests.

@pellared pellared added the documentation Improvements or additions to documentation label Sep 27, 2023
@pellared pellared changed the title Add custom instrumentation support Add manual instrumentation support Sep 27, 2023
@MikeGoldsmith
Copy link
Member

Hey @pellared - thanks for creating the issue. Yes, we agree that the agent should support mixing automatic and manual instrumentation. As you found with .NET's implementation, without it the usefulness of this project is very limited as many users will want to graduate to a higher level of instrumentation that would require manually creating spans to capture additional business logic.

I see this as hard requirement for this project to move into beta because it could affect some core functionality within the current agent.

@MrAlias
Copy link
Contributor

MrAlias commented Oct 10, 2023

What is the problem here? Just that the instrumentation currently sends telemetry to two different pipelines (TracerProviders)?

@pellared
Copy link
Member Author

pellared commented Oct 10, 2023

Just that the instrumentation currently sends telemetry to two different pipelines (TracerProviders)?

Yes. I think that the Go automatic instrumentation should set the global providers to decrease the amount of coding required to add additional custom telemetry.

For reference, this is what https://github.com/open-telemetry/opentelemetry-dotnet-instrumentation does. See https://opentelemetry.io/docs/instrumentation/net/getting-started/

@RonFed RonFed self-assigned this Nov 4, 2023
@RonFed
Copy link
Contributor

RonFed commented Nov 12, 2023

I started implementing some of the required functionality for this issue and would like to hear your thoughts. I'm using the above roll-dice example for my testing.
The main idea is to add eBPF probes to start-span and end-span related methods in the go SDK.

Currently I added probes to the following functions:
go.opentelemetry.io/otel/sdk/trace.(*tracer).Start
go.opentelemetry.io/otel/sdk/trace.(*recordingSpan).End
This means that the current implementation will only work if the application sets up the SDK.

I looked at instrumenting nonRecordingSpan as well but because of the nature of its methods instrumenting them is not useful since:
func (nonRecordingSpan) End(...trace.SpanEndOption) {}
From the eBPF point of view, the span object is not reachable, and we don't have a way of knowing which span we are working on as opposed to func (s *recordingSpan) End(options ...trace.SpanEndOption) .
If the signature of the nonRecordingSpan methods were to change to pass the pointer, then we could construct the eBPF-equilent span for those as well.

Another difference is that SetAttributes does nothing for nonRecordingSpan, so it'll require instrumenting that method as well and saving the attributes, as opposed to the recordingSpan , in which we can extract the attributes from the span struct once End is called.

So I guess the main question is how important it is to support nonRecordingSpan? If I understood correctly, in order to support applications which don't set up the SDK rather just calling start span and end span, we need to support nonRecordingSpan.

@pellared @MrAlias

@MrAlias
Copy link
Contributor

MrAlias commented Nov 13, 2023

I don't think we should be overwriting the otel functionality. In fact, I don't think we should attempt to interact with the OTel setup in the target binary. It is a separate process that likely has a separate setup. By overriding some pipeline in that binary we are almost certainly going to cause confusion.

For example, if a user has instrumented their app directly with a TracerProvider that they intend to be different from the auto-instrumentation, if we "unify" the pipelines for them it would be unexpected behavior.

This would also be an overreach. Observability backends are designed to solve this problem. If we can have the net/http auto-instrumentation correctly inject the trace information into the context.Context returned from Request.Context() then the user should able to correctly create a child span in the trace. From there any observability system should have no problem tying the spans into the same trace. That is the purpose of the backend.

From some ad-hoc testing, it appears we do not currently inject the span context into the context returned by Request.Context. I think to resolve this we need to start there.

Similarly for the grpc instrumentation, a way to get the auto-instrumented span context will need to be made available to the user.

@pellared
Copy link
Member Author

I think that the Go Automatic Instrumentation should "by default" set the global providers and use them as well.

On top of it, we should add a possibility to:

  1. Disable bootstrapping of the providers by Go Automatic Instrumentation via option and env var - so that the users can set the global providers on their own. Go Automatic Instrumentation would be then used mainly to get the instrumentation libraries thanks to eBPF instead of manually using e.g. otelhttp and otelgrpc.
  2. Provide With*Provider options so that Go Automatic Instrumentation uses the provided signal providers instead of the globals. I am not sure if users would need it, but it could be useful for our testing.

From my experience based on https://github.com/open-telemetry/opentelemetry-dotnet-instrumentation 99% of use cases under following categories:

  1. full manual instrumentation (some people, mostly devs, do not like automatic instrumentation)
  2. only automatic instrumentation
  3. automatic with manual instrumentation (custom metrics, spans and additional instrumentation libraries)

@blumamir
Copy link
Member

@MrAlias
go auto instrumentation is effectively an opentelemetry SDK. It starts and end spans, exports them, manages context, etc.
I think that while having another "go native" opentelemetry SDK living alongside the eBPF SDK in the same process is technically feasible (although hard), it has the following major cons:

  1. I think that having 2 SDKs for the same application might be a source of confusion for end users. Each SDK has it's own configuration and mechanism. Having the same application generating signals from 2 different sources might be harder to debug, maintain, and understand. A new user would need to understand more
  2. One of the key concepts of opentelemetry is the ability of any library to instrument it's source using the opentelemetry API. if you require users to install the native SDK to record telemetry from such libraries, it again means that code changes are required which is a big issue for many companies.
  3. It requires more work from users, as they will need setup more code (the native go SDK), and maintain both over time.
  4. As you mentioned, having 2 SDK means they will have to do some coordination and integration (mainly around the context and global providers), which is a source for technical challenges and bugs.
  5. eBPF is not only about doing things more "automatically". It also introduces performance gains which might be appealing to many users and are lost if data is recorded with native SDK.

While both options have pros and cons, I personally think that choosing to process manual spans with eBPF probes is friendlier for users, easier to debug and develop, and is less mentally complex than the alternative.

@edeNFed
Copy link
Contributor

edeNFed commented Nov 14, 2023

I think we can take inspiration from the Java auto instrumentation.
The javaagent automatic instrumentation allows users to add additional spans via the @WithSpan annotation.
This is a similar approach to what @RonFed proposed. In both approaches, extensibility is provided via API (annotation or OTel Go API), and delivery, batching, etc is provided by the SDK of the agent (eBPF or javaagent).

@MrAlias
Copy link
Contributor

MrAlias commented Nov 14, 2023

I think that the Go Automatic Instrumentation should "by default" set the global providers and use them as well.

@pellared how do you set the global provider in another process?

@MrAlias
Copy link
Contributor

MrAlias commented Nov 14, 2023

I think that while having another "go native" opentelemetry SDK living alongside the eBPF SDK in the same process is technically feasible

@blumamir the eBPF instrumentation will never live within the same process as the running code.

@blumamir
Copy link
Member

I think that while having another "go native" opentelemetry SDK living alongside the eBPF SDK in the same process is technically feasible

@blumamir the eBPF instrumentation will never live within the same process as the running code.

I am not sure about the precise definitions here, but what I meant to say is that a single process will generate telemetry signals (like spans) from 2 different SDKs.

@pellared
Copy link
Member Author

pellared commented Nov 14, 2023

@MrAlias @RonFed Now, I see your point. I think that may be worth exploring the support of instrumenting nonRecordingSpan.

@MrAlias
Copy link
Contributor

MrAlias commented Nov 14, 2023

Notes from SIG meeting:

  • We need to support trace context propagation regardless of the unified or bifurcated pipeline question
  • We want to support the situation where a user would like their telemetry to be sent to a different pipeline than the auto-instrumentation
    • We want to support development environments that want to use different exporters than an platform running the auto-instrumentation.
    • We want to support high-cardinality telemetry separation
    • We want to support different samplers
    • We want to support encapsulation of telemetry with different instrumentation scopes
    • We want to support different ID generation
  • We want to support the situation where a user would like their telemetry to be sent to the same pipeline as the auto-instrumentation

Action items

  1. Create issues for the net/http server, grpc server, and gin instrumentation to correctly propagate the span context in header values. This will enable any auto-instrumented spans to be connected to manually instrumented ones in a middleware or handler.
  2. Provide a TracerProvider implementation from auto-instrumentation that manual instrumentation can explicitly use to say they want their data to go to the same pipeline. This means the auto-instrumentation will control the type that manual instrumentation uses to create, edit, and end spans if they want the data to be sent to the auto-instrumentation pipeline
  3. Add a CLI flag that is an opt-in for all manual instrumentation to be sent to the auto-instrumentation pipeline. This is the idea that @RonFed is exploring above, just that it will not be the default.

@MikeGoldsmith
Copy link
Member

MikeGoldsmith commented Nov 27, 2023

Catching up on this thread after reading through the proposal in #523.

I'm concerned that setting up uprobes on the Go SDK's Start & End funcs effectively means that we're duplicating instrumentation (once in the SDK, then again in the instrumentation process).

Have we considered or attempted to construct trace & metrics providers and inject them into the process? Maybe we could detect a given provider has not been configured during init. This would happen once and then the auto instrumentation agent wouldn't need to interact with every span that's created in the target process with the Go SDK.

@MrAlias
Copy link
Contributor

MrAlias commented Nov 28, 2023

@pellared the proposed solution to (2) is adding support to capture from the default global implementation. It will not capture telemetry from any SDK registered with the global implementation, nor any default OTel SDK. Does this address (2) in your understanding?

@MrAlias
Copy link
Contributor

MrAlias commented Nov 28, 2023

Have we considered or attempted to construct trace & metrics providers and inject them into the process?

Wouldn't this violate the Go memory management requirements? We would be using types outside of the Go allocated memory space within a Go process.

@pellared
Copy link
Member Author

SIG meeting:

Moving it post-beta. There is no reason to delay the beta release so much.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation enhancement New feature or request
Projects
Status: In progress
Development

No branches or pull requests

6 participants