-
-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add basic gateway tracing #8595
Conversation
66b4fe3
to
5d08759
Compare
5cab996
to
9b9691c
Compare
791c811
to
2a4398d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good stuff!
settings, prefix, err := options.UnixfsAddOptions(opts...) | ||
if err != nil { | ||
return nil, err | ||
} | ||
|
||
span.SetAttributes( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Allocations can be avoided here (and elsewhere when attributes are used) by only setting them when the span is recording. e.g.
if span.IsRecording() {
// set stuffs
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Admittedly, this is tedious for some of the one-liners that start a span and set attributes together; I wonder if there is an alternative...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also wonder if it's worth adding an Attributes
method to some of these types like settings,
for example. It may be more appropriate for a follow-on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thought process here was that we already know we have way bigger fish to fry in terms of memory allocations, so I wasn't too concerned with this. Hopefully one day we get to the point where these matter...in the meantime, we can enjoy the simpler code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good and I've been using this branch to study gateway performance characteristics.
|
||
// Span starts a new span using the standard IPFS tracing conventions. | ||
func Span(ctx context.Context, componentName string, spanName string, opts ...traceapi.SpanStartOption) (context.Context, traceapi.Span) { | ||
return otel.Tracer("go-ipfs").Start(ctx, fmt.Sprintf("%s.%s", componentName, spanName), opts...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the argument to otel.Tracer should be an empty string to use the default instrumentating library name as per the documentaion. Note the distinction made between Instrumented Library and Instrumenting Library
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that this tracer name ("instrumentation name") is similar to our logging "subsystems", in that the application can use these to know which component a span comes from, and use that to decide things like whether to record the span at all (see e.g. https://opentelemetry.io/docs/instrumentation/java/automatic/agent-config/#suppressing-specific-agent-instrumentation and https://github.com/open-telemetry/opentelemetry-go/blob/main/sdk/trace/span_processor_filter_example_test.go#L53 ). E.g. if an application uses this repo as a library, it can decide to drop all spans from go-ipfs
using this instrumentation name. If we use empty string here, the default go.opentelemetry.io/otel/sdk/tracer
will be used. See also https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/api.md#get-a-tracer .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is intended that the default tracer is used since we aren't changing the format of the traces. We already record the name of the system and the software used to record the trace in the Resource section. See, for example, this section of one trace. The otel documentation is not super clear but I feel that the presence of Version
and SchemaURL
fields in the InstrumentationLibrary
object imply that this is related to the type of trace that has been produced.
"Resource": [
{
"Key": "service.name",
"Value": {
"Type": "STRING",
"Value": "go-ipfs"
}
},
{
"Key": "service.version",
"Value": {
"Type": "STRING",
"Value": "0.13.0-dev"
}
},
{
"Key": "telemetry.sdk.language",
"Value": {
"Type": "STRING",
"Value": "go"
}
},
{
"Key": "telemetry.sdk.name",
"Value": {
"Type": "STRING",
"Value": "opentelemetry"
}
},
{
"Key": "telemetry.sdk.version",
"Value": {
"Type": "STRING",
"Value": "1.6.1"
}
}
],
"InstrumentationLibrary": {
"Name": "go-ipfs",
"Version": "",
"SchemaURL": ""
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the service is more at the "application" scope, so that we can tell from which system the spans of a distributed trace originate.
I'm also new to these concepts, so I've generally been mimicking the provided examples, which seem to use the same distinction: https://github.com/open-telemetry/opentelemetry-go/tree/main/example
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough. I don't feel strongly about it. It's possible that the SchemaURL/Version fields are intended to describe the schema of the tags that the library is writing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I'm not sure what those are for. This is all clearly communicated as experimental so that we have some wiggle room to adjust these over time as we learn more.
} | ||
|
||
// Span starts a new span using the standard IPFS tracing conventions. | ||
func Span(ctx context.Context, componentName string, spanName string, opts ...traceapi.SpanStartOption) (context.Context, traceapi.Span) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a candidate for moving to a new library package. As we add tracing to more base libraries we will repeat this multiple times and we can add more helpers to keep things consistent and to reduce the number of imports needed to start a span.
I started one here https://github.com/iand/go-ipfs-tracing as part of my experiments with this branch and instrumenting other related libraries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that makes sense, I was waiting to create something like this until I used OpenTelemetry in a few repos to understand our usage patterns better. I'm also hesitant to make a tracing "kitchen sink", since it seems like a place that would accumulate dependencies over time and significantly contribute to our dependency hell.
// Span names follow a convention of <Component>.<Span>, some examples: | ||
// | ||
// - component=Gateway + span=Request -> Gateway.Request | ||
// - component=CoreAPI.PinAPI + span=Verify.CheckPin -> CoreAPI.PinAPI.Verify.CheckPin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we tighten this convention up? I think we need to define what a component is and what convention we have for naming the span - is it the type name and method, or should it include the package somehow? We'll need this as soon as we start adding tracing to other ipfs libraries.
One convention could be that the component is the repo name go-ipfs/go-blockservice/go-namesys
and the Span name is the function name, prefixed by the typename if it's a method. Additional spans within a function would use a name derived from this base. The convention would lead to names like:
go-ipfs.PinAPI.Add
go-blockservice.Session.GetBlock
go-namesys.ResolveIPNS
We could include package name but it would make the names much longer for no great benefit, go-ipfs.PinAPI.Add
could be ``go-ipfs.core.coreapi.PinAPI.Add` (would also be extra maintenance burden to keep names in sync when refactoring)
This convention is not the prettiest but it would work consistently across all our libraries and gives signposts for developers looking for where a particular span was created.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would need to be modified if we keep the library name in the Instrumentation Library field (although I don't know how/if that surfaces in the Jaeger UI for example)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to protect our ability to refactor repos, modules, and packages without breaking tracing consumers, so I'd like to avoid coupling it to those as much as possible. Much like we do with logging subsystems (I toyed with the idea of coupling these with logging subsystems but gave up.) The downside though is that it makes it harder to find the code origin of traces, but I still don't think that's very hard with a quick GitHub search. Or we could add some middleware that adds an attribute to all spans with the source file + line info of where the span originated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Took it for a spin, works as expected 👍
Added more Gateway.*
spans in #8841 (needs to be merged into this PR) + small asks below.
Adds more visibility into how long generic lookup takes vs producing specific response type.
The only CI test that fails is go-ipfs-http-client, which is unrelated to changes from this PR and being fixed in #8838 Resolved conflicts, merging to avoid doing this again. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, we can adjust / add more spans in future PRs.
- Added Jaeger UI instruction to
docs/environment-variables.md
, should be really useful for newcomers / gateway admins. - Drafted a blurb for go-ipfs 0.13 Release Notes in feat: add basic gateway tracing #8595 (comment)
@lidel's blurb for Release Notes (wip)
Added spans
Rateway.Request
http.target
has "original" requestResolvedPath
has immutable IPFS path behind the original request (matters when/ipns/
)Gateway.ServeRawBlock
Gateway.ServeCar
Gateway.ServeUnixFs
Gateway.ServeFile
Gateway.ServeDirectory
How to use Jaeger UI
One can use the
jaegertracing/all-in-one
Docker image to run a full Jaegerstack and configure go-ipfs to publish traces to it (here, in an ephemeral
container):
Then, in other terminal, start go-ipfs with Jaeger tracing enabled:
Finally, the Jaeger UI is available at http://localhost:16686
Demo