feat: tracing of gateway requests #143

Merged: 5 commits, Jun 25, 2024
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -15,6 +15,8 @@ The following emojis are used to highlight certain changes:

### Added

- Per-request tracing gated by an `Authorization` header (see `RAINBOW_TRACING_AUTH`), or tracing of a fraction of requests (see `RAINBOW_SAMPLING_FRACTION`)

### Changed

- go-libp2p 0.35
32 changes: 32 additions & 0 deletions docs/environment-variables.md
@@ -30,6 +30,8 @@
- [Testing](#testing)
  - [`GATEWAY_CONFORMANCE_TEST`](#gateway_conformance_test)
  - [`IPFS_NS_MAP`](#ipfs_ns_map)
- [Tracing](#tracing)
  - [`RAINBOW_TRACING_AUTH`](#rainbow_tracing_auth)

## Configuration

@@ -289,3 +291,33 @@ $ IPFS_NS_MAP="dnslink-test1.example.com:/ipfs/bafkreicysg23kiwv34eg2d7qweipxwos
$ curl -is http://127.0.0.1:8081/dnslink-test2.example.com/ | grep Etag
Etag: "bafkreicysg23kiwv34eg2d7qweipxwosdo2py4ldv42nbauguluen5v6am"
```

## Tracing

Tracing across the stack follows, as much as possible, the [Open Telemetry]
specifications. Configuration environment variables are specified in the
[OpenTelemetry Environment Variable Specification] where possible. The
[Boxo Tracing] documentation is the basis for tracing here.

A major distinction from the more general tracing enabled in Boxo is that tracing
here, when enabled, is restricted to flows through HTTP Gateway requests and does
not also cover background processes.

Note: requests are also traced when a `Traceparent` header is passed that is valid
according to the [Trace Context] specification, even if the sampling fraction is set to 0.
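
To make "valid" more concrete, here is a minimal sketch of a check for a version `00` `Traceparent` value, which has the layout `version-traceid-parentid-flags`. `validTraceparent` and `sampledFlag` are hypothetical helpers written for illustration; they are not functions from Rainbow or OpenTelemetry, and only version `00` is handled. Whether an accepted header actually leads to a recorded trace may also depend on its sampled flag and on the sampler configuration, which this sketch does not model.

```go
package main

import (
	"strconv"
	"strings"
)

// isLowerHex reports whether s consists only of lowercase hexadecimal digits.
func isLowerHex(s string) bool {
	for _, c := range s {
		if !(c >= '0' && c <= '9') && !(c >= 'a' && c <= 'f') {
			return false
		}
	}
	return true
}

// validTraceparent is a hypothetical helper that checks a version-00 value of
// the form "00-<32 hex trace-id>-<16 hex parent-id>-<2 hex flags>".
func validTraceparent(h string) bool {
	parts := strings.Split(h, "-")
	if len(parts) != 4 {
		return false
	}
	version, traceID, parentID, flags := parts[0], parts[1], parts[2], parts[3]
	if version != "00" || len(traceID) != 32 || len(parentID) != 16 || len(flags) != 2 {
		return false
	}
	if !isLowerHex(traceID) || !isLowerHex(parentID) || !isLowerHex(flags) {
		return false
	}
	// All-zero trace-id and parent-id values are not allowed.
	return strings.Trim(traceID, "0") != "" && strings.Trim(parentID, "0") != ""
}

// sampledFlag reports whether the "sampled" bit (0x01) is set in a valid header.
func sampledFlag(h string) bool {
	if !validTraceparent(h) {
		return false
	}
	flags, err := strconv.ParseUint(h[len(h)-2:], 16, 8)
	return err == nil && flags&0x01 == 1
}
```

For example, `00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01` passes the check and has the sampled bit set, while a value with an all-zero trace ID or uppercase hex digits is rejected.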

### `RAINBOW_TRACING_AUTH`

The ability to pass `Traceparent` or `Tracestate` headers is guarded by an
`Authorization` header. The value of the `Authorization` header must match
the value of the `RAINBOW_TRACING_AUTH` environment variable exactly; otherwise
both headers are removed from the request before it is handled.
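
As a client-side illustration, the sketch below builds a request that carries both headers. `newTracedRequest` is a hypothetical helper, and the URL, token, and trace values in the usage note are placeholders rather than values from this project. Because the header value is compared verbatim with the configured string, the sketch sends the token as-is, without a scheme prefix such as `Bearer`.

```go
package main

import "net/http"

// newTracedRequest is a hypothetical helper that builds a GET request whose
// Authorization header carries the tracing key and whose Traceparent header
// carries the caller's trace context.
func newTracedRequest(url, tracingAuth, traceparent string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	// The value must equal RAINBOW_TRACING_AUTH exactly for the trace
	// headers to be honored.
	req.Header.Set("Authorization", tracingAuth)
	req.Header.Set("Traceparent", traceparent)
	return req, nil
}
```

A call such as `newTracedRequest("http://127.0.0.1:8081/ipfs/<cid>", "my-secret", "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")` returns a request that can be sent with `http.DefaultClient.Do`.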

### `RAINBOW_SAMPLING_FRACTION`

Review comment (Contributor Author): In theory we could invent an `OTEL_TRACES_SAMPLER` and `OTEL_TRACES_SAMPLER_ARG` if that makes more sense, but this seemed fine for now.

The fraction (between 0 and 1) of requests that should be sampled. Defaults to 0,
in which case no requests are sampled by fraction.
This is calculated independently of any `Traceparent`-based sampling.
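
To give a feel for how a fraction can be turned into a per-request decision, here is a stdlib-only sketch of trace-ID-ratio sampling. `ratioSampled` is a hypothetical function; to the best of my knowledge the OpenTelemetry Go SDK's `TraceIDRatioBased` sampler works along similar lines, but the exact arithmetic below should be read as an assumption, not as the SDK's documented contract.

```go
package main

import "encoding/binary"

// ratioSampled is a hypothetical sketch of ratio-based sampling: the decision
// is derived deterministically from the trace ID, so every span that sees the
// same trace ID and fraction reaches the same answer.
func ratioSampled(traceID [16]byte, fraction float64) bool {
	if fraction <= 0 {
		return false // nothing is sampled
	}
	if fraction >= 1 {
		return true // everything is sampled
	}
	// Map the fraction onto the 63-bit range and compare it with the low
	// 8 bytes of the trace ID, shifted to the same range.
	upperBound := uint64(fraction * (1 << 63))
	x := binary.BigEndian.Uint64(traceID[8:16]) >> 1
	return x < upperBound
}
```

With a fraction of 0.5, roughly half of all uniformly random trace IDs fall below the bound, which is what makes the fraction behave like a sampling rate.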

[Boxo Tracing]: https://github.com/ipfs/boxo/blob/main/docs/tracing.md
[Open Telemetry]: https://opentelemetry.io/
[OpenTelemetry Environment Variable Specification]: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/sdk-environment-variables.md
[Trace Context]: https://www.w3.org/TR/trace-context
9 changes: 9 additions & 0 deletions docs/headers.md
@@ -0,0 +1,9 @@
## `Authorization`

Optional request header that guards per-request tracing features.

See [`RAINBOW_TRACING_AUTH`](./environment-variables.md#rainbow_tracing_auth)

## `Traceparent`

See [`RAINBOW_TRACING_AUTH`](./environment-variables.md#rainbow_tracing_auth)
2 changes: 1 addition & 1 deletion go.mod
@@ -24,7 +24,7 @@ require (
 	github.com/libp2p/go-libp2p v0.35.1
 	github.com/libp2p/go-libp2p-kad-dht v0.25.2
 	github.com/libp2p/go-libp2p-record v0.2.0
-	github.com/libp2p/go-libp2p-routing-helpers v0.7.3
+	github.com/libp2p/go-libp2p-routing-helpers v0.7.4
 	github.com/libp2p/go-libp2p-testing v0.12.0
 	github.com/mitchellh/go-server-timing v1.0.1
 	github.com/mr-tron/base58 v1.2.0
4 changes: 2 additions & 2 deletions go.sum
@@ -391,8 +391,8 @@ github.com/libp2p/go-libp2p-kbucket v0.6.3/go.mod h1:RCseT7AH6eJWxxk2ol03xtP9pEH
 github.com/libp2p/go-libp2p-peerstore v0.1.4/go.mod h1:+4BDbDiiKf4PzpANZDAT+knVdLxvqh7hXOujessqdzs=
 github.com/libp2p/go-libp2p-record v0.2.0 h1:oiNUOCWno2BFuxt3my4i1frNrt7PerzB3queqa1NkQ0=
 github.com/libp2p/go-libp2p-record v0.2.0/go.mod h1:I+3zMkvvg5m2OcSdoL0KPljyJyvNDFGKX7QdlpYUcwk=
-github.com/libp2p/go-libp2p-routing-helpers v0.7.3 h1:u1LGzAMVRK9Nqq5aYDVOiq/HaB93U9WWczBzGyAC5ZY=
-github.com/libp2p/go-libp2p-routing-helpers v0.7.3/go.mod h1:cN4mJAD/7zfPKXBcs9ze31JGYAZgzdABEm+q/hkswb8=
+github.com/libp2p/go-libp2p-routing-helpers v0.7.4 h1:6LqS1Bzn5CfDJ4tzvP9uwh42IB7TJLNFJA6dEeGBv84=
+github.com/libp2p/go-libp2p-routing-helpers v0.7.4/go.mod h1:we5WDj9tbolBXOuF1hGOkR+r7Uh1408tQbAKaT5n1LE=
 github.com/libp2p/go-libp2p-testing v0.12.0 h1:EPvBb4kKMWO29qP4mZGyhVzUyR25dvfUIK5WDu6iPUA=
 github.com/libp2p/go-libp2p-testing v0.12.0/go.mod h1:KcGDRXyN7sQCllucn1cOOS+Dmm7ujhfEyXQL5lvkcPg=
 github.com/libp2p/go-libp2p-xor v0.1.0 h1:hhQwT4uGrBcuAkUGXADuPltalOdpf9aag9kaYNT2tLA=
16 changes: 15 additions & 1 deletion handlers.go
@@ -84,7 +84,7 @@
 	})
 }

-func setupGatewayHandler(cfg Config, nd *Node) (http.Handler, error) {
+func setupGatewayHandler(cfg Config, nd *Node, tracingAuth string) (http.Handler, error) {
 	var (
 		backend gateway.IPFSBackend
 		err     error
@@ -208,6 +208,20 @@
 	// Add tracing.
 	handler = otelhttp.NewHandler(handler, "Gateway")

+	// Remove tracing headers if not authorized
+	prevHandler := handler
+	handler = http.HandlerFunc(func(writer http.ResponseWriter, request *http.Request) {
+		if request.Header.Get("Authorization") != tracingAuth {
+			if request.Header.Get("Traceparent") != "" {
+				request.Header.Del("Traceparent")
+			}
+			if request.Header.Get("Tracestate") != "" {
+				request.Header.Del("Tracestate")
+			}
+		}
+		prevHandler.ServeHTTP(writer, request)
+	})
+
 	return handler, nil
 }

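
The header-stripping wrapper in the hunk above can be tried out in isolation. The following is a minimal, self-contained sketch and not the code from this pull request: `stripUnauthorizedTraceHeaders` and `traceparentSeenByBackend` are hypothetical names, the comparison uses `crypto/subtle` (a variation on the plain `!=` in the diff), and the `Get` checks before `Del` are dropped because deleting an absent header is a no-op.

```go
package main

import (
	"crypto/subtle"
	"net/http"
	"net/http/httptest"
)

// stripUnauthorizedTraceHeaders removes Traceparent and Tracestate from the
// request unless its Authorization header equals tracingAuth.
func stripUnauthorizedTraceHeaders(next http.Handler, tracingAuth string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		got := r.Header.Get("Authorization")
		// Constant-time comparison is an assumption of this sketch.
		if subtle.ConstantTimeCompare([]byte(got), []byte(tracingAuth)) != 1 {
			r.Header.Del("Traceparent")
			r.Header.Del("Tracestate")
		}
		next.ServeHTTP(w, r)
	})
}

// traceparentSeenByBackend sends one request through the wrapper and returns
// the Traceparent value that the wrapped handler observed.
func traceparentSeenByBackend(tracingAuth, authHeader, traceparent string) string {
	var seen string
	backend := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		seen = r.Header.Get("Traceparent")
	})
	req := httptest.NewRequest(http.MethodGet, "/", nil)
	if authHeader != "" {
		req.Header.Set("Authorization", authHeader)
	}
	req.Header.Set("Traceparent", traceparent)
	stripUnauthorizedTraceHeaders(backend, tracingAuth).ServeHTTP(httptest.NewRecorder(), req)
	return seen
}
```

Two details carry over from the diff. The wrapper is applied outside the tracing handler, so the headers are removed before the tracing layer reads them. And when the configured key is empty, a request without an `Authorization` header compares equal to it, so trace headers pass through unchanged.
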
17 changes: 15 additions & 2 deletions main.go
@@ -297,6 +297,18 @@
 			EnvVars: []string{"RAINBOW_LIBP2P_LISTEN_ADDRS"},
 			Usage:   "Multiaddresses for libp2p bitswap client to listen on (comma-separated)",
 		},
+		&cli.StringFlag{
+			Name:    "tracing-auth",
+			Value:   "",
+			EnvVars: []string{"RAINBOW_TRACING_AUTH"},
+			Usage:   "If set the key gates use of the Traceparent header by requiring the key to be passed in the Authorization header",
+		},
+		&cli.Float64Flag{
+			Name:    "sampling-fraction",
+			Value:   0,
+			EnvVars: []string{"RAINBOW_SAMPLING_FRACTION"},
+			Usage:   "Rate at which to sample gateway requests. Does not include traceheaders which will always sample",
+		},
 	}

 	app.Commands = []*cli.Command{

Review comment on `Value: 0` (Contributor Author): Should this default to 1 instead of 0? I'd been using 0 with some testing to prevent flooding as I was more interested in Traceheaders, but the default OTEL sampler is "parentbased_always_on".

Reply (@gammazero, Jun 19, 2024): A value of zero seems like the expected default, so that no sampling is done.

@@ -459,7 +471,8 @@
 		gatewayListen := cctx.String("gateway-listen-address")
 		ctlListen := cctx.String("ctl-listen-address")

-		handler, err := setupGatewayHandler(cfg, gnd)
+		tracingAuth := cctx.String("tracing-auth")
+		handler, err := setupGatewayHandler(cfg, gnd, tracingAuth)
 		if err != nil {
 			return err
 		}
@@ -480,7 +493,7 @@
 		registerVersionMetric(version)
 		registerIpfsNodeCollector(gnd)

-		tp, shutdown, err := newTracerProvider(cctx.Context)
+		tp, shutdown, err := newTracerProvider(cctx.Context, cctx.Float64("sampling-fraction"))
 		if err != nil {
 			return err
 		}
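
The flag in the diff above accepts any float and passes it straight to the tracer provider. If range checking were wanted, it could look like the sketch below. `parseSamplingFraction` is a hypothetical helper that is not part of this change, and treating an empty value as 0 is an assumption chosen to match the flag's default.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseSamplingFraction is a hypothetical validator for the raw value of
// RAINBOW_SAMPLING_FRACTION. An empty string maps to the default of 0.
func parseSamplingFraction(raw string) (float64, error) {
	if raw == "" {
		return 0, nil
	}
	f, err := strconv.ParseFloat(raw, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid sampling fraction %q: %w", raw, err)
	}
	// Written as a negated range check so that NaN is rejected as well.
	if !(f >= 0 && f <= 1) {
		return 0, fmt.Errorf("sampling fraction %q is outside [0, 1]", raw)
	}
	return f, nil
}
```

A caller could run this on `os.Getenv("RAINBOW_SAMPLING_FRACTION")` at startup and fail fast on values such as `1.5` or `NaN`.
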
2 changes: 1 addition & 1 deletion main_test.go
@@ -71,7 +71,7 @@ func mustTestNodeWithKey(t *testing.T, cfg Config, sk ic.PrivKey) *Node {
 func mustTestServer(t *testing.T, cfg Config) (*httptest.Server, *Node) {
 	nd := mustTestNode(t, cfg)

-	handler, err := setupGatewayHandler(cfg, nd)
+	handler, err := setupGatewayHandler(cfg, nd, "")
 	if err != nil {
 		require.NoError(t, err)
 	}
55 changes: 52 additions & 3 deletions tracing.go
@@ -2,16 +2,16 @@

 import (
 	"context"
-
 	"github.com/ipfs/boxo/tracing"
 	"go.opentelemetry.io/otel/sdk/resource"
 	"go.opentelemetry.io/otel/sdk/trace"
 	semconv "go.opentelemetry.io/otel/semconv/v1.4.0"
 	traceapi "go.opentelemetry.io/otel/trace"
 	tracenoop "go.opentelemetry.io/otel/trace/noop"
+	"strings"
 )

-func newTracerProvider(ctx context.Context) (traceapi.TracerProvider, func(context.Context) error, error) {
+func newTracerProvider(ctx context.Context, traceFraction float64) (traceapi.TracerProvider, func(context.Context) error, error) {
 	exporters, err := tracing.NewSpanExporters(ctx)
 	if err != nil {
 		return nil, nil, err
@@ -37,8 +37,57 @@
 	if err != nil {
 		return nil, nil, err
 	}
-	options = append(options, trace.WithResource(r))
+
+	var baseSampler trace.Sampler
+	if traceFraction == 0 {
+		baseSampler = trace.NeverSample()
+	} else {
+		baseSampler = trace.TraceIDRatioBased(traceFraction)
+	}
+
+	// Sample all children whose parents are sampled
+	// Probabilistically sample if the span is a root which is a Gateway request
+	sampler := trace.ParentBased(
+		CascadingSamplerFunc(func(parameters trace.SamplingParameters) bool {
+			return !traceapi.SpanContextFromContext(parameters.ParentContext).IsValid()
+		}, "root sampler",
+			CascadingSamplerFunc(func(parameters trace.SamplingParameters) bool {
+				return strings.HasPrefix(parameters.Name, "Gateway")
+			}, "gateway request sampler",
+				baseSampler)))
+	options = append(options, trace.WithResource(r), trace.WithSampler(sampler))

 	tp := trace.NewTracerProvider(options...)
 	return tp, tp.Shutdown, nil
 }
+
+type funcSampler struct {
+	next        trace.Sampler
+	fn          func(trace.SamplingParameters) trace.SamplingResult
+	description string
+}
+
+func (f funcSampler) ShouldSample(parameters trace.SamplingParameters) trace.SamplingResult {
+	return f.fn(parameters)
+}
+
+func (f funcSampler) Description() string {
+	return f.description
+}
+
+// CascadingSamplerFunc will sample with the next tracer if the condition is met, otherwise the sample will be dropped
+func CascadingSamplerFunc(shouldSample func(parameters trace.SamplingParameters) bool, description string, next trace.Sampler) trace.Sampler {
+	return funcSampler{
+		next: next,
+		fn: func(parameters trace.SamplingParameters) trace.SamplingResult {
+			if shouldSample(parameters) {
+				return next.ShouldSample(parameters)
+			}
+			return trace.SamplingResult{
+				Decision:   trace.Drop,
+				Tracestate: traceapi.SpanContextFromContext(parameters.ParentContext).TraceState(),
+			}
+		},
+		description: description,
+	}
+}
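
The cascading idea in `CascadingSamplerFunc` (ask a predicate, then either defer to the next sampler or drop) can be illustrated without the OpenTelemetry SDK. The sketch below uses pared-down stand-in types; `decision`, `params`, `sampler`, `cascade`, `gatewayRootSampler`, and `always` are all hypothetical names and not part of the SDK or of this pull request.

```go
package main

import "strings"

// decision is a simplified stand-in for the SDK's sampling decision.
type decision int

const (
	drop decision = iota
	recordAndSample
)

// params is a simplified stand-in for the SDK's sampling parameters.
type params struct {
	Name      string // span name
	HasParent bool   // whether a valid parent span context exists
}

// sampler decides whether a span described by params is recorded.
type sampler func(params) decision

// cascade returns a sampler that defers to next when cond holds and drops
// the span otherwise.
func cascade(cond func(params) bool, next sampler) sampler {
	return func(p params) decision {
		if cond(p) {
			return next(p)
		}
		return drop
	}
}

// gatewayRootSampler chains two conditions in front of base: the span must
// be a root span, and its name must start with "Gateway".
func gatewayRootSampler(base sampler) sampler {
	isRoot := func(p params) bool { return !p.HasParent }
	isGateway := func(p params) bool { return strings.HasPrefix(p.Name, "Gateway") }
	return cascade(isRoot, cascade(isGateway, base))
}

// always is a base sampler that records every span it is asked about.
func always(params) decision { return recordAndSample }
```

With `always` as the base, a root span named `Gateway` is recorded, while a root span with any other name is dropped. In the diff above, spans that do have a parent are routed by `trace.ParentBased` before the root sampler is consulted, which this simplified sketch does not model.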