Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tracing: update to otelhttp roundtripper #2238

Merged
merged 1 commit into from
Jul 16, 2021

Conversation

@coryb
Copy link
Collaborator

coryb commented Jul 11, 2021

fwiw, I am not a big fan of the way otelhttptrace adds a bunch of sub-spans to the trace. For me it is adding mostly noise to already complex traces, I much prefer annotations/events for this data so I can look into the span details when it is interesting. If interested, this is the httptrace logic I use with otel.

Edit: saw the issue you opened for otelhttptrace, I will look into an upstream patch for an option for this functionality.

@tonistiigi tonistiigi force-pushed the http-tracing branch 2 times, most recently from 469b05b to cf19610 Compare July 12, 2021 04:11
@tonistiigi
Copy link
Member Author

tonistiigi commented Jul 12, 2021

@coryb I added your PR open-telemetry/opentelemetry-go-contrib#879

One issue is that I can see my authorization token in the trace. Definitely need to find a fix for that.

Another smaller problem is that in Jaeger it doesn't show the timestamps of the events relative to span, but relative to trace. That means that the precision of the timestamps is really low. Obviously not a problem in your code but in Jaeger, but if you use Jaeger yourself would be nice to find some solution to that as well.
image

@coryb
Copy link
Collaborator

coryb commented Jul 12, 2021

Interesting, the headers annotations are actually in the original code, so your tokens were leaking just via one of the sub spans. I will update the PR to remove all the header annotations, don't think they add much value, not sure I want to just special case a few known auth headers.
Also dont have any ideas for the jaeger ui, internally we use the zipkin-lens ui which seems to work okay for these spans. It also shows offset from the root span, but it displays microseconds.

@tonistiigi
Copy link
Member Author

to remove all the header annotations,

I'd be ok with just authorization headers (or maybe also if header contains "password/secret"). The most robust would be probably to add a filter option. Still, auth/secrets should be hidden by default I think.

@tonistiigi
Copy link
Member Author

the headers annotations are actually in the original code,

From the code that looks correct but for some reason I can't see any actual header values in Jaeger in previous version. Only the http.headers span.

@coryb
Copy link
Collaborator

coryb commented Jul 12, 2021

I have updated the otel PR to redact several headers by default, you can add more with WithRedactedHeaders option.

@tonistiigi tonistiigi marked this pull request as ready for review July 12, 2021 07:27
@tonistiigi tonistiigi force-pushed the http-tracing branch 2 times, most recently from 9a37617 to fe02b0e Compare July 14, 2021 04:58
@tonistiigi
Copy link
Member Author

Updated again. I'm ok with merging this with the replace rules. The worst that can happen if there is trouble in upstream is that we need to make a copy of these small utility packages to have it the way we like it. Looking that vendor brings in some cruft we would also save 80-90% size with that.

@coryb
Copy link
Collaborator

coryb commented Jul 14, 2021

We should probably also add a replace for the grpc interceptor that leaks goroutines quickly. These are the upstream PRs that other users have fixed:
open-telemetry/opentelemetry-go-contrib#836
open-telemetry/opentelemetry-go-contrib#840

I am running buildkit with a branch I made with which merges both PRs:

go mod edit -replace=go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc=github.com/coryb/opentelemetry-go-contrib/instrumentation/google.golang.org/grpc/otelgrpc@33f672f422babda89f6abf5c9bc3e8aeb50d5df2

I also vote for releasing with the replace for now, we can fix when otel merges the issues.

@tonistiigi
Copy link
Member Author

Hmm. I thought I fixes #836 on buildkit side (force calling CloseSend) and for #840 I'm interested in the cases where this happens as I think the client is not correct there. Ok with including them atm though but we should track the issues.

@tonistiigi
Copy link
Member Author

tonistiigi commented Jul 14, 2021

@coryb Can you update this PR by adding these 2 PRs. They don't apply cleanly.

nvm. you already have a branch

@tonistiigi
Copy link
Member Author

@coryb Updated. Please verify the branch.

@coryb
Copy link
Collaborator

coryb commented Jul 14, 2021

Hmm. I thought I fixes #836 on buildkit side (force calling CloseSend) and for #840 I'm interested in the cases where this happens as I think the client is not correct there. Ok with including them atm though but we should track the issues.

I am not sure if 836 is essential, it was the first issue I came across when diagnosing the leak, it seemed like a good fix anyway. 840 was essential to solve the goroutine leak though. I agree it is likely client issues, but they will be tricky to diagnose.

Please verify the branch.

LGTM

Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants