Skip to content
This repository has been archived by the owner on Jun 19, 2022. It is now read-only.

Accept both B3 and TraceContext Tracing #1335

Merged
merged 10 commits into from
Jul 15, 2020

Conversation

Harwayne
Copy link
Contributor

@Harwayne Harwayne commented Jun 24, 2020

Helps with #1147.

Proposed Changes

  • Use the tracecontextb3.TraceContextEgress in all of our client code for consistency with our server code (which should have been updated when upstream Accept both B3 and TraceContext style tracing knative/eventing#3388 was vendored in).
  • Only TraceContext style tracing is egressed.
    • This was done for backwards compatibility with the existing system, which sends via TraceContext style. We could output both formats, but it was determined that the additional overhead of ~100 bytes per message was too high for the potential benefit. See here for more discussion.

Release Note

NONE

…ontext style.

Egressing only TraceContext style is done for backwards compatibility with the existing system, which sends TraceContext style. We could output both styles, but it was determined that the additional overhead of ~100 bytes on every outgoing message was not worth it.
@knative-prow-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Harwayne

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@googlebot googlebot added the cla: yes (override cla status due to multiple authors bug) label Jun 24, 2020
@Harwayne
Copy link
Contributor Author

@nachocano should we hold this until after #1296 is in?

@ericlem
Copy link
Contributor

ericlem commented Jun 24, 2020

/lgtm
/hold
Holding for nacho's reply

@nachocano
Copy link
Member

nachocano commented Jun 24, 2020 via email

@nachocano
Copy link
Member

And this will have a bunch of conflicts.

I take this back. I was changing all e2e to start using cev2 and removing the good_client, but I rollbacked those changes as it can be done in a follow up, after mine and yours get in...
I'm currently experiencing the flakiness on the Brokercell UT. Hope we can get mine in early tomorrow

@nachocano
Copy link
Member

/hold cancel
the other PR got merged. Thanks for waiting @Harwayne !

@ericlem
Copy link
Contributor

ericlem commented Jun 24, 2020

/lgtm

@Harwayne
Copy link
Contributor Author

/hold

I clearly did something wrong:

TestGCPBrokerTracing: broker_tracing.go:45: multiple root spans: span_id:7881518364301324575 name:"Recv./test-g-c-p-broker-tracing-qwp7n/gcp-bfcyyklc" start_time:{seconds:1593035140 nanos:460429848} end_time:{seconds:1593035140 nanos:783118377} parent_span_id:22965739263654966 labels:{key:"/http/host" value:"default-brokercell-ingress.cloud-run-events.svc.cluster.local"} labels:{key:"/http/method" value:"POST"} labels:{key:"/http/path" value:"/test-g-c-p-broker-tracing-qwp7n/gcp-bfcyyklc"} labels:{key:"/http/user_agent" value:"Go-http-client/1.1"} labels:{key:"g.co/agent" value:"opencensus-go 0.23.0; stackdriver-exporter 0.10.0"} labels:{key:"http.url" value:"/test-g-c-p-broker-tracing-qwp7n/gcp-bfcyyklc"}, span_id:6804529643887419959 name:"Recv./test-g-c-p-broker-tracing-qwp7n/gcp-bfcyyklc" start_time:{seconds:1593035140 nanos:460488105} end_time:{seconds:1593035140 nanos:783081955} parent_span_id:22965739263654966 labels:{key:"/http/host" value:"default-brokercell-ingress.cloud-run-events.svc.cluster.local"} labels:{key:"/http/method" value:"POST"} labels:{key:"/http/path" value:"/test-g-c-p-broker-tracing-qwp7n/gcp-bfcyyklc"} labels:{key:"/http/user_agent" value:"Go-http-client/1.1"} labels:{key:"g.co/agent" value:"opencensus-go 0.23.0; stackdriver-exporter 0.10.0"} labels:{key:"http.url" value:"/test-g-c-p-broker-tracing-qwp7n/gcp-bfcyyklc"}

@ericlem
Copy link
Contributor

ericlem commented Jun 24, 2020

Is there a possibility the handlers were already wrapped (or internally registered traces) in the new wrapping calls you added?

@@ -68,7 +67,7 @@ func NewHTTPClient(ctx context.Context, maxConnsPerHost MaxConnsPerHost) *nethtt
MaxConnsPerHost: int(maxConnsPerHost),
IdleConnTimeout: 30 * time.Second,
},
Propagation: &tracecontext.HTTPFormat{},
Propagation: tracecontextb3.TraceContextEgress,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't be necessary as the client is already configured to use tracecontext.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right, this isn't really needed as this function returns an http.Client, which is egress only. However, I would like to change it to be consistent with all other places in the repo. The actual difference in runtime will be miniscule.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just use the regular tracecontext format everywhere we don't need special behavior? That would maintain consistency and I feel that it is somewhat confusing to configure the client with a format that has special unneeded ingress behavior. I reviewed the tracecontextb3 PR before the formats were split but it should actually be unnecessary to have separate egress and ingress formats since the format is only every used with a handler (ingress) or a client transport (egress).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just use the regular tracecontext format everywhere we don't need special behavior? That would maintain consistency

Consistency with what? For the rest of this comment I will assume it is consistency within this repo or knative more generally. As opposed to consistency with the wider world of software using OpenCensus.

I disagree that it is simpler or more consistent to use &tracecontext.HTTPFormat{} for ingress and tracecontextb3.TraceContextEgress for egress, rather than always using tracecontextb3.TraceContextEgress for both. There is some runtime overhead, but it is going to be effectively imperceptible (one stack frame and an iteration of a size one slice per HTTP request, AFAICT).

If we always use the same format, then developers don't have to understand when to use each. They just always use tracecontextb3.TraceContextEgress, which gives the desired behavior in both circumstances.

@@ -42,7 +43,7 @@ type Config struct {
// Main should be called by the process' main function. It will run the knockdown test. The return
// value MUST be used in os.Exit().
func Main(config Config, kdr Receiver) int {
client, err := cloudevents.NewDefaultClient()
client, err := kncloudevents.NewDefaultClient()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that we should change the test image as we expect to always be given tracecontext headers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My reasoning is consistency. There is little downside to accepting both, so why not do it? The only real reason we wouldn't is to verify that the incoming headers truly are TraceContext, rather than B3.

I also expect sooner or later someone will copy-paste this code and want the 'best practice' to be what they copy.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right my concern is detecting a regression where B3 is used instead of tracecontext. I also think that for normal event consumers accepting only tracecontext is good practice as that is what they should expect the broker to give them. On the other hand broker components should always use the clients package providers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched back with a comment explaining this is for tests only.

@@ -45,7 +45,7 @@ var (
MaxConnsPerHost: 500,
IdleConnTimeout: 30 * time.Second,
},
Propagation: &tracecontext.HTTPFormat{},
Propagation: tracecontextb3.TraceContextEgress,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also should be unnecessary as the client is already configured to use trace context.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the same argument as above? Namely that because this is an http.Client, it is inherently an egress only thing, and therefore doesn't need this new format?

pkg/pubsub/publisher/publisher.go Outdated Show resolved Hide resolved
@Harwayne
Copy link
Contributor Author

/unhold

@Harwayne
Copy link
Contributor Author

/retest

2 similar comments
@Harwayne
Copy link
Contributor Author

/retest

@Harwayne
Copy link
Contributor Author

/retest

@ian-mi
Copy link
Contributor

ian-mi commented Jul 13, 2020

/lgtm

@Harwayne
Copy link
Contributor Author

/retest

2 similar comments
@Harwayne
Copy link
Contributor Author

/retest

@Harwayne
Copy link
Contributor Author

/retest

@knative-test-reporter-robot

The following jobs failed:

Test name Triggers Retries
pull-google-knative-gcp-integration-tests pull-google-knative-gcp-integration-tests
pull-google-knative-gcp-integration-tests
pull-google-knative-gcp-integration-tests
3/3
pull-google-knative-gcp-wi-tests pull-google-knative-gcp-wi-tests
pull-google-knative-gcp-wi-tests
pull-google-knative-gcp-wi-tests
pull-google-knative-gcp-wi-tests
pull-google-knative-gcp-wi-tests
2020-07-14 22:37:00.384 +0000 UTC
3/3

Job pull-google-knative-gcp-wi-tests expended all 3 retries without success.

@Harwayne
Copy link
Contributor Author

/retest

@knative-prow-robot knative-prow-robot merged commit d8db92a into google:master Jul 15, 2020
@Harwayne Harwayne deleted the tracecontext-egress branch July 15, 2020 18:45
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved cla: yes (override cla status due to multiple authors bug) lgtm size/M
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants