Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Update the tracing docs to better explain options available to service developers #1711

Closed
objectiser opened this issue Sep 21, 2017 · 23 comments · Fixed by #1733
Closed

Comments

@objectiser
Copy link
Contributor

When using the Zipkin tracer, the documentation currently indicates that five (B3) header properties need to be manually propagated by the application to enable end to end tracing to work.

In actual fact, regardless of whether using Zipkin or LightStep, the only header property that needs to be propagated is x-ot-span-context.

This doesn't prevent the application from using a tracer compatible with the B3 trace context format. The B3 headers are still passed into the application - and if the application makes use of those headers to (for example) create additional child spans, the updated B3 headers can be injected into any outbound request and will be used by the Egress proxy (once #1702 is merged).

Therefore I would propose that the documentation is updated to make this clear.

However it has also been suggested that the use of x-ot-span-context header, when Zipkin is the configured tracer, can now be removed - as the original reason for its use no longer applies.

It would be good to have some end user opinions on which direction is preferred:

  1. Keep use of the single x-ot-span-context, and therefore application only has to manually propagate a single header for trace context

  2. Remove use of x-ot-span-context when zipkin tracer is configured, and the application needs to propagate all 5 of the B3 header properties.

@objectiser
Copy link
Contributor Author

cc @fabolive @adriancole

@mattklein123
Copy link
Member

cc @rshriram @bhs @goaway @RomanDzhabarov

In general, I would love to do a pass on the tracing docs and making them more clear across the features supported for every tracer, and then have specific docs for the currently supported ones (Zipkin and LS). Anything we can do here to simplify/clarify sounds great to me.

@codefromthecrypt
Copy link
Contributor

x-ot-span-context is not a standard (defacto or otherwise) header. I would suggest against implying we should proliferate use of this. For example, some use B3 because appliances already permit the headers (or other systems use it). That's certainly not the case with x-ot-span-context

@mattklein123
Copy link
Member

From my perspective I like an option that only requires a single header to be propagated. Personally I would just support both and clearly document what is possible. If people in general want to remove x-ot-span-context from Zipkin tracer that's also fine with me.

@codefromthecrypt
Copy link
Contributor

codefromthecrypt commented Sep 22, 2017 via email

@mattklein123
Copy link
Member

i don't really care what the single header is called. The fact that no one has been able to pick one is all the more reason to just make it happen in Envoy and push it forward. So my vote here continues to be to support both options and clearly document. To me there is no downside of this approach.

@bhs
Copy link

bhs commented Sep 22, 2017

In the immediate term, I agree with @mattklein123's [pragmatic] reasoning. I also agree with @adriancole in that – in the long term – we should move beyond x-ot-span-context towards something that's both "standard" (sic?) and general-purpose. opentracing/specification#86 represents a few thoughts about this... my favorite personal outcome would be the header name (Trace-Context) from https://github.com/TraceContext/tracecontext-spec and a header value that's similar but slightly more general (variable length ids, ideally basic support for baggage).

@codefromthecrypt
Copy link
Contributor

codefromthecrypt commented Sep 23, 2017 via email

@mattklein123
Copy link
Member

I hope once you are at a non-frustrated moment, you could see downsides of amplifying this. If you did want a "pick one" seems someone at lyft could participate in trace-context simlarly to how jeff pinner did in http/2 back in the day.

@adriancole I'm sorry that you think that me pushing this forward is irresponsible and solely out of frustration. That's really not the case. I'm also more than happy to participate in working groups on this subject, with the caveat being that I would like to make meaningful progress which I have not seen happen so far (so yes you are right I am a little frustrated!).

In this particular case, I don't really see the downside of leaving working code in Envoy Zipkin that can do propagation with a single header. We all agree that we should make the B3 headers work perfectly. If someone prefers to propagate a single header they can do that also and we can document/warn in the Envoy docs that it is non-standard.

Fundamentally, I am a pragmatic person, and I want to make progress. You are correct that Envoy is a bit of a hammer in this regard, but sometimes that's what it takes to get things unclogged and moving. Per @bhs ultimately I would prefer that we standardize on some single header and Trace-Context seems fine to me. So if this issue and/or Envoy adds a bit of extra urgency to making progress on that front so be it. Let me know how I can help and let's make something happen that we all agree on.

@louiscryan
Copy link

I'm pretty much with Matt on this and sympathize with his frustration around progress on alignment. He's frustrated (and me too) not out of some desire for architectural purity but because Envoy is expected to carry a significant implementation and maintenance burden to normalize these issues for our users.

Envoy is a big hammer for projects like Istio and I want to make sure we use it wisely. Using Envoy as the means to promote standardization seems sensible to me. i.e Envoy transcribing between the commonly used trace formats as needed by the applications to which it is attached as a sidecar / reverse proxy for inbound traffic while rigorously using a single format for interchange between Envoy instances seems like a reasonable first step. If at a later point application frameworks pick up support for that 'one' format then the performance of the mesh improves.

This puts a pretty significant maintenance responsibility on Envoy to support a reasonable set of transcribable target formats and it also puts a configuration burden on operators while there is tracing implementation diversity within services in a mesh. I think it's also reasonable for Envoy developers to carry the burden for building/testing for some of the more widely used formats but not all of them.

In that vein, what do we think is reasonable to ask Matt and co. to support? Zipkin, OT, TraceContext, Jaeger, ... Thats already a pretty sizeable list and we haven't gotten to the commercial vendors

@codefromthecrypt
Copy link
Contributor

codefromthecrypt commented Sep 24, 2017 via email

@codefromthecrypt
Copy link
Contributor

codefromthecrypt commented Sep 24, 2017 via email

@bhs
Copy link

bhs commented Sep 24, 2017

It's odd that we're getting into the weeds of retry budgets/etc when we lack even a standard header name or a standard format that supports ids of different lengths (since different Tracers presently use ids of different lengths).

I would argue for something like w3c/trace-context#19, followed by support within Envoy for same, followed by work to standardize the way that the actual Span data (not just the context data, I mean) gets out of the Envoy process and into a tracing system.

Once that's done, we can pursue a version of the context work that incorporates the more exotic features (special support for retries, fancy sampling, etc). Working through the exotic stuff now is going to delay some straightforward improvements and generalizations in Envoy (and elsewhere).

@codefromthecrypt
Copy link
Contributor

codefromthecrypt commented Sep 24, 2017 via email

@mattklein123
Copy link
Member

I think that is an invalid expectation

I do not think it's an invalid expectation. To me it's pretty straightforward. I agree with @bhs that it's time to start making progress here and that "the perfect is the enemy of the good."

From my perspective as long as the header format is versioned in some way we can start with something and iterate later. (At minimum versioning could be some version bits. It could also be using something like proto to define the header contents which is inherently expandable in the future).

I'm committed to making progress on this subject as IMO it's one of the largest pain points of users of a transparent service mesh.

I would argue for something like w3c/trace-context#19, followed by support within Envoy for same, followed by work to standardize the way that the actual Span data (not just the context data, I mean) gets out of the Envoy process and into a tracing system.

+1. We will continue to work with @bhs and the OT folks to find a path forward here. We would of course love to work with anyone else and am happy to have in person conversations if necessary. (Quite honestly I have never been asked to attend a tracing meeting).

As an aside, I would suggest that we table this issue for now as we are getting off topic into more foundational conversations that are probably better held elsewhere.

I think the resolution of this specific issue is: B3 is now fully supported for Envoy Zipkin and we are committed to keeping that working. We won't remove the x-ot-span-context code as it is a working example of how to do single header propagation. We also won't push users towards it. We welcome any/all clarifying doc updates about the current state of the world. Fair?

@bhs
Copy link

bhs commented Sep 24, 2017

@mattklein123

Fair?

Sounds good to me. And I agree that the actual issue here is a lot simpler than the more foundational things it's bringing up.

@codefromthecrypt
Copy link
Contributor

codefromthecrypt commented Sep 25, 2017 via email

@mattklein123
Copy link
Member

mattklein123 commented Sep 25, 2017

If one header to do all things, retries duration, deadlines, feature flags,
baggage etc

As @bhs said I think it's premature to start talking about this (though ultimately useful). Trace propagation is the biggest pain point that I see now and we can solve that first. As long as the format is extensible we can add more things later (maybe).

Can you find another person besides ben that thinks we can basically stop using all
headers except one from now on?

Lots of people would like to see this happen. It will not happen overnight. If we never start making it happen, it will never be so. As @louiscryan said, one of the benefits of something like Envoy is that we can transcode between different formats to help users migrate over time.

We would have no http/2 compression anymore

h2 static compression is dubiously useful for the data I am most concerned about, which will change for every request.

@codefromthecrypt
Copy link
Contributor

codefromthecrypt commented Sep 25, 2017 via email

@codefromthecrypt
Copy link
Contributor

codefromthecrypt commented Sep 25, 2017 via email

@louiscryan
Copy link

@adriancole @bhs

given the amount of background reading to come up to date on all the issues perhaps the best next step for me is to provide review feedback on the TraceContext spec ?

If there were some cliff notes for the outstanding points of disagreement that would be helpful too. Pointers welcome

@codefromthecrypt
Copy link
Contributor

codefromthecrypt commented Sep 25, 2017 via email

@mattklein123
Copy link
Member

@adriancole can you send to me also. Thank you.

objectiser added a commit to objectiser/envoy that referenced this issue Sep 27, 2017
mattklein123 pushed a commit that referenced this issue Sep 27, 2017
…ion (#1733)

resolves #1711

Signed-off-by: Gary Brown <gary@brownuk.com>
costinm pushed a commit to costinm/envoy that referenced this issue Oct 2, 2017
jpsim pushed a commit that referenced this issue Nov 28, 2022
#1711)

Description: This reverts commit b04a8ab1adaa17a6cc8f574bca18b926a371f537.

The RBE ios_build job has been timing out every time in CI, and switching back to running on GHA runners resolves the issue.

Signed-off-by: Mike Schore <mike.schore@gmail.com>
Signed-off-by: JP Simard <jp@jpsim.com>
jpsim pushed a commit that referenced this issue Nov 29, 2022
#1711)

Description: This reverts commit b04a8ab1adaa17a6cc8f574bca18b926a371f537.

The RBE ios_build job has been timing out every time in CI, and switching back to running on GHA runners resolves the issue.

Signed-off-by: Mike Schore <mike.schore@gmail.com>
Signed-off-by: JP Simard <jp@jpsim.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants