Proposal to Adopt Tokio Tracing as the OTel Tracing API #1689

Open
cijothomas opened this issue Apr 30, 2024 · 35 comments

Comments

@cijothomas
Member

cijothomas commented Apr 30, 2024

Introduction

This issue builds upon Option 2 from the discussion OTel Tracing vs Tokio-Tracing, proposing a strategic shift in our approach to traces in OpenTelemetry Rust.

Summary

We propose deprecating the OpenTelemetry (OTel) Tracing API in favor of adopting Tokio Tracing as the official Tracing API, requiring re-instrumentation for those apps already using OTel Tracing API. The functionalities already provided by the OpenTelemetry Tracing SDK, such as Sampling, SpanProcessors, and Exporters, would remain largely unchanged.

Details

  • Default Behavior: tracing macros act as no-ops when no subscriber is configured, consistent with the current OTel API.
  • Instrumentation: Users will transition from the OTel API to tracing macros for span generation, requiring re-instrumentation.
  • Context Propagation: Users will adopt tracing's methods for in-process context propagation (the #[instrument] attribute or span.in_scope(closure)), necessitating re-instrumentation. For out-of-process context propagation, OpenTelemetry's abstractions will continue to be used, leveraging the strengths of both: tracing for in-process and OpenTelemetry for out-of-process propagation.
  • SDK Enablement: Installation of the opentelemetry-sdk crate and configuration of the TracerProvider will remain mostly unchanged.
  • Telemetry Backends: No changes to exported telemetry are expected, nor are changes required in telemetry backends or vendors.
  • Changes to Existing Crates: Once the OpenTelemetry SDK officially recognizes tracing as the API, there will no longer be a need for tracing-opentelemetry, which currently bridges tracing to OTel. Additionally, the opentelemetry-appender-tracing could be integrated into the OTel SDK itself, allowing users to decide if events from tracing should be converted to SpanEvents or LogRecords resolving issues like this.
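
To make the in-process propagation bullet concrete, here is a minimal std-only sketch of what an in_scope-style API does: it marks a span as current on the calling thread for the duration of a closure, then restores the previous one. The `Span` type, the `CURRENT` stack, and `current_span` are hypothetical stand-ins for illustration, not the tracing crate's types or implementation.

```rust
use std::cell::RefCell;

thread_local! {
    // Names of the spans entered on this thread, innermost last.
    static CURRENT: RefCell<Vec<&'static str>> = RefCell::new(Vec::new());
}

/// Hypothetical stand-in for a span handle (not `tracing::Span`).
pub struct Span {
    name: &'static str,
}

impl Span {
    pub fn new(name: &'static str) -> Self {
        Span { name }
    }

    /// Runs `f` with this span as the current one, then restores the previous span.
    pub fn in_scope<T>(&self, f: impl FnOnce() -> T) -> T {
        CURRENT.with(|stack| stack.borrow_mut().push(self.name));

        // Pop in a drop guard so the stack is restored even if `f` panics.
        struct Guard;
        impl Drop for Guard {
            fn drop(&mut self) {
                CURRENT.with(|stack| {
                    stack.borrow_mut().pop();
                });
            }
        }
        let _guard = Guard;

        f()
    }
}

/// Name of the innermost entered span on this thread, if any.
pub fn current_span() -> Option<&'static str> {
    CURRENT.with(|stack| stack.borrow().last().copied())
}
```

Code running inside the closure can ask for the current span without it being passed as an argument, which is the property log correlation relies on.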

Advantages

  • Simplification: Reduces the learning curve by consolidating on a single, familiar, and widely adopted tracing mechanism.
  • Community Focus: Concentrates community efforts on one primary tracing ecosystem. Without consolidation, libraries and applications in the Rust ecosystem will be forced to choose one over the other. This often leads to abstractions on top of these choices, effectively leading to further fragmentation, as humorously illustrated in this xkcd comic.
  • Eliminate interop issues: With only a single choice for instrumentation, an entire class of issues arising from lack of interoperability is eliminated, including issues related to Log Correlation.
  • Unblocks OTel Rust GA: Resolving this fundamental issue is a prerequisite for progressing OTel Rust towards a GA release.

Challenges

  • Adaptation by tracing: Necessary modifications in tracing to support OpenTelemetry scenarios are crucial. An initial set of required changes is listed below.
  • Community and Governance Acceptance: Endorsement from the OpenTelemetry Technical and Governance Committees is required for OTel Rust to abandon its Tracing API.
  • Migration Help: Since this involves re-instrumentation efforts, we will need to provide the necessary support and guidance (docs etc.) to minimize disruption.
  • OTel SDK Refactoring: Significant portions of the OpenTelemetry tracing SDK code need refactoring.
  • Deviation from other languages: Since tracing was never written with OTel specs in mind, there would always be some differences compared to the OTel-based Tracing APIs of other languages. For example, Span macros in tracing have the notion of Level, which is not yet present in OTel.

Required Changes in tracing

  1. Introduction of Span Kind, an enum with 5 variants: Server, Client, Producer, Consumer, Internal. The tracing-opentelemetry bridge currently handles this through specially named attributes, serving as a short-term workaround.
  2. Span links - follows_from in tracing might be the equivalent, but this requires more prototyping.
  3. The notion of "target" in tracing is equivalent to instrumentation scope, and could serve as a reasonable alternative, though it lacks support for version, schema URL, and additional descriptive key-value pairs.
  4. tracing macros require attribute keys to be declared at compile time. This is probably manageable as shown by implementations such as reqwest-tracing and axum-tracing-opentelemetry.
  5. tracing is not a 1.0 crate yet. This is not necessarily an issue, but OTel Rust and Tokio Tracing would need some coordination when planning future releases.
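
As an illustration of item 1, the sketch below shows the five-variant enum next to the attribute-based workaround: the kind is recovered by looking for a specially named string attribute (`otel.kind`, the name used later in this thread). The function names are hypothetical and this is not code from tracing or tracing-opentelemetry.

```rust
/// The five span kinds named in the proposal; `Internal` is the default.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum SpanKind {
    Server,
    Client,
    Producer,
    Consumer,
    #[default]
    Internal,
}

impl SpanKind {
    /// Parses the value of a specially named attribute such as `otel.kind`.
    /// Unknown values yield `None` so the caller can fall back to the default.
    pub fn from_attribute(value: &str) -> Option<Self> {
        match value.to_ascii_lowercase().as_str() {
            "server" => Some(SpanKind::Server),
            "client" => Some(SpanKind::Client),
            "producer" => Some(SpanKind::Producer),
            "consumer" => Some(SpanKind::Consumer),
            "internal" => Some(SpanKind::Internal),
            _ => None,
        }
    }
}

/// The workaround: find the kind among a span's string attributes,
/// defaulting to `Internal` when it is absent or misspelled.
pub fn kind_from_attributes(attrs: &[(&str, &str)]) -> SpanKind {
    attrs
        .iter()
        .find(|(key, _)| *key == "otel.kind")
        .and_then(|(_, value)| SpanKind::from_attribute(value))
        .unwrap_or_default()
}
```

Note that a misspelled value silently degrades to `Internal` in this sketch, which is one reason a first-class field is attractive.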

Prototype

A minimal prototype is available here: https://github.com/cijothomas/opentelemetry-tracing/tree/main/src

Please share your thoughts, concerns, and issues we may have overlooked. We recognize that re-instrumentation is a considerable effort and not something anyone looks forward to. However, this small investment now is expected to yield significant long-term benefits, setting OpenTelemetry Rust up for greater success in the future.

@mladedav

Required Changes in tracing
Introduction of Span Kind

Using span kinds is a somewhat niche use case that pretty much only middleware (be it server or client) really cares about.

What I've been thinking about lately was to create tracing-like macros to use these magic strings safely. E.g. something like

otel::info_span!(kind: server, otel_parent: &extracted_context, target: "target", "span_name", ?tracing.field, %another.field);

which would get translated to

tracing::info_span!(target: "target", "span_name", ?tracing.field, %another.field, otel.kind = "server", otel.parent.traceId = &extracted_context.trace_id, otel.parent.spanId = &extracted_context.span_id);

This would guarantee that people don't mistype the names and the macro can also check for correct types.

This can also be used to provide the parent context at the time of span creation, so that a span either has the same parent from its creation or is a root span, and the parent cannot change afterwards as it can now. I think that caused some issues with pairing contexts to logs in the appender, right?
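
A rough std-only sketch of the "typo-proof magic strings" idea: a declarative macro accepts only the five known kind identifiers and expands to the `otel.kind` key/value pair, so a misspelled kind is a compile error rather than a silently wrong attribute. Both macro names are hypothetical, and the second one only builds a field list; it does not call tracing::info_span!.

```rust
/// Hypothetical helper: maps a span-kind identifier to the `otel.kind` pair.
/// Any identifier other than the five listed here fails to compile.
macro_rules! otel_kind {
    (server) => { ("otel.kind", "server") };
    (client) => { ("otel.kind", "client") };
    (producer) => { ("otel.kind", "producer") };
    (consumer) => { ("otel.kind", "consumer") };
    (internal) => { ("otel.kind", "internal") };
}

/// Hypothetical wrapper: builds the field list that a macro like the proposed
/// `otel::info_span!` might forward to the underlying span macro.
macro_rules! otel_span_fields {
    (kind: $kind:ident $(, $key:literal = $value:expr)* $(,)?) => {
        vec![otel_kind!($kind) $(, ($key, $value))*]
    };
}
```

For example, `otel_span_fields!(kind: server, "http.route" = "/users")` yields the kind pair followed by the route pair, while `otel_span_fields!(kind: sevrer)` does not compile.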

@cijothomas
Member Author

Using span kinds is a somewhat niche use case that pretty much only middleware (be it server or client) really cares about.

Not sure if that makes it niche! Queueing/Messaging scenarios also need to convey the SpanKind.
Of course it is possible to work around this by storing it as specially named attributes, but I'd hope that tracing can make it a first-class concept.
SpanKind is also an input to the OTel Sampler. Sampling is executed in the hot path, so Samplers that use Span.Kind to make a decision would need it to be a first-class thing, as opposed to examining the attributes to find the specially named one.

A lot of users would be just fine with the "internal" (default) Span Kind. So the existing macros can continue to work as-is, but new ones can be written on top for those users who need a Span Kind other than the default. This is very similar to how target is treated: it is a first-class thing, but optional in the macros. (Anyway, this is up to the tracing team to decide!)
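
The hot-path argument can be sketched as follows: with a first-class kind, a sampler does one enum comparison, whereas the attribute workaround needs a scan and string comparisons for every span. `SamplingInput` and both functions are hypothetical and much simpler than the OTel Sampler interface.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SpanKind {
    Server,
    Client,
    Producer,
    Consumer,
    Internal,
}

/// Hypothetical view of a span at sampling time.
pub struct SamplingInput<'a> {
    pub kind: SpanKind,
    pub attributes: &'a [(&'a str, &'a str)],
}

/// First-class kind: a single enum comparison on the hot path.
pub fn sample_by_kind(input: &SamplingInput<'_>) -> bool {
    matches!(input.kind, SpanKind::Server | SpanKind::Consumer)
}

/// Attribute workaround: a linear scan plus string comparisons per span.
pub fn sample_by_attribute(input: &SamplingInput<'_>) -> bool {
    input
        .attributes
        .iter()
        .any(|(key, value)| *key == "otel.kind" && (*value == "server" || *value == "consumer"))
}
```

Both functions make the same decision here; the difference is only in how much work each call does.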

@cijothomas
Member Author

cijothomas commented Apr 30, 2024

Tagging @open-telemetry/rust-approvers
@jtescher as otel rust maintainer and tracing-opentelemetry maintainer (by which the prototype shared above is inspired!)
@davidbarsky as tokio-tracing maintainer

Please share your comments/thoughts.

@austinlparker
Member

I've added this to an upcoming GC meeting agenda.

@tedsuo

tedsuo commented Apr 30, 2024

How is this expected to work regarding feature development? Is the Tokio community willing to join the OTel spec going forwards?

@austinlparker
Member

I would also point out that OpenTelemetry isn't just a tracing API; How is this going to work with metrics, logs, profiles, events, and any other future API surface area the spec defines?

@cijothomas
Member Author

I would also point out that OpenTelemetry isn't just a tracing API; How is this going to work with metrics, logs, profiles, events, and any other future API surface area the spec defines?

@austinlparker This is strictly limited to the tracing API (tracer/span one).

@cijothomas
Member Author

How is this expected to work regarding feature development? Is the Tokio community willing to join the OTel spec going forwards?

As I have mentioned in the issue itself, none of this would work without tokio-tracing being willing to make the changes required (not just now, but in the future). The prototype is to explore whether we really need changes in tokio-tracing and, if so, what exactly they are; then tokio-tracing can make a decision.

Tokio community willing to join the OTel spec going forwards?

I am not sure I understand this part of the question. Could you clarify? Did you mean that tokio-tracing maintainers should join the OTel spec as approvers and/or attend spec meetings? From what I can tell, that is not required. If there is a new requirement in the OTel Tracing API, the OTel Rust community can propose the same for tokio-tracing.
(This is similar to what the OpenTelemetry .NET SIG does, e.g., the recent addition to .NET of support for AddLink after Span creation.)

@cijothomas
Member Author

I've added this to an upcoming GC meeting agenda.

Thanks. I've briefly mentioned about this informally to Trask, Jack (in a different context). Did not want to bother TC/GC officially, until OTel Rust community and Tokio-tracing maintainers have expressed willingness to move in this direction, but happy to get an early bless/block on the idea.

@austinlparker
Member

I've added this to an upcoming GC meeting agenda.

Thanks. I've briefly mentioned about this informally to Trask, Jack (in a different context). Did not want to bother TC/GC officially, until OTel Rust community and Tokio-tracing maintainers have expressed willingness to move in this direction, but happy to get an early bless/block on the idea.

Speaking personally, I would much rather the GC gets something like this sooner rather than later so that we can provide feedback and understand it better. :)

@cijothomas
Member Author

I've added this to an upcoming GC meeting agenda.

Thanks. I've briefly mentioned about this informally to Trask, Jack (in a different context). Did not want to bother TC/GC officially, until OTel Rust community and Tokio-tracing maintainers have expressed willingness to move in this direction, but happy to get an early bless/block on the idea.

Speaking personally, I would much rather the GC gets something like this sooner rather than later so that we can provide feedback and understand it better. :)

Thanks. Noted!

By the way, there is no harm if GC/TC or Tracing or OTel Rust rejects this proposal! It'd be a very valuable learning for us. Also, this is just Option 2 from the original issue, #1571; I volunteered to write down Option 2 in detail, while other maintainers volunteered to write down Option 3 in detail. No one has signed up for Option 1 and Option 4 yet - most likely I'll take a stab at them.

@lalitb
Member

lalitb commented May 1, 2024

No one has signed up for Option 1 and Option 4 yet - most likely I'll take a stab at them.

I can take a stab at Option 4, if no one has yet started on this.

@MikeGoldsmith
Member

MikeGoldsmith commented May 1, 2024

If there is a new requirement in OTel Tracing API, the OTel Rust community can propose the same for tokio-tracing.
(This is similar to what OpenTelemetry .NET SIG does. eg: Recent addition to .NET for dotnet/runtime#97680 after Span creation.)

Basing an OTel SDK implementation on another tool (Tokio for Rust, Activity for .NET) doesn't feel right to me. The .NET SDK has had to make concessions to support being based on .NET's Activity framework, including deviating from the OTel spec. When combining tools like this, we nearly always see conflicting concepts and wording because they share a similar problem space. This results in implementation nuance where the same term means something different in different places.

I believe the OTel SDKs should be independent & spec compliant tools that can leverage languages features and popular frameworks through instrumentation.

This may be a naive question, but what is stopping the Rust SDK from providing a rich Tokio instrumentation experience without adopting its tracing API?

@diurnalist

diurnalist commented May 1, 2024

This may be a naive question, but what is stopping the Rust SDK from providing a rich Tokio instrumentation experience without adopting its tracing API?

@MikeGoldsmith I had the same question. In the other discussion thread I mentioned that perhaps the bridge packages in the opentelemetry-go SDK might be a good pattern to follow if feasible.

I believe the OTel SDKs should be independent & spec compliant tools that can leverage languages features and popular frameworks through instrumentation.

💯

@cijothomas
Member Author

@MikeGoldsmith, thank you for your insights! I've referenced the situation with OTel .NET in the parent issue as well, particularly highlighting the challenges with spec compliance. If OTel Rust were to follow a similar path, we might encounter comparable challenges. However, the prototypes suggest that the impact might be less severe, as Tokio Tracing already aligns closely with the OTel Tracing API spec, right down to using the same terminology, like "Span". (This avoids .NET's biggest source of confusion, where "Activity" is used to mean "Span"!)

@diurnalist, thanks a lot for taking time to share your thoughts!

Indeed, the bridge between Tokio Tracing and OpenTelemetry already exists, which is a testament to the integration efforts between these two ecosystems. Based on my discussions with the OTel Rust community during SIG calls and on Slack, I got the impression that nearly everyone is already leveraging Tokio Tracing alongside this bridge. This significant community adoption is what initially triggered the discussion about potentially rethinking the role of the OTel Tracing API in favor of a more integrated approach with Tokio Tracing. We've discussed at least four different options, and this issue focuses on the one where OTel sacrifices its tracing API to officially recognize and recommend Tokio Tracing. (We are yet to write up the details of the other options.)

One of the key principles we have been striving to achieve was to ensure a single instrumentation API for end-users.
Having two competing APIs can undermine the long-term success of both, often leading to unnecessary layers of abstraction and fragmentation. OpenTelemetry itself avoided that by merging OpenTracing and OpenCensus. I hope we can find a solution that ensures the long-term success of the entire Rust ecosystem, benefiting not just OpenTelemetry or Tokio-Tracing individually but strengthening the community as a whole.

@diurnalist

@cijothomas very cool about the bridge! Perhaps it would make sense to eventually pull the bridge under the OTel GitHub organization, but that is up to the community and the maintainers.

One of the key principles we have been striving to achieve was to ensure a single instrumentation API for end-users.

If I may briefly make the argument :) - there are a few classes of end-users. One class is who we might typically think of, the developers who are building something and want to utilize traces, and for those users I don't think they care too much about which client they use to emit traces as long as they are shipped off as OTLP. These are SDK users--they might not really care as much about the APIs they're using, but have the responsibility of providing some concrete implementation that outputs OTLP.

The other class of end-users consists of library authors who want to implement instrumentation hooks. Library authors who wish to instrument their library with logs, metrics, and/or traces will have to implement hooks in tokio (for traces) and some mixture of other libraries and/or the OTel core API (for metrics and logs.) From a library author's perspective, I would guess they would prefer to do that w/ one single API, hence an OTel API crate will likely always exist in practice, hence it may make sense to consider that one the standard one for tracing in Rust. Library authors are API users as opposed to SDK users--they don't care about the API's concrete implementation, they just want to integrate against it.

Tokio can continue to maintain its own trace interface and developers may opt to use that SDK in conjunction w/ the bridge. I can imagine that over time it may appear that it's simpler to not use Tokio+bridge and developers will prefer to use a single SDK for everything, and that will be the OTel SDK. And, library maintainers may prefer to switch to integrating against the OTel API crate rather than requiring their end-users set up this bridge. I have been seeing this happen in the Go ecosystem, oddly enough w/ Google's own client libraries, which for a long time still outputted data via OpenCensus, but have now finally moved to OTel, removing the need for end-users to (a) know about and (b) configure the opencensus bridge if they are using a Google client in their Go project.

At least, that's what I'm reading in the tea leaves :)

@cijothomas
Member Author

it would make sense to eventually pull the bridge under the OTel GitHub organization, but that is up to the community and the maintainers.

Well, this proposal is doing exactly that. The prototype is a "mini" version of that bridge. The additional thing proposed is to deprecate the OTel Tracing API, and officially bless Tokio-tracing. (even without blessing, that is where Rust community is already.)

@cijothomas
Member Author

One of the key principles we have been striving to achieve was to ensure a single instrumentation API for end-users.

If I may briefly make the argument :) - there are a few classes of end-users. One class is who we might typically think of, the developers who are building something and want to utilize traces, and for those users I don't think they care too much about which client they use to emit traces as long as they are shipped off as OTLP. These are SDK users--they might not really care as much about the APIs they're using, but have the responsibility of providing some concrete implementation that outputs OTLP.

The other class of end-users consists of library authors who want to implement instrumentation hooks. Library authors who wish to instrument their library with logs, metrics, and/or traces will have to implement hooks in tokio (for traces) and some mixture of other libraries and/or the OTel core API (for metrics and logs.) From a library author's perspective, I would guess they would prefer to do that w/ one single API, hence an OTel API crate will likely always exist in practice, hence it may make sense to consider that one the standard one for tracing in Rust. Library authors are API users as opposed to SDK users--they don't care about the API's concrete implementation, they just want to integrate against it.

Tokio can continue to maintain its own trace interface and developers may opt to use that SDK in conjunction w/ the bridge. I can imagine that over time it may appear that it's simpler to not use Tokio+bridge and developers will prefer to use a single SDK for everything, and that will be the OTel SDK. And, library maintainers may prefer to switch to integrating against the OTel API crate rather than requiring their end-users set up this bridge. I have been seeing this happen in the Go ecosystem, oddly enough w/ Google's own client libraries, which for a long time still outputted data via OpenCensus, but have now finally moved to OTel, removing the need for end-users to (a) know about and (b) configure the opencensus bridge if they are using a Google client in their Go project.

At least, that's what I'm reading in the tea leaves :)

I am not sure I follow this part...

From a library author's perspective, I would guess they would prefer to do that w/ one single API, hence an OTel API crate will likely always exist in practice

I fully agree that users would prefer to see a single API. But the "OTel API crate will always exist" part is not clear. If this proposal moves forward, there won't be an OTel Tracing API. (The crate itself would still exist, as it needs to do other things like out-of-proc propagation, baggage, metrics, logs etc., but it won't contain span APIs.)

Tokio can continue to maintain its own trace interface and developers may opt to use that SDK

To be clear: tokio-tracing is a facade only (similar to the OTel APIs). A subscriber needs to be enabled for things to light up.
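
As a rough illustration of what "facade only" means, the std-only sketch below forwards span notifications to a process-wide subscriber and does nothing when none is installed. The `Subscriber` trait, `set_global_subscriber`, and `span` are hypothetical and far simpler than the real tracing API.

```rust
use std::sync::OnceLock;

/// Hypothetical subscriber interface; the real `tracing` trait is richer.
pub trait Subscriber: Send + Sync {
    fn on_span(&self, name: &'static str);
}

// Empty until the application installs a subscriber.
static GLOBAL: OnceLock<Box<dyn Subscriber>> = OnceLock::new();

/// Installs the process-wide subscriber; returns false if one was already set.
pub fn set_global_subscriber(subscriber: Box<dyn Subscriber>) -> bool {
    GLOBAL.set(subscriber).is_ok()
}

/// Facade entry point: forwards to the subscriber, or is a no-op without one.
/// Returns whether anything observed the span.
pub fn span(name: &'static str) -> bool {
    match GLOBAL.get() {
        Some(subscriber) => {
            subscriber.on_span(name);
            true
        }
        None => false,
    }
}
```

Library code only ever calls the facade function; whether anything happens is decided by the application that installs (or does not install) a subscriber.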

a single SDK for everything, and that will be the OTel SDK

The current state is that there is a single SDK (but 2 APIs), and this proposal does not change the SDK part, but reduces the APIs from 2 to just 1. I'd be happy if there was a single instrumentation API but multiple SDK implementations. (Alternate SDK implementations are something OpenTelemetry explicitly allows.)

removing the need for end-users to (a) know about and (b) configure the opencensus bridge if they are using a Google client in their Go project.

The goal of this proposal is the same. Users need not pick between 2 competing APIs, as there is only one. There is no bridge either!

(Sorry I may not have fully understood your comments.)

We'll be discussing this proposal (and alternates) more in our weekly SIG calls. I would be happy to continue the discussion there if you can join us. The meetings are scheduled for Tuesdays at 9 AM Pacific Time, but we're always open to different times to accommodate more folks.

@TommyCpp
Contributor

TommyCpp commented May 2, 2024

There are ~144k crates in Cargo (Rust's official package management tool), and ~7k crates depend on tracing directly. Given this wide adoption, OTel must have a bridge to tracing. It also means that in the long term the community will probably keep investing in tracing.

This leaves us two choices:

  1. Maintain two sets of APIs (the bridge with tracing and the OTel tracing API)
  2. Adopt the tracing API and deprecate the OTel tracing API.

There are changes required on the tracing side, and a long-term commitment from tracing to support new OTel features is needed. Assuming we can address these challenges, I think option 2 is easier to maintain and will get us to GA sooner.

@lalitb
Member

lalitb commented May 2, 2024

While tokio tracing is widely adopted in the Rust community, there is another (relatively small) tracing library, minitrace, which also provides OpenTelemetry integration. The decision here is going to affect it, so tagging @zhongzc and @andylokandy - the active developers - for visibility, and in case they have any comments to add.

@andylokandy

andylokandy commented May 2, 2024

Thank you @lalitb for pinging minitrace. As minitrace has less API coupling to otel compared to tokio-tracing (no builtin logging, realtime span reporting, etc.), the migration proposed in this issue has little impact on minitrace so long as there is still an API to upload spans to the otel collector.

Anyway, I'm glad that it'll be a good chance to refactor the tracing-opentelemetry crate. As mentioned in the benchmark from the author of tokio-tracing, tracing-opentelemetry is becoming the performance bottleneck of the system.

@lalitb
Member

lalitb commented May 2, 2024

A few thoughts considering the current Rust ecosystem -

  • OpenTelemetry Rust may not need its own tracing instrumentation API, considering the existing tracing APIs in the Rust community, such as Tokio tracing, which is widely used.
  • The SDK layer in OpenTelemetry Rust should align with the OpenTelemetry specs, rather than simply acting as a subscriber or layer for Tokio Tracing.
  • Seamless integration with other tracing libraries should be possible in the future. While the best approach is unclear, a minimal bridge API, similar to the approach used for logging, could potentially facilitate this integration.
  • We also need to have some approach for the features not supported by existing tracing APIs - e.g, Baggage, and distributed context propagation. Probably providing an extension API for that. At least distributed context propagation should be supported.
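
For readers unfamiliar with what distributed (out-of-process) context propagation carries, here is a small std-only sketch of the W3C Trace Context `traceparent` header, in the form `version-traceid-spanid-flags`. The `RemoteContext` type and the `inject`/`extract` function names are hypothetical; the sketch handles only version `00` and ignores `tracestate` and Baggage.

```rust
/// Identifiers carried across process boundaries (W3C Trace Context).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RemoteContext {
    pub trace_id: u128,
    pub span_id: u64,
    pub sampled: bool,
}

/// Formats a `traceparent` header value: version-traceid-spanid-flags.
pub fn inject(ctx: &RemoteContext) -> String {
    format!(
        "00-{:032x}-{:016x}-{:02x}",
        ctx.trace_id, ctx.span_id, ctx.sampled as u8
    )
}

/// True if `s` has exactly `len` lowercase hexadecimal digits.
fn is_lower_hex(s: &str, len: usize) -> bool {
    s.len() == len && s.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'))
}

/// Parses a version-00 `traceparent` value; returns `None` if malformed.
pub fn extract(header: &str) -> Option<RemoteContext> {
    let parts: Vec<&str> = header.trim().split('-').collect();
    if parts.len() != 4 || parts[0] != "00" {
        return None;
    }
    if !is_lower_hex(parts[1], 32) || !is_lower_hex(parts[2], 16) || !is_lower_hex(parts[3], 2) {
        return None;
    }
    let trace_id = u128::from_str_radix(parts[1], 16).ok()?;
    let span_id = u64::from_str_radix(parts[2], 16).ok()?;
    let flags = u8::from_str_radix(parts[3], 16).ok()?;
    // All-zero trace or span IDs are invalid.
    if trace_id == 0 || span_id == 0 {
        return None;
    }
    Some(RemoteContext {
        trace_id,
        span_id,
        sampled: flags & 0x01 == 0x01,
    })
}
```

A client would call `inject` when building an outgoing request and the server would call `extract` on the incoming header to continue the same trace.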

@McSick

McSick commented May 2, 2024

Is there a world where we can get Tokio tracing to be donated to OTel, they stay on as the main maintainers, and it's the API we adopt for tracing? Then alongside we can fit in the OTel features like Baggage and cross-process propagation etc.

@cijothomas
Member Author

Is there a world where we can get Tokio tracing to be donated to OTel, they stay on as the main maintainers, and it's the API we adopt for tracing? Then alongside we can fit in the OTel features like Baggage and cross-process propagation etc.

Yes! In fact, I was writing up details on what that'd look like. Will open an issue with the details later today.

@tedsuo

tedsuo commented May 2, 2024

So, I would like to talk to the Rust maintainers, as I do not speak for them. But as a GC member and co-founder of OpenTelemetry, I have opinions. :)

First and foremost, OpenTelemetry as a project is deeply committed to long-term API support and backwards compatibility. We never break compatibility, and are especially sensitive to any changes that would cause the API to create dependency conflicts. Tracing in particular is a cross-cutting concern; dependency conflicts and broken instrumentation would create a ripple effect across many applications and libraries managed by different teams and OSS communities.

Regardless of how we proceed, I don't see any reason why we need to drop support for the existing API. Even in the case where we wanted to retire the SDK in favor of just using the Tokio implementation, I would want to see support for the OpenTelemetry API continue. btw, I believe that Otel API support is already in there? Sorry, I'm only just getting up to speed on the Rust ecosystem.

I do have questions about metrics and logs, as Otel isn't just tracing. Are there suggestions for how these APIs and implementations would be handled? I see some of this discussion in #1571, so I might continue this conversation there.

@cijothomas
Member Author

Even in the case where we wanted to retire the SDK in favor of just using the Tokio implementation

That is not the plan! The OTel SDK would continue adhering to OTel SDK specs, but it'll now officially recognize tokio-tracing as its API. This is partially true today also, as the OTel SDK has done a lot of special casing just to support tokio-tracing! One might view this proposal as us saying, "Enough dating, let's get married!"

There are a large number of comments on this thread. I think it'd be better to continue this in the SIG/Community calls.
@tedsuo OTel Rust Maintainers will reach out to you and help you get up to speed before the next community call (May 7th, 9 AM Pacific)

@cijothomas
Member Author

I do have questions about metrics and logs, as Otel isn't just tracing. Are there suggestions for how these APIs and implementations would be handled?

No change to metrics and logs. They are already following OTel specs.

@davidbarsky

Hi folks: sorry for the delay in responding. I'll try to go comment-by-comment.

Required Changes in tracing

Cijo:

  1. Introduction of Span Kind, an enum with 5 variants: Server, Client, Producer, Consumer, Internal. The tracing-opentelemetry bridge currently handles this through specially named attributes, serving as a short-term workaround.

I don't know how I feel about this change, honestly: this would require a breaking change to tracing-core (which I'm very keen to avoid until we do a proper 0.2...) and introduces concepts to domains that tracing supports (e.g., embedded) where this doesn't make very much sense. I'd like to explore alternative approaches (whether that be through some dynamism in tracing or in the OpenTelemetry integration libraries) before committing to changes in tracing-core, as they are wildly disruptive: there's a highlander rule to tracing-core showing up in a dependency graph, so we'd do a semver-trick when we do a 0.2 to prevent such a thing.

  2. Span links - follows_from in tracing might be the equivalent, but this requires more prototyping.

We're considering the future of follows_from (it's not very well supported now and doesn't translate well to two-dimensional representations), but it might be a good analogue. If it's something that the OpenTelemetry data model would make good use of, I think we'd be very open to improving it.

  3. The notion of "target" in tracing is equivalent to instrumentation scope, and could serve as a reasonable alternative, though it lacks support for version, schema URL, and additional descriptive key-value pairs.

I've wanted a notion of "typed targets" in tracing for a while, but I think it still requires some better const evaluation. No opposition to this in principle; the only concern I'd have is "can this be made static/const?" (heck, most of my concerns tend to boil down to that...)

  4. tracing macros require attribute keys to be declared at compile time. This is probably manageable as shown by implementations such as reqwest-tracing and axum-tracing-opentelemetry.

Yeah, this is unfortunately a pretty hard requirement. I can elaborate more, but it's key to tracing's current performance profile.
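
One way to picture why compile-time keys help performance, offered as an assumption-laden sketch rather than a description of tracing's internals: if the key names are fixed per instrumentation point, they can live in static memory, and each span only has to supply values positionally, with no per-span allocation or hashing of key strings. `Callsite`, `HANDLE_REQUEST`, and `record` are hypothetical names.

```rust
/// Hypothetical callsite description; all of it lives in static memory.
pub struct Callsite {
    pub name: &'static str,
    pub target: &'static str,
    pub field_names: &'static [&'static str],
}

/// Declared once per instrumentation point, so no key strings are built per span.
pub static HANDLE_REQUEST: Callsite = Callsite {
    name: "handle_request",
    target: "my_app::server",
    field_names: &["http.method", "http.route"],
};

impl Callsite {
    /// Resolves a key to its slot; a macro could do this lookup at compile time.
    pub fn index_of(&self, key: &str) -> Option<usize> {
        self.field_names.iter().position(|name| *name == key)
    }
}

/// Values are passed positionally, one slot per pre-declared key.
pub fn record<'a>(callsite: &Callsite, values: &[&'a str]) -> Vec<(&'static str, &'a str)> {
    callsite
        .field_names
        .iter()
        .copied()
        .zip(values.iter().copied())
        .collect()
}
```

The trade-off is the one discussed in item 4: a key that is not known when the callsite is declared, such as `user.id` here, has no slot to go into.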

  5. tracing is not a 1.0 crate yet. This is not necessarily an issue, but OTel Rust and Tokio Tracing would need some coordination when planning future releases.

I think we've done a good job of not breaking users, and any work that risks breakage will be publicized and undertaken with extreme care, for the aforementioned highlander rule if nothing else.


Governance/Approaches

@tedsuo, @McSick:

How is this expected to work regarding feature development? Is the Tokio community willing to join the OTel spec going forwards?

I don't believe so/unsure. To be frank, I don't see a future in which tracing would move under the CNCF: if it's ever to move from the tokio organization, it would be the rust-lang org, and that's a very big if.

@MikeGoldsmith

Tokio can continue to maintain its own trace interface and developers may opt to use that SDK in conjunction w/ the bridge. I can imagine that over time it may appear that it's simpler to not use Tokio+bridge and developers will prefer to use a single SDK for everything, and that will be the OTel SDK. And, library maintainers may prefer to switch to integrating against the OTel API crate rather than requiring their end-users set up this bridge. I have been seeing this happen in the Go ecosystem, oddly enough w/ Google's own client libraries, which for a long time still outputted data via OpenCensus, but have now finally moved to OTel, removing the need for end-users to (a) know about and (b) configure the opencensus bridge if they are using a Google client in their Go project.

Cijo, Lalit, and Zhongyang explained it better than I could in their respective comments (and with greater tact and care; sorry, I haven't had the time to edit this better...), but the dynamic that exists in Rust today is what you've roughly described, except for "a single SDK for everything, and that will be the OTel SDK": the SDK that people end up using is tokio-rs/tracing.

@austinlparker
Member

Cijo, Lalit, and Zhongyang explained it better than I could in their respective comments (and with greater tact and care—sorry, I haven't had the time to edit this better...), but the dynamic that exists in Rust today is what you've roughly described, except for "a single SDK for everything, and that will be the OTel SDK": the SDK that people end up using is tracing-rs/tracing

OK... but unless that SDK aligns with the OTel SDK, I don't see how this would work? OpenTelemetry is more than just tracing - it's logs, metrics, events, profiles, and whatever else comes up in the future.

@davidbarsky

Cijo, Lalit, and Zhongyang explained it better than I could in their respective comments (and with greater tact and care—sorry, I haven't had the time to edit this better...), but the dynamic that exists in Rust today is what you've roughly described, except for "a single SDK for everything, and that will be the OTel SDK": the SDK that people end up using is tracing-rs/tracing

OK... but unless that SDK aligns with the OTel SDK, I don't see how this would work? OpenTelemetry is more than just tracing - it's logs, metrics, events, profiles, and whatever else comes up in the future.

There might be a disconnect between us: the tokio-rs/tracing library handles the data kinds you mentioned except for metrics, which is better handled through a dedicated library, IMO. At the same time, I worry that this might be expanding the scope of the discussion beyond this issue, which is just focused on the OTel tracing API (unless you're thinking about precedent, which, fair enough).

Is the goal of the OpenTelemetry governing committee to have an OpenTelemetry-compliant SDK be the dominant library providing logging/tracing/metrics functionality in each language? If so, I think that's an ambitious and commendable goal, but I worry that, at least for Rust, the migration to an OpenTelemetry-compliant SDK would cause non-trivial disruption in the larger Rust ecosystem and close the door on language-idiomatic optimizations, both in performance and aesthetics.


What was the resolution of the GC meeting on May 2, by the way? The meeting minutes didn't capture that.

@mladedav

mladedav commented May 4, 2024

In OTel terminology, the SDK is the code that is responsible for sending data through OTLP and other protocols, not for collecting observability data, right? Then yes, as was said earlier, the SDK would be compliant with the spec, and some kind of bridge like tracing-opentelemetry would call its methods (or it would be completely replaced by a new library developed by the OpenTelemetry team).

tracing on the other hand does not have aspirations to become the OpenTelemetry SDK, it would be just the API that end-users would use to instrument their code.

Or am I understanding this whole thing wrong?

@cijothomas
Member Author

@mladedav You are right.

There is some confusion about the proposal, probably due to the word "SDK" having the special meaning in OTel world. I'll try to clarify.

  1. Both the OpenTelemetry Tracing API and Tokio-Tracing's tracing crate serve as facades. This means they are essentially no-ops and do nothing on their own unless activated by another component. In OpenTelemetry, APIs like tracer.start_span("name") are part of this facade. The Tokio-Tracing equivalent would be constructs like span!(Level::INFO, "name", key = value1).
  2. In the OpenTelemetry ecosystem, it is the OpenTelemetry Tracing SDK (more precisely, setting up a valid TracerProvider) that activates the API. The TracerProvider could be configured with a stdout exporter, an OTLP exporter, and so on. In the Tokio-Tracing environment, a tracing-subscriber fulfills this role by activating the facade. There could be a subscriber which writes telemetry to stdout, to a file, or even in OTLP format. It is entirely up to the subscriber!
  3. This proposal is only about adopting tokio-tracing's tracing facade as the OTel Tracing API. The following snippets show an example of the API this proposal is referring to:
    OTel Tracing:
{
    // Needs: use opentelemetry::{global, trace::Tracer, KeyValue};
    let tracer = global::tracer("module-name");
    // Bound to `_span` so it stays alive until the end of the block.
    let _span = tracer
        .span_builder("span-name")
        .with_attributes([KeyValue::new("attribute1", "value1")])
        .start(&tracer);
    do_something();
} // span ends automatically here.

Tokio Tracing

{
    let span = span!(target: "module-name", tracing::Level::INFO, "span-name", attribute1 = "value1");
    let _guard = span.enter();
    do_something();
} // span ends automatically here.

  4. It's important to note that this proposal does not suggest any changes to the OpenTelemetry SDK. The SDK will continue to follow the OpenTelemetry Tracing SDK specification. However, some internal adaptations will be done to better recognize Tokio-Tracing.
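
To make the facade idea in points 1 and 2 concrete, here is a rough, std-only sketch of the pattern. It is not the actual code of either crate: `Subscriber`, `set_global_subscriber`, `span`, and `Recorder` are hypothetical names made up for illustration. The point is only that instrumented code calls the facade unconditionally, and nothing happens until some other component installs itself.

```rust
use std::sync::{Arc, Mutex, OnceLock};

/// Hypothetical stand-in for a subscriber / TracerProvider: the component
/// that decides what to do with a span once the facade hands it over.
pub trait Subscriber: Send + Sync {
    fn on_span(&self, name: &str);
}

// Process-wide slot for the installed subscriber; empty by default.
static SUBSCRIBER: OnceLock<Arc<dyn Subscriber>> = OnceLock::new();

/// Installs the subscriber. Returns false if one was already installed.
pub fn set_global_subscriber(subscriber: Arc<dyn Subscriber>) -> bool {
    SUBSCRIBER.set(subscriber).is_ok()
}

/// The facade call used by instrumented code. With nothing installed it
/// does no work beyond checking the slot.
pub fn span(name: &str) {
    if let Some(subscriber) = SUBSCRIBER.get() {
        subscriber.on_span(name);
    }
}

/// Example subscriber that records span names in memory. A real one
/// might write to stdout, a file, or an OTLP exporter instead.
#[derive(Default)]
pub struct Recorder {
    pub names: Mutex<Vec<String>>,
}

impl Subscriber for Recorder {
    fn on_span(&self, name: &str) {
        self.names.lock().unwrap().push(name.to_string());
    }
}
```

Calling `span("x")` before `set_global_subscriber` is silently dropped; after installing a `Recorder`, the same call is recorded. Swapping `Recorder` for something else changes where telemetry goes without touching the instrumented code.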

OK... but unless that SDK aligns with the OTel SDK, I don't see how this would work? OpenTelemetry is more than just tracing - it's logs, metrics, events, profiles, and whatever else comes up in the future.

@austinlparker The OTel Rust SDK (named opentelemetry_sdk, shipped from this repo), is fully compliant with the OTel specifications. This proposal does not suggest any changes to that compliance.

@austinlparker
Member

@cijothomas thanks for the clarification.

@hdost
Contributor

hdost commented May 6, 2024

Is the goal of the OpenTelemetry governing committee to have an OpenTelemetry-compliant SDK be the dominant library providing logging/tracing/metrics functionality in each language? If so, I think that's an ambitious and commendable goal, but I worry that, at least for Rust, the migration to an OpenTelemetry-compliant SDK would cause non-trivial disruption in the larger Rust ecosystem and close the door on language-idiomatic optimizations, both in performance and aesthetics.

I think this brings up something which I have kind of been wondering about for a while now. Along with the pressure to stabilize, it feels like we're trying to shove a square peg into a round hole sometimes. And to some extent, allowing compatibility gives people the ability to save processing where they need to and have the richer feature set where they don't. We are striving for the best performance for the library, but there are some inherent things in the standard which won't always lead to the best possible performance. The compile-time tags in tracing vs. the runtime ones in the OTel API are one example of this. Though to be honest, even in languages like Java I hardly use dynamic attribute names, specifically for performance reasons.
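
A rough sketch of what that compile-time vs. runtime difference looks like, using only std. The types here (`StaticCallsite`, `DynamicAttributes`, `field_index`) are hypothetical and only written in the spirit of the two APIs; neither crate is implemented this way line for line.

```rust
/// Callsite metadata whose attribute names are fixed at compile time
/// (hypothetical type, loosely modeled on the idea behind `tracing`).
pub struct StaticCallsite {
    pub name: &'static str,
    pub field_names: &'static [&'static str],
}

// Built once as a static; the set of keys is known before the program runs.
pub static HANDLE_REQUEST: StaticCallsite = StaticCallsite {
    name: "handle_request",
    field_names: &["http.method", "http.route"],
};

/// Looks up a field's position by name. Because the key set is fixed,
/// a consumer could precompute this once per callsite.
pub fn field_index(callsite: &StaticCallsite, key: &str) -> Option<usize> {
    callsite.field_names.iter().position(|f| *f == key)
}

/// Runtime-keyed attributes (hypothetical type, modeled here as owned
/// strings): any key can be added at any point, so the key set is not
/// known until the span has actually been built.
#[derive(Default)]
pub struct DynamicAttributes {
    pub entries: Vec<(String, String)>,
}

impl DynamicAttributes {
    pub fn set(&mut self, key: &str, value: &str) {
        self.entries.push((key.to_string(), value.to_string()));
    }
}
```

The static form trades flexibility for a layout that is known up front; the dynamic form accepts any key, such as one computed from user input, at the cost of doing that bookkeeping per span.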

Going back to this timeline: I feel there is artificial pressure to have all the languages stable. While it's great to have stability, it's a bit of a farce to push a specific timeline. This is an open source project which not everyone gets paid to work on. So if we want to have deadlines, where's the support? And I don't think the answer is to push more people who are unfamiliar with Rust to ramp up on it just to push this goal.

P.S. Sadly, since we've started this discussion I have not had as much time to work on the other option, which is supporting compatibility between the facades.

@austinlparker
Member

Going back to this timeline: I feel there is artificial pressure to have all the languages stable. While it's great to have stability, it's a bit of a farce to push a specific timeline. This is an open source project which not everyone gets paid to work on. So if we want to have deadlines, where's the support? And I don't think the answer is to push more people who are unfamiliar with Rust to ramp up on it just to push this goal.

Speaking both personally and as a GC member, I'd be curious where you see the pressure to stabilize coming from. We're pushing to stabilize the specification and semantic conventions (for the benefit of both end-users and implementors), but I don't feel like we're trying to get language SIGs to rush towards stability. Personally, I'd be more invested in languages adopting a breadth of spec features, and trying to make idiomatic implementations of them, vs. rushing to hit completeness.

More generally, I think there's somewhat of an open question about the goal of the SDK in each language. Certainly, the goal is for OpenTelemetry to be 'built in' to languages, runtimes, libraries, frameworks, etc. What does 'built in' mean? Does it mean that we expect OTLP data to be available? Does it mean that if I, as a developer, write against an OpenTelemetry API that my telemetry 'just works' with other telemetry (e.g., if I'm writing business logic as part of an HTTP route handler and I add an attribute, that attribute gets added to a span that's been created by semconv-compliant telemetry generators)?

Part of the rationale behind how OpenTelemetry is run, as a project, is that we expect languages to create idiomatic ways to accomplish these goals, and those language SIGs should feel like they have the freedom to do so. The practical reality, though, is that we specify the API and SDK because, to accomplish some of the aforementioned goals, we do need everyone to be on the same page in terms of 'how do I get the current span in context', or 'how do I pass context headers', etc.
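
As a rough illustration of the 'current span in context' question, here is a std-only sketch of the route-handler scenario: business logic adds an attribute to whichever span is active, without knowing who created it. `Span`, `in_span`, and `set_current_attribute` are hypothetical names; this is one simple way to model it with a thread-local stack, not how either the OTel API or tracing does it, and it does not clean up if the closure panics.

```rust
use std::cell::RefCell;

#[derive(Clone, Debug, Default, PartialEq)]
pub struct Span {
    pub name: String,
    pub attributes: Vec<(String, String)>,
}

thread_local! {
    // Stack of active spans for this thread; the last entry is "current".
    static ACTIVE: RefCell<Vec<Span>> = RefCell::new(Vec::new());
}

/// Runs `f` with a new span as the current one, then returns the finished
/// span together with `f`'s result.
pub fn in_span<R>(name: &str, f: impl FnOnce() -> R) -> (Span, R) {
    ACTIVE.with(|stack| {
        stack.borrow_mut().push(Span {
            name: name.to_string(),
            attributes: Vec::new(),
        })
    });
    // The borrow above is released before `f` runs, so `f` may touch the stack.
    let result = f();
    let span = ACTIVE
        .with(|stack| stack.borrow_mut().pop())
        .expect("span stack underflow");
    (span, result)
}

/// Adds an attribute to whichever span is current on this thread.
/// Returns false if no span is active.
pub fn set_current_attribute(key: &str, value: &str) -> bool {
    ACTIVE.with(|stack| {
        let mut stack = stack.borrow_mut();
        if let Some(span) = stack.last_mut() {
            span.attributes.push((key.to_string(), value.to_string()));
            true
        } else {
            false
        }
    })
}
```

A framework would call `in_span("GET /users", handler)`, and the handler would call `set_current_attribute("user.id", "42")`. This only works if both sides agree on where "current" lives, which is the coordination problem described above.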
