Tokio Trace: A scoped, structured, logging system #561
Comments
I really like this proposal! Thanks for sharing it. I have two questions I'd like to clarify:
Re 1. This proposal is primarily for process-local tracing, not distributed. However, I expect distributed tracing to be able to be implemented on top of it. You would need a combination of a subscriber that is distributed-tracing aware AND a tower middleware that extracts the request ID from the request and informs the subscriber. At that point, the subscriber would have linked the process-local trace with a distributed trace request ID. If no request ID is provided, then the process-local trace cannot be associated with a distributed trace. However, there is still utility for debugging, logging, metrics, etc.

I also expect there to be multiple subscribers in a given process. For example, you would have your distributed tracing subscriber that associates the process-local trace with a request ID and ships it out, you would also have your metrics aggregator that just aggregates metrics like load, request rate, and error rate and sends them to Prometheus, and finally maybe a logging subscriber that writes logs to a file (or whatever logging aggregator).

Re 2. The main issue is that subscribers are generic and there will be a level of dynamism between event emitting and subscribers. In other words, there will most likely be a trait object (something like a boxed subscriber) sitting between the two.

Definitely interesting, but tricky due to generics. I would ask what the goal is and we can see what is possible. Let's assume the subscriber API looks something like this:

```rust
trait Value { ... }

trait Subscriber {
    fn tag_span<T: Value>(&self, id: usize, key: &'static str, value: T);
}
```

The problem is that the generic `tag_span` method makes `Subscriber` not object safe, so it cannot be used as a trait object. The object-safe alternative would be:

```rust
trait Value { ... }

trait Subscriber {
    fn tag_span(&self, id: usize, key: &'static str, value: &Value);
}
```

But then `Value` itself has to be object safe. However, serde's `Serialize` trait has a generic `serialize` method, so it is not object safe either. There are ways to work around this, but they are pretty heavy (lots of allocations).
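If the `Value` trait is defined specifically for this purpose (rather than reusing serde's `Serialize`), double dispatch through a second object-safe trait can keep `Subscriber` usable as a trait object. A rough sketch, not a proposal for the actual API:

```rust
/// Values know how to describe themselves to a recorder. The trait is
/// object safe because it has no generic methods.
trait Value {
    fn record(&self, recorder: &mut dyn Recorder);
}

/// One callback per supported primitive type keeps everything dynamic
/// without allocating.
trait Recorder {
    fn record_u64(&mut self, value: u64);
    fn record_str(&mut self, value: &str);
}

trait Subscriber {
    // Object safe, because the method itself has no type parameters.
    fn tag_span(&self, id: usize, key: &'static str, value: &dyn Value);
}

impl Value for u64 {
    fn record(&self, recorder: &mut dyn Recorder) {
        recorder.record_u64(*self);
    }
}

impl<'a> Value for &'a str {
    fn record(&self, recorder: &mut dyn Recorder) {
        recorder.record_str(*self);
    }
}
```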
The structured logging part sounds very similar to the proposed addition to the `log` crate: rust-lang/log#149. They're exploring how to have generic key-values, while still being object safe and reducing allocations.
Gotcha! I've heard the phrase "event" used in this context, which threw me off for a bit. It might help to settle upon a shared vocabulary.
Makes sense! You're right that it's a high-level concern, depending on middleware to function correctly.
Strongly agree—I've found that it's super useful to emit a single event/span/whatever per unit of work.
👍🏻
That's a good point—I didn't consider object safety here.
My goal would be to provide some sort of trait that indicates a type can be used for instrumentation. For instance, being able to trivially insert headers/URIs from the http crate into a span would be really useful in that it provides a lot of debugging context and information. Serde seems popular and prolific enough that it might meet that purpose.

> The structured logging part sounds very similar to the proposed addition to the log crate: rust-lang/log#149

Yeah! I think the biggest difference is that a span would correspond to a single unit of work, while a log doesn't. My mental model, and my employer's, align pretty closely with Honeycomb's.
Naturally; it's worth paying close attention to.
I think it would be great if this wouldn't be Tokio specific, but the default for the entire Rust ecosystem. Like @seanmonstar suggested, a crate like log (or maybe a similar generic crate) would be a good place for this.
The implementation won't be Tokio specific, but it will live (at least initially) under the Tokio umbrella. We can worry about extracting / crate name logistics once the crate exists.
@carllerche how can the community help?
@austbot Great question. This issue represents a brain dump. I also had an initial sketch (linked in the original post). What I would suggest:

@davidbarsky has also expressed interest. I'm not sure what he has done so far; maybe you two can talk in Gitter some. I think that at this point it would be worth having multiple people experiment with different ideas. We can then look at the various strategies and merge the good ideas together.
Sorry for the delay in responding; interns started last week, and supporting them is a lot of effort. The main thing I've looked at is prior art in tracing systems like OpenCensus and Jaeger with regard to sampling in trace collection. It appears they support application-level sampling, which might change some parts of this design.
Some metrics to expose: When building applications with tokio, it can be hard to understand reactor/executor behavior at runtime. I want to be able to ask a few questions of the runtime like the following:
It's important that these values are not time-bounded or otherwise pre-rated, so that they fit well into a time series: by tracking totals, we can determine the rate at read time without having to configure the runtime with a window.

Copied from: #147
This is very interesting. Currently it's not straightforward to log async/multithreaded code. How do you imagine this? For a stdout subscriber, for example, would there be a tradeoff between delay and readability? If you log only once every second, you can de-interleave log statements from different tasks to make the output readable. Note that terminals as well as files could also use a different strategy, in that they can be rewritten: e.g. log instantly, but as new stuff comes in, rewrite the events to change their order on screen / in the file. I'm really curious.
For stdout, I would log immediately with a trace identifier. This would allow filtering by trace via grep. There is also a GSoC project to build an interactive terminal app that allows richer filtering / rendering: tokio-rs/gsoc#1
## Motivation

In asynchronous systems like Tokio, interpreting traditional log messages can often be quite challenging. Since individual tasks are multiplexed on the same thread, associated events and log lines are intermixed, making it difficult to trace the logic flow. Currently, none of the available logging frameworks or libraries in Rust offer the ability to trace logical paths through a futures-based program.

There are also complementary goals that can be accomplished with such a system. For example, metrics / instrumentation can be tracked by observing emitted events, or trace data can be exported to a distributed tracing or event processing system. In addition, it can often be useful to generate this diagnostic data in a structured manner that can be consumed programmatically. While prior art for structured logging in Rust exists, it is not currently standardized, and is not "Tokio-friendly".

## Solution

This branch adds a new library to the tokio project, `tokio-trace`. `tokio-trace` expands upon logging-style diagnostics by allowing libraries and applications to record structured events with additional information about *temporality* and *causality*. Unlike a log message, a span in `tokio-trace` has a beginning and end time, may be entered and exited by the flow of execution, and may exist within a nested tree of similar spans. In addition, `tokio-trace` spans are *structured*, with the ability to record typed data as well as textual messages.

The `tokio-trace-core` crate contains the core primitives for this system, which are expected to remain stable, while the `tokio-trace` crate provides a more "batteries-included" API. In particular, it provides macros which are a superset of the `log` crate's `error!`, `warn!`, `info!`, `debug!`, and `trace!` macros, allowing users to begin the process of adopting `tokio-trace` by performing a drop-in replacement.

## Notes

Work on this project had previously been carried out in the [tokio-trace-prototype] repository. In addition to the `tokio-trace` and `tokio-trace-core` crates, the `tokio-trace-prototype` repo also contains prototypes or sketches of adapter, compatibility, and utility crates which provide useful functionality for `tokio-trace`, but these crates are not yet ready for a release. When this branch is merged, that repository will be archived, and the remaining unstable crates will be moved to a new `tokio-trace-nursery` repository. Remaining issues on the `tokio-trace-prototype` repo will be moved to the appropriate new repo.

The crates added in this branch are not _identical_ to the current head of the `tokio-trace-prototype` repo, as I did some final clean-up and docs polish in this branch prior to merging this PR.

[tokio-trace-prototype]: https://github.com/hawkw/tokio-trace-prototype

Closes: #561

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Summary
Add a new component, Tokio Trace, which provides a scoped, structured logging system. It is scoped in that each event tracks its parent. It is structured in that events are a set of key / value pairs with the associated event information.
Tokio Trace would provide the foundation upon which Tower can build request-scoped tracing, and upon which integration with systems like OpenTracing or other instrumentation systems can be built.
Motivation
Tracking the logic flow through an asynchronous system like Tokio can be quite challenging. Since individual tasks are multiplexed on the same thread, associated events and log lines are intermixed, making it difficult to trace the logic flow.
There are also complementary goals that can be accomplished with a well-built event foundation. For example, metrics / instrumentation can be tracked by observing emitted events.
There is plenty of precedent upon which to draw. This proposal draws heavily from tracing systems like Dapper, Zipkin, and OpenTracing.
Goals
Tokio Trace would provide an API that allows libraries to emit information around events. Subscribers could then be attached to process these events. The API needs to have low enough overhead to allow including debug-level events, and it needs to avoid adding any overhead when there are no subscribers that care about the information.
Event scoping
Tokio Trace will provide a tracing style API. Roughly, this means that events have a start and an end point. Also, events can be nested.
For example, imagine an HTTP request that requires a database query to build the response. An event would be processing the HTTP request. This would start when the request is received and complete when the response has been written to the socket. The database query is a child event and is fully contained by the HTTP request event. The nesting should be tracked so that subscribers can rebuild the structure.
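As a rough illustration of that nesting (the span API below is a placeholder for illustration only, not a proposed design):

```rust
/// Placeholder span type for illustration only; the real API is TBD.
struct Span(&'static str);

impl Span {
    /// Run a closure "inside" the span. A real implementation would notify
    /// subscribers on enter and exit instead of printing.
    fn scoped<T>(&self, f: impl FnOnce() -> T) -> T {
        println!("enter {}", self.0);
        let result = f();
        println!("exit {}", self.0);
        result
    }
}

fn main() {
    // The HTTP request span starts when the request is received and ends
    // when the response has been written to the socket.
    Span("http.request").scoped(|| {
        // The database query span is fully contained by the request span,
        // so a subscriber can rebuild the parent/child structure.
        Span("db.query").scoped(|| {
            // run the database query
        });
        // write the response to the socket
    });
}
```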
Causality
Besides just parent / child relationships, there is also a need to track more complex causality. One example is the `h2` crate. The HTTP 2.0 protocol allows request / response exchanges to be multiplexed on a single TCP socket. To implement this, a dedicated task is spawned to manage the TCP socket. Each HTTP 2.0 request / response exchange interacts with the TCP socket task via message passing. Because of this structure, there is no clear parent / child relationship between the logic processing the request / response exchange and the logic that does the actual TCP work for that request / response exchange. However, we still want to track the causality to aid in debugging.

To support this goal, events may have an arbitrary number of `follows_from` annotations. This creates a relationship between two events from separate traces.

Structured logging
Traditionally, logging systems use text strings as log entries. This is simple, but loses a lot of the event details. Structured logging represents events as a set of key / value pairs, maintaining the original type of the event data. Tokio Trace will use such a system. For example:
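A structured event might carry data roughly like the following. This sketch uses plain Rust data rather than any proposed API, and the field names are made up:

```rust
/// Value types would likely be limited to simple std/core types; this enum
/// exists only for the sake of the illustration.
enum FieldValue<'a> {
    Str(&'a str),
    U64(u64),
}

fn main() {
    // A traditional log line flattens everything into text up front:
    let _line = format!("GET /users/42 returned 200 in 12ms");

    // A structured event keeps each field as a typed key/value pair, so a
    // subscriber can filter or aggregate on `status` or `elapsed_ms`
    // without parsing strings.
    let _event: &[(&str, FieldValue)] = &[
        ("method", FieldValue::Str("GET")),
        ("path", FieldValue::Str("/users/42")),
        ("status", FieldValue::U64(200)),
        ("elapsed_ms", FieldValue::U64(12)),
    ];
}
```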
Values will most likely be limited to a set of Rust types found in `std` or `core`.

Design requirements

Tokio Trace is split up into multiple crates:

- `tokio-trace` is the crate that libraries and applications depend on to get instrumentation APIs.
- `tokio-trace-subscriber` contains utilities for implementing event subscribers as well as some default implementations (such as a STDOUT logger).
- `tokio-trace-core` is the bare minimum needed to provide a bridge between `tokio-trace` and `tokio-trace-subscriber`.

API stability
The `tokio-trace-core` crate must provide a stable API. When a library changes its `tokio-trace-core` dependency version, it is considered a breaking change for that library. The other crates, `tokio-trace` and `tokio-trace-subscriber`, may release breaking changes at a faster pace.

Because of this, the `tokio-trace-core` crate's initial release must be the bare minimum needed to function. It must also be designed with forward compatibility in mind.

Performance
It is critical that `tokio-trace-core` adds minimal overhead to instrumented libraries. Subscribers must opt in to instrumented events. If no subscriber opts in to an instrumented event, the cost of that point of instrumentation should be as close to zero as possible.

Even when subscribers opt in to events, instrumentation points should be as cheap as possible.
Regarding performance, it is assumed that hitting instrumentation points is the hot path and creating a subscriber is uncommon (most likely once at the beginning of the process).
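As an illustration of that requirement, an instrumentation point could defer all event construction behind a cheap opt-in check. The trait shape and the `current_subscriber` accessor below are assumptions made for the sake of the sketch, not part of the proposal:

```rust
/// Assumed subscriber shape, for this sketch only.
trait Subscriber {
    fn enabled(&self, target: &str) -> bool;
    fn observe_event(&self, target: &str, message: &str);
}

/// Hypothetical accessor for the process's registered subscriber, if any.
fn current_subscriber() -> Option<&'static dyn Subscriber> {
    None // no subscriber registered in this sketch
}

/// An instrumentation point: all work to build the event is deferred
/// behind the cheap `enabled` check, so uninteresting events cost little
/// more than a branch.
fn record_event(target: &str, build_message: impl FnOnce() -> String) {
    if let Some(subscriber) = current_subscriber() {
        if subscriber.enabled(target) {
            subscriber.observe_event(target, &build_message());
        }
    }
}
```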
API
The specifics of the API are still very much in the air. A sketch can be found here.
Given the goals stated above, an event must have a defined start and end. Given the asynchronous nature of working with Tokio, this requires a value that can be passed around into futures. Also note that spans need to hook into a future's poll function:
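A sketch of what that poll hook could look like (futures 0.1 style, since that is what Tokio uses today); the `Span` type and its `enter` guard are placeholders, not a settled API:

```rust
extern crate futures; // futures 0.1

use futures::{Future, Poll};

/// Placeholder span handle; a real one would be connected to subscribers.
pub struct Span;

/// Guard representing "execution is currently inside the span".
pub struct Entered<'a>(&'a Span);

impl Span {
    pub fn enter(&self) -> Entered {
        // A real implementation would notify subscribers here, and again
        // when the guard is dropped.
        Entered(self)
    }
}

/// Wraps a future so that its span is entered on every call to `poll`.
pub struct Instrumented<F> {
    inner: F,
    span: Span,
}

impl<F: Future> Future for Instrumented<F> {
    type Item = F::Item;
    type Error = F::Error;

    fn poll(&mut self) -> Poll<F::Item, F::Error> {
        // The span is entered only for the duration of this poll, but it
        // stays alive across polls until the future completes, giving the
        // span a start and an end that match the unit of work.
        let _enter = self.span.enter();
        self.inner.poll()
    }
}
```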
A higher level combinator API can be provided:
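For example, a blanket extension trait over the `Span` / `Instrumented` sketch above could let callers write `some_future.instrument(span)` instead of wiring up the poll glue by hand (names here are placeholders):

```rust
/// Extension trait providing the combinator form; blanket-implemented for
/// all futures. Assumes the `Span` / `Instrumented` types sketched above
/// are in the same module.
pub trait Instrument: Future + Sized {
    fn instrument(self, span: Span) -> Instrumented<Self> {
        Instrumented { inner: self, span }
    }
}

impl<F: Future> Instrument for F {}
```

A library could then attribute everything a spawned task does to a single span with one call at spawn time.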
Again, the details are TBD, but the general goal is to incur as little overhead as possible when there are no subscribers for an event, or when there is a subscriber but it does not care about certain aspects, such as building a trace.
Subscribers
One or more subscribers need to be registered to receive events. A subscriber can receive events and process them in whatever way is appropriate. The goal is for the tracing API used by libraries to not do any work, instead punting the work to subscribers.
A subscriber should be able to subscribe to a subset of events; for example, a subscriber might decide to only receive events from the `h2` module.

Things a subscriber might do:
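As a small example of that filtering (reusing the assumed `Subscriber` shape from the sketch under Performance above), here is a subscriber that only receives `h2` events and writes them to stdout:

```rust
/// Same assumed trait shape as the Performance sketch above.
trait Subscriber {
    fn enabled(&self, target: &str) -> bool;
    fn observe_event(&self, target: &str, message: &str);
}

/// A subscriber that only opts in to events originating in `h2` and logs
/// them to stdout; all other instrumentation points stay near zero cost.
struct H2Stdout;

impl Subscriber for H2Stdout {
    fn enabled(&self, target: &str) -> bool {
        target.starts_with("h2")
    }

    fn observe_event(&self, target: &str, message: &str) {
        println!("[{}] {}", target, message);
    }
}
```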
Integrating higher up in the stack
The Tokio Trace crate would provide the necessary infrastructure for emitting and subscribing to events. It is then up to libraries to use the API to emit events.
Tokio itself would instrument its APIs as a starting point. Once the API and performance are deemed acceptable, there would be an effort to instrument higher levels in the stack.
Tower
Tower is a request / response abstraction and a middleware stack. This would allow instrumenting events at the request / response level. The request / response point is where most existing tracing systems start.
Hyper
Hyper is an HTTP client and server implementation and would be able to provide valuable event data. It would be able to instrument the exact start and end points of the HTTP request.
Conclusion
This issue represents my current thinking on Tokio Trace and the direction that I hope it goes. It is mostly a brain dump, but it is a good starting point to get the discussion going.
Instrumentation and visibility are pretty critical to production applications, and I think getting some solid infrastructure in place is an important next step.