-
Notifications
You must be signed in to change notification settings - Fork 132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simplify and generalize Span.log API #96
Comments
Meant to reply earlier.. here it goes! Main thing is that this moves more explicitly into runtime checks (as Pardon the crap stuff below, but it would look something like below to Ex. log match { (let {:keys [msg level] :or {level "info"}} log) From an interop with java language perspective, there's a few follow-ups
|
My biggest conceptual concern is that this "kicks the can down the road" (or "kicks the can up the stack"??) as far as the semantic distinctions between these various logging use cases. I.e., it weakens the "instrument once" Tracer interchangeability story. FWIW / side-note: what you are suggesting here is essentially equivalent to what go-kit does: https://github.com/go-kit/kit/blob/master/log/log.go#L18 |
Yes, but Map is an alternative signature, the primary one could be a variadic method, or a set of overloaded methods like
First, the dispatch does not need to happen in Java. If we are talking about tracers, then all they need to do is pack the set of key/values from the log method into a span and let the backend deal with interpretation. The example of dispatch (with some pseudo-code): interface LogTagHandler {
handle(Object value, LogData log);
}
Map,String, LogTagHandler> handlers = ImmutableMap.of(
"level", new LevelLogTagHandler(),
"payload", new PayloadLogTagHandler(), // etc
);
Span log(Object... tagsAndValues) {
LogData logData = new LogData();
for( tag, value in tagsAndValue) {
handler = handlers.get(tag.toString());
if (handler != null) {
handler.handle(value, logData);
} else {
logData.appendCustom(tag, value);
}
}
} But again, this only needs to happen way down the line in the collection pipeline. The actual tracer implementation, where performance matters the most, does not need any dispatching.
Right now we have a pretty odd looking set of logging methods, covering only an odd subset of use cases. And we've been arguing for months before we even got those in. My concern is that we'll be stuck for a long time if we try to come up with clean semantic distinction, via method signatures, to the use cases outlined above. I think we'll have a much easier time converging on a common set of logging tags, just like we did for span tags, while having a simple and generic logging API. What I also like about it is the symmetry with the span tags. The API is straightforward and can be fixed for a long time, without worrying that v 1.0 tracer is incompatible with v 1.1 user code because we happen to change some argument in the log method. Lastly, strictly speaking, a structured logging API does not require any additional semantic distinctions, e.g. if you have a backend (like ELK) that properly indexes structured logs, then the end users can always find what they are looking for without defining semantic meaning of the tags. That's not to say that we shouldn't define some standard tags like |
Ah ok.. so this could well end up similar to how finagle works (depending
|
Actually, what I meant is that even reporting thread does not need to do any processing of the logging tags. Let's say we use Thrift for out of process span reporting, so I would define something like this: struct Span {
...
9: list<Log> logs
}
struct Log {
1: int64 timestamp
2: list<LogTag> tags
}
struct LogTag {
1: string key
2: string value
} So the only processing that happens in-process and affects the end user performance is:
Once the span data is shipped out of process to the Collectors, only there do they need to start interpreting the tag keys.
I would assume that what the Handlers do is simply populate a wider LogData struct where special tags are pulled out as top level fields, i.e. LevelTagHandler would set a |
I'm coming around to this proposal. I do think it "kicks the can up the API" per my previous note, but – even if that's true – there is value in making the lowest level of the logging API completely general. I want to think through the implications for OpenTracing users who just want to do the simple thing (e.g., logging informational messages). Using Go as an example, I can imagine something like
I can get enthusiastic about "option 1" above and the idea of Anyway, if we can agree that OpenTracing should try to offer things like option 1 above, I see the merits of making the |
Side note... @yurishkuro said
I like the idea of log payloads that can be arbitrarily large with the caveat that the tracing impl will not necessarily serialize all of them... i.e., the amount of payload that makes it downstream to the tracing system is best effort, perhaps beyond some sort of guaranteed minimum. In that case there is still key:value computation that needs to happen before the data gets out of the process, but I readily admit this is not something that is necessary, it's just something that can be done in order to make some attractive feature (like large structured logging userdata payloads) feasible. |
Thanks for clearing this up, basically this is moving most logic
out-of-process ie to the collectors. In fact anything structural is only
enforced after it leaves the process.
This is a variant of @mattheath's phosphor, except taking less
responsibility in the api or transport form. In phosphor's case
(Annotation), you can timestamp key/values in an annotation. However,
there's certainly fields in the struct. This helps avoid the cost of
serializing well-known keys and making everything subject to interpretation.
https://github.com/mondough/phosphor/blob/master/proto/trace.proto#L5
There are likely timestamped map shippers out there. Ex some general obs
tools do things like this. However, I wonder if it isn't too generic, as
you could indeed write a spec like this in any project. It is not more or
less helpful here since it has basically no opinion except time, which is
not that opinionated.
|
@bensigelman I am not that much in favor of
|
@yurishkuro re On that note, for this particular (github) issue, I don't need to know whether we're going to support infof logging specifically... but I still want to advocate for convenience functions that help with the (standardized) structured logging key:values. E.g.,
The "Option B" story gets even more unmanageable if we try to provide structure within each stack frame (e.g., separating the symbol name and func:line info). Anyway, my point is not that we should necessarily support |
@bensigelman I have no objections to util functions that eventually delegate to a single |
It's time to move this forward... the basic idea is straightforward, and really most of the interesting part is just deciding on the right idiomatic interface for each language. I have three additional requirements to propose, though... discussion/dissent about these is welcome:
Part of me wants to wait on the functions/closures, but at least thinking about them might help us pick better typedefs and so forth. I should also point out that 1. and 2. above are already handled by OT. |
There's also one migration concern. We probably want the most generic k/v log method to be called |
My argument: the timestamp is a required argument and thus can be split out... it's just a required argument with a default of
Yeah, just a fast-path for translations. I guess "requirement" is the wrong word... we could always add it later since it's an optimization. But something to think about as we decide on types and so forth.
I could see it either way. If we support arrays, are they allowed at the "top level"? If so, how to distinguish between interleaved KV and an array?
I agree that there's a question here, but I think we can/should address Go stuff in the Go PR... for this issue I was hoping to focus on pan-language semantics. |
Some notes from an in-real-life conversation with @yurishkuro:
|
People who care about this issue should check out opentracing/opentracing-go#108 |
I think that having only key:value pairs over-generalizes this. By only kind-of-formalizing common key names, trace implementations will not easily be able to build upon a minimum set of required information. If OpenTracing is used in multiple 3rd party libraries (e.g. web frameworks, ORMs, ...) they might all end up using different values - or even different data types - especially for My main question is how you intend this feature to be used:
If so, filtering will be a very important requirement. As soon as you have many logs, you want to be able to filter them before they are sent away (to reduce traffic). This is very similar to the general "sampling" but on a per log basis. Imagine having multiple libraries using OpenTracing to log messages. Some of them might be very verbose, some not. You might not always want to get all messages from all libraries.
Formalizing on these common fields would make such global filtering easier:
I also think the following should be separate fields/parameters in the API:
Also, from an API usability point of view, having only key:value pairs brings a weird difference between creating regular spans and logging - e.g:
|
fyi: this is how the new logging standard from ASP.NET Core (and many other .NET Core based libraries) defined its generic log method: void Log<TState>(
LogLevel logLevel, // a fixed enumeration
EventId eventId, // a numeric identifier for the event
TState state, // a payload object for structured logging - can have many fields/values!
Exception exception, // loggers decides which fields to log etc.
Func<TState, Exception, string> formatter // a method which creates a "readable" version of state - eg for console output
); They then offer convenience extension methods, which are added to the existing ILogger type without having loggers to implement them - e.g.: void LogInformation(EventId eventId, Exception exception, string message, params object[] args);
void LogInformation(EventId eventId, string message, params object[] args);
void LogInformation(string message, params object[] args); |
@cwe1ss sorry to be brief, happy to elaborate... The idea here is to settle on the lowest level of the API. It may be that we eventually support higher-level logging concepts built on top of this foundation, but I would like to keep the Span API out of the business of coming up with a "CISC"-style (sorry for the 90s processor analogy) logging portfolio that will please all parties... it would quickly eclipse the rest of the Span API and would be – IMO – a distraction. We would definitely want to standardize on certain keys, much like we have with certain tags a la http://opentracing.io/data/ Does that set your mind at ease at all? |
yes, I definitely agree that logging is a completely different beast. That's why I think you have to be very careful with how you advertise this. It should be clear whether it should be used for "Order ABC processed for customer 123"-like log messages that would otherwise end up in ElasticSearch. That's why I personally liked the "LogEvent"-naming. I understand this as a method that allows me to add tracing-relevant checkpoints to my span and nothing else. Regarding the API, I think ease-of-use for users should be the first priority. Required fields (if there are such - like operationName for StartSpan) should be clearly visible as such. |
fyi - regarding key:value pairs: C# has a nice feature called "anonymous types" which would allow users to do something like this (which looks a bit nicer than creating a dictionary): var tags = new { Level = "info", KeyXY = "SomeValue", Message = "SELECT * FROM Customers" };
span.Log(tags); However, this has quite a few drawbacks:
So I don't think that it would be a very good fit for this scenario. |
If there are no complaints by mid-day on Monday I will start dealing with the repercussions of these various API sketches/RFCs... Again, just to have everything in one place: Go: opentracing/opentracing-go#108 (I don't think anyone but me cares about Objective-C :) |
My largest concern with this change is that it is to generic and therefor, as @bensigelman said earlier
To me the simplest form of logging contains a message string and a timestamp. There is a discussion why the timestamp should be a separate parameter, and I would prefer to have the message treated the same way. To this we could then add metadata for semantic logging, loglevel etc., in one generic form or another. The purpose of this would be that even if there are different ways of adding extra payload, at least a message string and a timestamp could be interpret by all tracer implementation. With that said, I like the way the previous payload parameter now is formalized into a key-value array. |
I agree that it's the simplest, but I disagree that it's the most useful. In large scale distributed systems, logs are not written for humans to read, they are primary meant for machine processing. If the only thing that you have in the log is a scalar string message, the KV API allows you to log it just as easily, as The KV API we're proposing does not preclude a "simple message with a timestamp", and does not even make it significantly harder to log, but by being more general it allows a lot more expressibility for state logging. |
This is done for Java, Go, Python, Javascript, and ObjC... C# and C++ are not stable yet and thus shouldn't block this issue. Thanks all, and esp @yurishkuro for making the suggestion in the first place. |
Related to earlier discussions, including #30, #90.
I would like to propose to replace the current
log_event
andlog
methods with a single structured logging method:Why
This API addresses several different requirements that were previously raised:
log_event
can be achieved bylog(event="..", ...)
log(msg="...", ...)
timestamp
tagpayload
taglevel=info | error
exception=ex
tagImplementations
Specific API will depend on the concrete language features.
Python
Go
Can use either a variadic function with alternating keys and values, similar to go-kit's API:
or a zero-allocation approach used in https://github.com/uber-go/zap:
Java
Best format I can think of is a variadic method with alternating keys/values, similar to Guava's
ImmutableMap.of(k1, v1, k2, v2, ...)
. It can also be overloaded to accept aMap<String, Object>
.Javascript
Pass a dictionary
cc @bensigelman @dkuebric @michaelsembwever @adriancole
The text was updated successfully, but these errors were encountered: