-
Notifications
You must be signed in to change notification settings - Fork 345
Conversation
* @param keyValues Alternating key Strings and value Objects. Values may be numeric types, bools, Strings, or | ||
* arbitrary objects, though the treatment of arbitrary Objects varies across Tracer/Span | ||
* implementations. | ||
* @return the Span, for chaining |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
usage example in the docs would be useful
@yurishkuro I added a trivial usage example for clarity. Curious to know what others thing (including C# people who have some analogous constraints)... @adriancole @michaelsembwever @dawallin @cwe1ss Just so we're all on the same page: the idea here is to choose a narrow base logging API rather than trying to please all parties who want something more elaborate. Those sorts of elaborate interfaces can be projected onto this underlying Span API (and, in many cases, I would hope that adapters could be written that translate slf4j-style logging into Span key:value logging (though we need a standard way to access the active span... something I'd like to take on later this year). |
Is this now just an object array? This would allow non-string keys and an uneven number of parameters. That seems too generic IMO. I'd prefer a map/dictionary or a list of tuple/custom type to at least limit the key-type to string. |
@cwe1ss those can always be added, but they are significantly less efficient |
@yurishkuro, I did a quick test in C# (I guess Java will be similar) and compared object array vs. custom type array vs KeyValuePair list vs. Dictionary:
As expected, Dictionary ("Map" in Java) is the slowest because it ensures unique keys. However, a custom KeyValue-type is just marginally slower than the object array but it at least ensures string keys and a correct number of arguments. It's a little bit more verbose though. Using the object array has a very short syntax, but at least in my experience, using this for a pair of values is very uncommon in C#. I think I've never seen this before. Is this common in Java? I'm afraid this would be very confusing for users of this library?! That's why I'd still prefer the custom type over the object-array. // That's the code from the test
public static void Main(string[] args)
{
int iterations = 10000000;
Console.WriteLine("Hello World! Iterations: " + iterations);
// object array
var watch = Stopwatch.StartNew();
for (int i = 0; i < iterations; i++)
{
LogObjectArray("key1", "value1", "key2", 10, "key3", 4.3, "key4", true);
}
Console.WriteLine("Log(object[]): " + watch.ElapsedMilliseconds + "ms");
// custom type array
watch.Restart();
for (int i = 0; i < iterations; i++)
{
LogCustomTypeArray(
new LogValue("key1", "value1"),
new LogValue("key2", 10),
new LogValue("key3", 4.3),
new LogValue("key4", true)
);
}
Console.WriteLine("Log(LogValue[]): " + watch.ElapsedMilliseconds + "ms");
// KeyValuePair List
watch.Restart();
for (int i = 0; i < iterations; i++)
{
LogList(new List<KeyValuePair<string, object>>
{
new KeyValuePair<string, object>("key1", "value1"),
new KeyValuePair<string, object>("key2", 10),
new KeyValuePair<string, object>("key3", 4.3),
new KeyValuePair<string, object>("key4", true)
});
}
Console.WriteLine("Log(KVP List): " + watch.ElapsedMilliseconds + "ms");
// dictionary
watch.Restart();
for (int i = 0; i < iterations; i++)
{
LogDictionary(new Dictionary<string, object>
{
{ "key1", "value1" },
{ "key2", 10 },
{ "key3", 4.3 },
{ "key4", true }
});
}
Console.WriteLine("Log(Dictionary): " + watch.ElapsedMilliseconds + "ms");
}
public static void LogObjectArray(params object[] args) { }
public static void LogCustomTypeArray(params LogValue[] args) { }
public static void LogList(IList<KeyValuePair<string, object>> args) { }
public static void LogDictionary(IDictionary<string, object> args) { }
public struct LogValue
{
public string Key { get; }
public object Value { get; }
public LogValue(string key, object value)
{
Key = key;
Value = value;
}
} |
Having a custom type instead of a simple array would also allow us to add helper methods for known keys later. Example: // sorry for the continuous use of C# in this java repository :)
public struct LogValue
{
public string Key { get; }
public object Value { get; }
public LogValue(string key, object value)
{
Key = key;
Value = value;
}
// could be directly on the type or on a separate class
public static LogValue Level(string level) => new LogValue("level", level);
public static LogValue EventName(string eventName) => new LogValue("event", eventName);
}
// usage
span.Log(LogValue.Level("info"), new LogValue("key", "value")); Inheritance (e.g. EDIT: because it would have to be a regular class (reference type) instead of a value type. If I understand the go implementation correctly, it also does all kinds of strong typing and forces key/value usage. So I wonder why we wouldn't want to have this in Java/C# if there's only a rather insignificant perf hit. |
One thing your benchmark does not take into account is the increased GC pressure from allocating all those LogValue objects. In Go those objects are passed by value and have no heap allocations. So in Go you need to use more verbose syntax in return for higher efficiency, while in Java a more verbose syntax leads to lower efficiency. But I am not opposed to having both. |
in C#, a Does Java have something similar? |
I ran my test using the unoptimized DEBUG mode. In RELEASE mode, using the custom type is even faster than using the object array. 😄 (I don't know why - maybe because there's less boxing involved?!) I also ran a memory profiler (JetBrains dotMemory) and couldn't see any noticeable difference. So, unless you want otherwise, I'd like to use the custom value type in C#. Maybe the situation is similar in Java? Disclaimer: I only have very basic experience with detailed performance/memory tests, so I might have done/read it wrong.
|
@cwe1ss @yurishkuro of course meeting a minimum bar for performance is a requirement... that said, I am more concerned with making the API unobtrusive and low-boilerplate. In Java-land that's swimming against the current, though, and perhaps it would be better to embrace the local style and do something heavier-weight. I just pushed a different version of this PR/RFC that is a little more explicit... I think the benefit here is just what @cwe1ss was getting at with the future presence of helpers to do things like |
Span log(long timestampMicroseconds, String eventName, /* @Nullable */ Object payload); | ||
* Like log(LogField...), but with an explicit timestamp. | ||
*/ | ||
Span log(long timestampMicroseconds, LogField<?>... fields); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In Go API the only way to log with explicit timestamp is to provide logs in bulk to the Finish method. What makes Java different that it requires an explicit timestamp? I do not recall ever seeing a Java logging API that explicitly accepted timestamps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like it because it's consistent with Finish etc. Also, the finish method doesn't accept logs right now.
I like this, using the custom key-value type array instead of object array makes it much clearer and more natural, at least in C#. |
Well, shockingly, after 15min of googling I could not find an existing Java API for structured logging. |
not to be contrary, but googling structured logging java api should return On Mon, Sep 5, 2016 at 8:04 AM, Yuri Shkuro notifications@github.com
|
@rgoers I know this is an old issue, but seems like you were behind structured data support in log4j back in the day. This issue is about adding similar functionality to a distributed tracing version of a logging api. Do you have any insights looking back from now until then about the task, and maybe some modeling gotchas we might want to be aware of? |
yes, but they explicitly use Maps, so I classified those into "not found" category. |
I think we need to enumerate the qualities we think are performance-based non-starters with benchmarks. Then we focus in two areas.. what's the api we want (if we want it), and can we afford that api! For example, there are several choices in this api design
These types of concerns are so provable with JMH, it would seem best to use benchmarking instead of conjecture about what should be tossed on account of performance. |
I guess there's one more design option, a Map-like interface (similar to text-map reader thing) interface/class Fields<K, V> extends Iterable<Map.Entry<K,V>> {
... internalize your favorite backing choice, likely an array or a delegate
static <K, V> Fields forMap(Map<K, V> map) {
.. convert
}
} |
Since you asked... although I feel like I am walking into the middle of a discussion where I don't have all the background. Yes, structured data (namely, support for RFC 5424) was one of the things that motivated me to start work on Log4j 2. If you look at the Log4j 2 API you will see that it really logs Messages. The RFC 5424 Message is just one of the supported message types. The advantage of this is that each Message type can be tuned for the data it is manipulating. The discussion around creating objects is interesting. We have run a lot of benchmarks that have shown that Java has significantly improved in dealing with garbage, especially when the objects are created and dereferenced quickly. That said, Log4j has users who require a garbage-free logging framework so efforts were made in our latest releases to make it possible to use Log4j in specific documented ways without creating any garbage. What is interesting is that in some cases the throughput is slightly lower when this is enabled. The Log4j web site has a few pages that document this. I am not at all familiar with opentracing but I would be looking at the data format(s) you want to use for transmitting the data. I would also be looking at how the data gets passed to you. Creating an efficient data structure is great but not if the API to manipulate it sucks. For example, if you use an array does the user somehow have to know how big the array needs to before they start adding to it? |
@rgoers thanks for chiming in. I'm not terrible surprised to hear that there's a tradeoff between logging throughput and GC churn (which is really just another flavor of latency)... @adriancole I like the idea of using benchmarks to eliminate certain options. You're not suggesting this, but I want to make it clear that I don't want to use benchmarks—esp microbenchmarks—to select the "best" option... It should also be noted that OpenTracing impls will often have some sort of sampling regimen in place; that would suggest a bias towards APIs that minimize the amount of work done before the function call actually takes place (e.g., it would be a bummer to construct a fully-formed Map in the caller just to have it dropped by the (unsampled) Span impl below). I'll let others offer any thoughts they have before trying to move forward with this (probably tomorrow sometime). |
Thx for the reply, Ralph! Lots to think about, just mentioning one thing I
|
|
/** | ||
* LogField represents a single key:value pair in a Span.log() record. | ||
* | ||
* The key must always be a String. All Tracer implementations must support values bool, numeric, and String values; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"support values bool, ... values" or just "support bool, ... values"?
Just to make sure I understand, would your concern be addressed by a |
knowing that we want to keep trunk shippable, punting on this runs the risk
of releasing without defining a predictable way of doing what was done
before.
I understand that there's a bike shed to be had about which key is best,
but I'd argue this.
don't deprecate the old method or even suggest a replacement until there's
a valid predicatable replacement. This puts the burden on those championing
the Map api with responsibility of making it usable.
|
@adriancole your concerns are totally reasonable re standardizing on a key before deprecating the old way of doing things. Though it would be true to my heritage at google... what was the quote? "There's the way that's deprecated and the way that's not ready yet." Something like that. Still curious what in particular you had in mind (per #47 (comment))... I would prefer a |
What I was thinking is that is should be possible to make a wrapper that
keeps the span simple and not pollute systems like zipkin with unsearchable
double encoded json or having to present awkward nested things in the UI.
Lacking a way to make OT simple, I frankly wouldn't suggest people use it.
In order to make it simple (or possible to be simple), we'd need to define
some way of saying this is a normal annotation (event is a narrowing of
that). Basically a string that is the only component to put on a timeline..
regardless of whether it is an "event" or not. In logging terms, the
message parameter.
How that happens I don't care much I just don't want to be in a position
where recommending OT is the same as recommending a 3 headed hammer. We
need to be portable with dapper annotations.
|
@bensigelman not sure why github won't let me reply to your question about preference. If |
@adriancole the new github review thing has some weird rules about where you're allowed to reply to things. /shrug. I'll propose something later today. Maybe we'll leave |
@adriancole PTAL after the latest commit. TL;DR, the developer can now write
|
* @return a Map suitable for use with @link{io.opentracing.Span#log(Map)} | ||
* @see Logs#EVENT_KEY | ||
*/ | ||
public static Map<String, ?> event(String eventName) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note that these helpers do not act like builders... they only cover the case of a Map with a single entry, which is the common case @adriancole is concerned about. I was imagining that multi-entry maps can make use of the EVENT_KEY
constants... a full-on Builder is also an option but is overkill right now IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI singleton maps are in all recent versions of java Collections.singletonMap()
so if we have a stable event or message key defined, then you could just as well suggest
Collections.singletonMap(LogKeys.EVENT, "mary had a little lamb")
.
That is unless the goal is to eventually extend map with log view types, ex.
class Event extends MapAdapter<String, String> {
...
}
In that case, you might want to direct them to a file like this.
I did this in Denominator, mainly to allow one to use rare or undefined DNS rdata types. It works, mostly, but has some limitations. https://github.com/Netflix/denominator/blob/master/model/src/main/java/denominator/model/rdata/SRVData.java#L23
https://github.com/Netflix/denominator/blob/master/model/src/main/java/denominator/model/ResourceRecordSets.java#L559
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if interested in the map-view thing in documentation form https://github.com/Netflix/denominator/wiki/Model#record-data-rdata
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to be clear I wouldn't recommend doing this.. one of the reasons I am not for map-based things as unlike DNS, there isn't a registry of known types. So, the caller must need to know what the sender can accept (coupling caller code to the tracing system). That's why I prefer a "straight-pipe" for normal dapper-style annotations, as it is easier to map a string or string with timestamp to a map, vs encouraging a map which is more complex and not supported on many tracing systems.
Regardless, here's the impact of map-only expressed in finagle form.. Existing related code in finagle (noted for context.. particularly none of the annotations have anything to do with Map): sealed trait Annotation
object Annotation {
case object WireSend extends Annotation
--snip--
def record(record: Record): Unit = {
record.annotation match {
case tracing.Annotation.WireSend =>
annotate(record, thrift.Constants.WIRE_SEND)
--snip-- How you would make opentracing work in finagle today. // existing signature
protected def annotate(record: Record, value: String): Unit = {
spanMap.update(record.traceId) { span =>
// change the internals to have a map of opentracing spans, and use the log method and of course pass null for payload
span.log(record.timestamp.inMicroseconds, value, null))
}
} How you would make opentracing work in finagle when changed to a map. protected def annotate(record: Record, value: String): Unit = {
spanMap.update(record.traceId) { span =>
span.log(record.timestamp.inMicroseconds, Map("annotation or event" -> value)))
}
}
// change the recorder to unwrap that map to extract the "annotation or event" key |
Remove `io.opentracing.Logs` for now since it's not doing anything useful anymore.
@adriancole and I were chatting about something else (off github) and ended up discussing this PR for a bit. To try to summarize, the Anyway, I've just pushed something that removes the I've also put some "caution tape" around the Map-based API since I think the thrashing on this PR suggests that the precise details of the interface may change with more experience. Anyway, PTAL, anyone who cares. I'd still like to merge this soon. |
* Add a new log event to the Span, accepting an event name string and an optional structured payload argument. | ||
* Log key:value pairs to the Span with the current walltime timestamp. | ||
* | ||
* CAUTIONARY NOTE: not all Tracer implementations support key:value log fields end-to-end. Caveat emptor. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
<p><strong>CAUTIONARY NOTE:</strong>
* @param event the event value; often a stable identifier for a moment in the Span lifecycle | ||
* @return the Span, for chaining | ||
*/ | ||
Span log(long timestampMicroseconds, String event); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
happy with this as it leaves the simple event in-tact, while letting folks experiment with more complex ones. Will be interesting to see how this is used |
@@ -127,18 +123,46 @@ public String getBaggageItem(String key) { | |||
} | |||
|
|||
@Override | |||
public final Span log(String message, /* @Nullable */ Object payload) { | |||
public final Span log(String event) { | |||
return log(System.nanoTime() / 1000, event); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be System.currentTimeMillis() * 1000
instead.
nanoTime()
:
This method can only be used to measure elapsed time and is not related to any other notion of system or wall-clock time. The value returned represents nanoseconds since some fixed but arbitrary origin time (perhaps in the future, so values may be negative). The same origin is used by all invocations of this method in an instance of a Java virtual machine; other virtual machine instances are likely to use a different origin.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I noticed this the other day, too.
With OT-Java impls I've done (which need to be Java 7 compatible) we've done nanoTime() deltas for duration and millis*1000 for the start timestamp.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yurishkuro I replaced nano/1000
with millis*1000
... not really relevant to this PR, but such a small change I figured I'd include it.
(In AbstractSpan I was able to use Instant
which obviates the problem... though this points to additional problems with the whole AbstractSpan
concept. I would really like to get that whole artifact out of this particular github repo, but that's its own discussion.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks - it just felt wrong leaving that in the code that could be considered "an example"
@adriancole assuming this LGTY, please update the approval bit in github since the new review process has a big red "changes requested" and I would feel dirty ignoring that. |
Thanks for the discussion, all... merging. |
@bensigelman @adriancole can we please squash commits before merging PRs. there's no need for the flurry of checkpoint commits in the mainline. |
@michaelsembwever I'll enable that option for the repo. I would like to continue to support the option of merging individual commits since sometimes that's what makes the most sense. |
Oh -- looks like those options are already enabled. Meh. @michaelsembwever if you feel strongly I can rewrite history and squash these commits down after the fact. |
Yes, cleaning up the commits as the final steps in a PR is fantastic. On 29 Sep 2016 2:56 AM, "bhs" notifications@github.com wrote:
|
I updated the C# PR and included a few .NET specific changes. I'd appreciate your feedback! see this comment for the new syntax. |
Per opentracing/opentracing.io#96
This is only a sketch... once we decide on a specific API I will adjust tests and dependent artifacts.