-
Notifications
You must be signed in to change notification settings - Fork 344
RFC / WIP: make "auto-finishing" optional and non-Tracer-specific #166
Conversation
... while also moving the refcounting to the util package and creating a cleaner path to something different should we want it.
@@ -36,7 +34,12 @@ public ThreadLocalActiveSpan activeSpan() { | |||
|
|||
@Override | |||
public ActiveSpan makeActive(Span span) { | |||
return new ThreadLocalActiveSpan(this, span, new AtomicInteger(1)); | |||
return new ThreadLocalActiveSpan(this, span, new AutoFinisher()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this AutoFinisher
default makes this PR less-breaking for existing 0.30.x instrumentation... if we made the default Observer null
, the PR would become more-breaking but perhaps "cleaner".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that the default fallback should be this way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps this behavior should be controlled by a configuration parameter in ThreadLocalActiveSpanSource. Otherwise it seems to defeat the purpose of this PR, which was to give people a way to turn off refcounting / auto-finishing behavior.
Another thing that's unclear to me - those people who asked for no refcounting / auto-finish behavior, are they implementing a Tracer or doing instrumentation of some framework? Because if it's the latter and those frameworks are mixed with other instrumentation that expects auto-finish, then the only way to make this work is to have makeActive(span) == makeActive(span, NoopObserver)
, i.e. if someone wants auto-finish behavior it must be requested explicitly, not via the Source configuration (it will be a breaking change from 0.30)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its a difficult decision - less breaking is easier in the short term, but doesn't truly decouple the active span lifecycle from the ref counting concern. Although it will impact all instrumentations that have used active spans, I think I would prefer to have the instrumentations explicitly request ref counting via setting the AutoFinisher
observer, else handle finishing themselves.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the refcounting should/must be a per-instrumentation-site decision. When it's appropriate, it's great. When it's not, it's not.
those people who asked for no refcounting / auto-finish behavior, are they implementing a Tracer or doing instrumentation of some framework?
Both, I think... though I'm more concerned with the latter all things being equal (as it's a larger population of people).
We will need to document at the API level what it means when no observer is provided (i.e., if it implies no-auto-finish or yes-auto-finish). I guess my current proposal is that the default should be yes-auto-finish, but that is definitely debatable and is probably the most interesting choice to make in this entire PR.
Very interested in the opinions of the folks I've cc'd on this PR (or any others).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ivantopo @objectiser @yurishkuro
To be clear, I am not proposing the current default because it makes migrations from 0.30 easier, I'm proposing it because I still think it's a sensible default. Though I'm getting the sense that others disagree?
The most common case for instrumentation (IMO) will still be a synchronous try()
block, and the most concise syntax should auto-finish()
the (Active)Span there IMO. There are other ways to accomplish this, of course... we could have other variants of startXYZ()
along with startActive()
and startManual()
, for instance.
But I did want to make it clear that my thinking here is that the refcounting is still a net-good default, but just something that can be opted out of in cases where it's the wrong thing.
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tend to agree with @bhs that defaulting no-arg startActive()
to auto-finish/refcounting behavior is a better default for most users. I don't have any data to back that up since our use of Java is relatively limited. But according to this search, a number of opentracing-contrib integration libraries have been upgraded to 0.30 using default auto-finish mechanism, so apparently it is both useful and convenient to use in instrumentation. @pavolloffay - any comments on that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hey @yurishkuro, thanks for the corrected search!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's a difference between updating a version and using the active span api. There's also a difference between writing code based on discussions like these and production experience with them. I wouldn't jump to conclusions that everything is fine.
What I don't like is that the liability of auto-finishing and its implementation are not only still here, but also still the default. Any problems with this implementation (as noted laboriously in the prior issue and I won't repeat here) are not less likely now!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yurishkuro so far we haven't had any problems with refcounting in any instrumentation lib.
@mabn made a good comment #166 (comment). Postponing finish
due to refcounting is nice in async situations but these use cases are mostly rare. Important is also that it can cause weird timing results if continuation of server span is passed to an async background task (but I haven't had that problem so far).
@@ -79,6 +84,8 @@ | |||
*/ | |||
Continuation capture(); | |||
|
|||
Span wrapped(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this imply that an ActiveSpan
should always wrap a Span
instance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tylerbenson thanks for pointing this out... I meant to remove this when I changed ActiveSpan to be a subclass of Span. Though having slept on it I think there might be another (better) way: leave the class hierarchy as it was, then pass the actual Span
instance into the Observer. Calling finish()
on an ActiveSpan is a little weird since deactivate()
seems like it should never happen after finish()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I removed this unused method, btw)
|
||
/** | ||
* XXX: fix comment | ||
* {@link AutoFinisher} is an {@link ActiveSpan} wrapper that automatically {@link Span#finish()}es the underlying |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: could avoid conjugating method names, e.g. "wrapper that automatically calls finish()
on the underlying...`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No fun, I love conjugating method names...
* | ||
* <p> | ||
* Note that {@link AutoFinisher} works by counting the number of extant {@link ActiveSpan} or | ||
* {@link ActiveSpan.Continuation} references to the underlying {@link ActiveSpan} provided at construction time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
underlying Span? It says underlying ActiveSpan
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
search/replace bug.
* <p> | ||
* Use {@link AutoFinisher} like this: | ||
* <pre><code> | ||
* try (ActiveSpan span = new AutoFinisher(tracer.buildSpan("...").startActive())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
example doesn't match the no-arg constructor AutoFinisher()
.
The trick with AutoFinisher is that there can be only one per Span, otherwise ref-counting will be broken
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch, syntax changed after writing that comment...
@@ -56,7 +56,7 @@ The common case starts an `ActiveSpan` that's automatically registered for intra | |||
```java | |||
io.opentracing.Tracer tracer = ...; | |||
... | |||
try (ActiveSpan activeSpan = tracer.buildSpan("someWork").startActive()) { | |||
try (ActiveSpan activeSpan = tracer.buildSpan("someWork").startActive(new AutoFinisher())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems the try()
would still call close()
and finish the span, whether the AutoFinisher is passed or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I reverted to the old BaseSpan/Span/ActiveSpan class hierarchy)
In so doing, bring back the BaseSpan/Span/ActiveSpan relationships (albeit with SpanFinisher sitting between BaseSpan and Span).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having slept on this for a few nights, I think this is a positive step. If others agree, I will clean it up, add better docs+tests, and hopefully get it merged.
I also welcome utterly different approaches if they're out there. IMO the ActiveSpan semantics at master are useful, but they do seem to couple too many things together at once and nobody likes that.
cc @pavolloffay @yurishkuro @objectiser @tedsuo @adriancole @jakerobb @felixbarny @ivantopo @tylerbenson who might have opinions about this PR and/or opentracing/specification#80 which motivated it.
interface Observer { | ||
void onCapture(ActiveSpan captured, Continuation destination); | ||
void onActivate(Continuation source, ActiveSpan justActivated); | ||
void onDeactivate(ActiveSpan activeSpan, Finishable finisher); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
by factoring out the Finishable
interface from Span
proper, there's no longer a need to have ActiveSpan
inherit from Span
rather than BaseSpan
... we're also back to a world where you can't call .finish()
on an ActiveSpan
, and IMO that's a good thing.
I am uneasy about the precise signatures for this Observer
interface... it could be smaller/simpler and still satisfy the requirements here (to factor out refcounting).
@@ -36,7 +34,12 @@ public ThreadLocalActiveSpan activeSpan() { | |||
|
|||
@Override | |||
public ActiveSpan makeActive(Span span) { | |||
return new ThreadLocalActiveSpan(this, span, new AtomicInteger(1)); | |||
return new ThreadLocalActiveSpan(this, span, new AutoFinisher()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps this behavior should be controlled by a configuration parameter in ThreadLocalActiveSpanSource. Otherwise it seems to defeat the purpose of this PR, which was to give people a way to turn off refcounting / auto-finishing behavior.
Another thing that's unclear to me - those people who asked for no refcounting / auto-finish behavior, are they implementing a Tracer or doing instrumentation of some framework? Because if it's the latter and those frameworks are mixed with other instrumentation that expects auto-finish, then the only way to make this work is to have makeActive(span) == makeActive(span, NoopObserver)
, i.e. if someone wants auto-finish behavior it must be requested explicitly, not via the Source configuration (it will be a breaking change from 0.30)
@@ -39,12 +38,18 @@ | |||
ActiveSpan activeSpan(); | |||
|
|||
/** | |||
* @see #makeActive(Span, ActiveSpan.Observer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ambiguous - is it a shortcut for makeActive(span, null)
or something else? Or perhaps implementation-dependent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to maintain backward compatibility of behaviour, then I think the javadoc comment for all of the methods that have an Observer variant should say that they represent the case of using the AutoFinisher
observer, e.g. SpanBuilder.startActive()
, ActiveSpanSource.makeActive(Span)
, Tracer.makeActive(Span)
.
We may also want to consider deprecating these methods, so that for the 1.0 API the Observer has to explicitly be specified, e.g. whether null
or AutoFinisher
.
/** | ||
* {@link Finishable} factors out the overloaded {@link #finish()} method(s). | ||
* | ||
* @see ActiveSpan.Observer#onDeactivate(ActiveSpan, Finishable) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
afterDeactivate
@@ -39,12 +38,18 @@ | |||
ActiveSpan activeSpan(); | |||
|
|||
/** | |||
* @see #makeActive(Span, ActiveSpan.Observer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to maintain backward compatibility of behaviour, then I think the javadoc comment for all of the methods that have an Observer variant should say that they represent the case of using the AutoFinisher
observer, e.g. SpanBuilder.startActive()
, ActiveSpanSource.makeActive(Span)
, Tracer.makeActive(Span)
.
We may also want to consider deprecating these methods, so that for the 1.0 API the Observer has to explicitly be specified, e.g. whether null
or AutoFinisher
.
There is some additional information I would like to share here: After several discussions we had internally about our tracer implementation and feedback from developers that are not even concerned with tracing, just good developers who strive for the best abstraction to solve the problem at hand, we came across this question: Why does a Span has to concern itself with whether it is "active" or not? This clicked on all of us and we asked ourselves whether such thing as an "ActiveSpan" should exist or not.. our conclusion was simple: there should be just a Span, whether it is InScope/Active/Current, that's a totally different concern that has nothing to do with the Span itself. Then, if there is no such thing as an ActiveSpan then Continuations go away as well and we realize that, at a bare minimum, what is actually needed to properly do in-process context propagation is something that allows you to set/get whatever Span is considered to be Active/Current/InScope. We are trying to come up with an API the is simple, clean and truly flexible and reusable in Kamon and to achieve that, these are the very minimal constructs that really have to be exposed. Making a Span Active/Current/InScope might return an object that wraps the Span, maybe also having a reference to whatever was Active/Current/InScope before so that it can be restored later, but there is no reason for this scoped object to be a Span, just something that wraps a Span. We had a simple utility to get the current context in Kamon 0.6.x series which we thought might be just too low level and that we could replace it with higher level abstractions like the ones provided by OT, but the reality is that even if we wanted to provide higher level abstractions, the basics have to be exposed. I'm sharing all of this because given the fact that OpenTracing aims to be the common denominator when it comes to tracing I guess it's in the best interest of the community to provide just the minimum APIs to stay flexible and achieve the task, while leaving the door open for more advance features to be built on top of the base APIs. I would recommend everyone involved in this PR to take a look at what Census (using gRPCs context), Brave and even the old Kamon are doing for in-context propagation. Its just simple, it works, it's proven and many things can be built on top of that. I know there has been quite an investment in ActiveSpanSource/ActiveSpan/Continuation and I actually liked these abstractions at the beginning but the more we exercise them, the more we realize that there is something that doesn't click and a simpler solution could solve the in-process context propagation challenge. Actually, a simpler solution already solved it for us before and in other tracing systems, there isn't a need to reinvent the wheel here. It is not my intention to ask you to throw away that work, but if we came to the conclusion that is in the best interest of our users to stick to a simpler solution then common sense dictates that we should share our point of view and reasoning on this issue, even if it will just be for the sake of sharing ideas and exposing a different angle to this concern. If there is any interest in discussing a "simpler" solution for in-process context propagation I would be glad to participate in the discussion, otherwise I'll still try to provide as much feedback and insights on this PR as possible. |
@ivantopo what you're describing is where OpenTracing API has been prior to 0.30, when it had no opinion about in-process context propagation. It was a major roadblock to instrumentation because it required some 3rd party solution (like jaeger-context) to actually make different frameworks work together. It is certainly possible to come up with context propagation API that is completely decoupled from tracing, since context propagation is a more general problem (cf. tracing plane by @JonathanMace). But in order to describe the shape of the computation (which is what OT is about), having separate propagation and Span APIs creates a lot of boiler-plate code for instrumentation. What I like about active span is that it's still an abstraction. Granted, it prescribes certain semantics, but it should be possible to implement ActiveSpanSource on top of any other existing context propagation mechanism if so desired (but I don't know if anyone tried). I do agree that mental complexity of ActiveSpanSource seems higher than a simple push/pop mechanics of |
@ivantopo getting the API right is way more important than worrying about existing instrumentation, so thank you for sharing your perspectives and those of your straw-polled team members. Re existing solutions: the reality is that a normal Java process today does not have a reliable way to access the current/active/whatever-you-want-to-call-it Span. As such, no current solution "works", as that ^^^ is the ultimate design goal here. This is not an endorsement of the current OT approach, but just a reminder of the actual goal we all share. For what it's worth, the API originally was more of a "Handle" approach with a As far as API layering is concerned, average users are not supposed to think about Continuations... that abstraction is in place for the purposes of power users and those building things like the OT-Java-compatible https://github.com/opentracing-contrib/java-concurrent/blob/master/README.md ... in that model, the capture() and activate() happen behind the scenes, as they should. I don't know... you're arguing for the basic building blocks. That is actually what the current API aims to provide... by explicitly modeling the back-and-forth handoff of context/continuation state and an active, write-able Span, higher-level and cognitively simpler abstractions can be built. If I had to propose a different design that's probably more along the lines of what you're arguing for, we would:
Is ^^^ more what you're hoping for? If not, please clarify either here or with some other design sketch RFC PR and link to same. To reiterate, I definitely want to get this right and am happy to put in the time to have 0.30 code migrated accordingly. |
@bhs I think my main concern is the fact that the The So minimal change to the current APIs, except the app now explicitly determines how active spans should be finished. |
I tested it with RxJava instrumentation. No issues :) |
Thinking about how important simplicity is in an API like this, I am inclined to agree with @objectiser that no Observer should be present by default. This allows us to clearly document the separate concerns, and to have the documented behavior of abstractions like spans also be their default behavior. I'm not saying that autofinishing couldn't be a better practical default (we have made use of a lot of it), but the bar should be high given how important clarity is, and I wonder if it actually makes the instrumentation more understandable by seeing the explicit addition of an AutoFinisher in the places where it is needed. But, before we split too many hairs on that I'd love it if we could get some feedback from tracer implementors who have expressed some difficulty with reference counting and context management occurring outside their tracer, I notice they haven't weighed in yet. |
@tedsuo I have only one reason to use |
Regular spans are perfectly fine to be closed explicitly , e.g. with try-catch block - when the start and finish happen in the same method. So auto-finishing is useful mostly when jumping between execution contexts (e.g. threads) where it's hard to detect when all If my understanding is correct then the hassle of adding |
@bhs saying that no current solution "works" is quite a broad and misleading statement. There are a number of projects succesfully tracing all sort of JVM applications since years ago, before OT and any of the proposed abstractions existed. The reason why there is no reliable way to access the current/active/whatever Span on the JVM doesn't have anything to do with the APIs that surfaced over the years, it has to do with the variety of execution models that are appearing lately with all the reactive, async and non-blocking wave that is covering the JVM. Those challenges are not going to get solved by OT in any way, nor by jaeger-context, tracing plane, gRPC context, my proposal or any other, there is no generic way to solve this problem for everything that runs on the JVM, those challenges can only be solved by developers writing instrumentation based on a deep understanding of how the toolkit they are instrumenting works under the hood so that the decision of when to propagate/activate/deactive can be taken appropriately, the APIs are just a means to that end. In-process context propagation only requires two things:
It doesn't matter if we are talking about instrumenting old boring servlets or reactive applications, these 2 operations are all we need to instrument. The only difference between these two types of applications is when should something set as active and for how long it should remain like that but all in all, the operations are the same: get and temporarily set. That is the very minimum API required, that is what many solutions out there boil down to and, IMHO, that's the bare minimum OT should expose. Moreover, that is the common denominator across many solutions that are working in production right now, including Jaeger, Brave, Census/gRPC and probably many more beyond my knowledge. This is our current API:
Naming aside, this is all we need. With this we instrument everything: Scala, Twitter and Scalaz Futures, Akka, Play Framework and more to come, and this just works (a simpler version of this has been working for 4+ years for thousands of our users). All sorts of challenges arise when working with these tools, the way of thinking is quite different, deferred work is common, having different threads pools and future callbacks hoping across them is the rule, every single interaction between actors is asynchronous and still, those challenges are being solved with this super simple API. The current approach with reference counting by default works nice in controlled environments, but it is a otherwise fragile approach for numerous reasons already exposed. The new approach in this PR is adding more API surface to opt-out of a feature that probably shouldn't have been in the scope of OT-api in the first place. With regards to the boilerplate related to having separate APIs, that's debatable. I see how having clean cut responsibilities between abstractions has to be paid with extra typing, but whether one line more here and there is "a lot" of boilerplate or not seems a very subjective, endless discussion and this is probably not the place to have it. I would like to hear opinions from the rest of the community and if there is enough interest I would be glad to propose an alternative PR exploring the constructs proposed above. Looking forward to comments. |
Exactly exactly, that is basically what I'm trying to say with my "no current solution works" comment. Of course people have been trying to trace JVM (and every other environment) since the dawn of time. My point is merely that no API for "cross-package propagation" is sufficient in practice for any given "Application Process X" until that API has instrumentation in place for X's dependencies, esp the ones pertaining to control flow. I really like most of what you said ^^^ and would absolutely welcome an alternative PR, FWIW. PS: I do want to reemphasize one more critique of the current OT API at |
@@ -33,6 +33,42 @@ | |||
* {@link ActiveSpan.Continuation}s, then re-{@link Continuation#activate()}d later. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this PR tries to distance from refcounting. There's a line that still talks about that which might be something to remove in this PR https://github.com/bhs/opentracing-java/blob/91cf7e5e500f635b9b765df3a107e0de32ad5f0e/opentracing-api/src/main/java/io/opentracing/ActiveSpan.java#L108
@@ -68,7 +68,7 @@ The above is semantically equivalent to the more explicit try-finally version: | |||
```java | |||
io.opentracing.Tracer tracer = ...; | |||
... | |||
ActiveSpan activeSpan = tracer.buildSpan("someWork").startActive(); | |||
ActiveSpan activeSpan = tracer.buildSpan("someWork").startActive(new AutoFinisher()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opinion: I prefer no ActiveSpan type vs adding more options to it. This is generally discussed in #80 so not repeating here
@@ -36,7 +34,12 @@ public ThreadLocalActiveSpan activeSpan() { | |||
|
|||
@Override | |||
public ActiveSpan makeActive(Span span) { | |||
return new ThreadLocalActiveSpan(this, span, new AtomicInteger(1)); | |||
return new ThreadLocalActiveSpan(this, span, new AutoFinisher()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's a difference between updating a version and using the active span api. There's also a difference between writing code based on discussions like these and production experience with them. I wouldn't jump to conclusions that everything is fine.
What I don't like is that the liability of auto-finishing and its implementation are not only still here, but also still the default. Any problems with this implementation (as noted laboriously in the prior issue and I won't repeat here) are not less likely now!
The ActiveSpan api has a lot of weight, and even more now. Sometimes the cure is worse than the disease. The rest of this comment is on what @ivantopo proposed: I completely agree with @ivantopo that abstraction is important. I've done a lot of different instrumentation, and supported many different types of people doing instrumentation. We have all been bitten by prior apis that punt calls to close a scope by design (ex the old thread binder in brave leaked spans and was really hard to debug problems like this) I've gotten a lot of very good feedback about the new scoping apis which are essentially the same sketch as Ivan, and also the same as how census works. @jakerobb on openzipkin-contrib/brave-opentracing#43 listed some concrete place where he might not know when something finishes before another thing, and that due to it being a dead library having no way to "fix it". There are some concerns about having a reference to something, and what you prefer to have a reference to. I would prefer we use real scenarios like these to evaluate not just the chosen design, which ActiveSpan clearly is from insiders, but also evaluate alternative designs or the design of doing nothing. IOTW, we talk a lot and it is often really frustrating because there's nothing behind it. We talk in terms of what body of customers etc does X abstractly and rarely the style of "I am doing X" like jake says for example. The latter is far easier to pin down. TL;DR; If something is important enough to be a api this wide, it should be important enough to catalog actual things and alternatives tried but not taken, and how end users feel about them (from their own mouths!) |
@ivantopo sorry both for the slight delay and the brevity here – juggling a lot of things right now! You wrote:
Very important to note that the PR under review moves the refcounting out of the vendor impls... so that concern would go away, I think? My hope is that the refcounting is possible but can be implemented 100% outside of the OT core API artifact (and also 100% outside of the Tracer impl). I.e., I'd like it to be possible but never required, esp for instrumentation code. (At You also wrote:
To do async work, the following things happen, and in this order:
I would sleep better if the tracing API had a special affordance for step 1 above ( |
This conversation has gotten a bit over my head, but @adriancole asked me to chime in. I did the work in openzipkin-contrib/brave-opentracing#43 to update the brave-opentracing bridge to work with OT-API 0.30. It was fairly straightforward at first; I introduced a dependency on opentracing-util and wired in ThreadLocalActiveSpanSource to handle my reference counting. This worked fine for my use (my company has a suite of ~40 Java services, all instrumented with pure OT-API, backed by Brave+Zipkin). The initial feedback I got from that was that Brave already has its own context tracking mechanism, and that I should use that rather than run my own in parallel. There was also resistance to introducing a dependency on opentracing-util. So, I copied TLASS and adapted for my needs. I came up with this: https://github.com/jakerobb/brave-opentracing/blob/a7854242af256035800a31fc692571be7e5df9f2/src/main/java/brave/opentracing/BraveActiveSpanSource.java Note that BraveActiveSpanSource keeps a Map from SpanId to BraveActiveSpan, which among other things keeps a reference to Brave's SpanInScope, which Brave returns when you start a span, but doesn't expose at any other time. The SpanInScope is necessary in order to remove a span from current context, so I hold on to it in the map, and when I'm later told to close a span (and the refcount reaches zero), I pull it out of the map, close it, and purge from the map. What I have works, and we have it in production using an internal build of brave-ot. That said, keeping the map around exposes me to race conditions and memory leaks in a scenario where instrumentation is done improperly. I'd really like it if whatever mechanism OT-API eventually provides for context tracking (assuming that feature is retained) had some acknowledgement that there's a backing engine that needs to be kept in the loop, and that the backing engine probably has its own API which some instrumentations might use for which OT needs to be kept in the loop, and provided explicit hooks to enable that coordination. |
I feel it's time to give my opinion as the author of the java-spanmanager contrib library. I'm sorry if my comment has too much focus on java, probably due to my limited exposure to other OT variants. The discussion in this PR seems about making functionality optional that isn't wanted by all. It does so by adding on an Maybe we should revisit the core concepts and create separate API's for them that can be combined but each addressing their own concern? After all, that is all we need to be able to provide for instance thread pool implementations to automatically activate and deactivate spans in started threads. For those that wish auto-finishing or have new spans automatically become About the Feel free to disagree or try to convince me otherwise but this is my current opinion 😉 .. |
I actually like the API with interface ActiveSpanSource {
// no changes
ActiveSpan makeActive(Span span);
}
interface Tracer extends ActiveSpanSource{
// no changes
SpanBuilder buildSpan(String operationName);
}
interface Span extends AutoCloseable {
Span setTag(String key, String value);
// Creates "an inactive ActiveSpan" which can be "activated" and turned into regular ActiveSpan. It
// has only a single purpose - to carry some contextual information to another thread. Using
// "capture" is not mandatory
Continuation capture();
/**
* By default deactivates the span (if it was active) and finishes the span. The behaviour
* is defined by the lifecycle used to create the span and another lifecycle can be provided
* at span creation.
*/
@Override
void close();
// yes! it's back!
void finish();
}
interface SpanBuilder {
// starts span and makes it active in the current thread
Span startActive();
// allows to alter span's lifecycle management - e.g. how Span.close() behaves
Span startActive(SpanLifecycle lifecycle);
// no changes, starts inactive span
Span startManual();
SpanBuilder childOf(Span span);
}
// represents the scope of span activity. provides access to the span that is active in this scope
// via span(). closed ActiveSpan is no longer valid - it's an error to call span() on it
interface ActiveSpan extends Closeable {
// It no longer has Span methods! Instead - span get be obtained with span()
Span span();
/**
* By default deactivates current ActiveSpan (but does not finish the span!). The behaviour is
* defined by the lifecycle used to create the span and another lifecycle can be provided at
* span creation.
*/
@Override
void close();
}
interface Continuation {
ActiveSpan activate();
}
// interface allowing to slightly alter how span lifecycle behaves
interface SpanLifecycle {
void onClose(Span span);
} Some more details: https://gist.github.com/mabn/9246ebb03aa1534b33b98bd728158b11 The main changes:
btw - i'm referring to threads, but it should be valid for any kind of execution context supported by activeSpanSource |
@mabn I have been playing around with API variations, too. I like a lot of things about your proposal. A few thoughts:
Thoughts? |
I have a sketch of a different approach here: https://github.com/bhs/opentracing-java/pull/5/files Plot summary sans comments for brevity:
More complete thoughts can be found at https://github.com/bhs/opentracing-java/pull/5/files. Note that this PR ^^^ includes code that makes refcounting opt-in and decoupled from OT proper. |
@bhs Am I right that this latest sketch boils down to basically identical as a renamed SpanManager minus the deprecated method? |
Very similar, yes, I should have mentioned that. Only real divergences are the lack of a clear() method, and then of course the fact that Tracer directly integrates with the new API... On that note, I'm not satisfied myself with the fluency (or lack thereof) of startActive with a finish()-on-close-scope helper. |
@bhs I agree with your comments. When it comes to your PR return type of try (Scope scope = tracer.buildSpan("...").startActive()) {
scope.span().setTag(...);
scope.span().log(...);
} It would be great if try (Span span = tracer.buildSpan("...").startActive()) {
span.setTag(...);
span.log(...);
} But - it forces adding Basically I'd want to have the cake and eat it but I don't really see how. |
Yeah, this is basically how |
Hi all. @malafeev and I have started a repo for Java tracing examples: https://github.com/opentracing-contrib/java-examples We've just started, but the hope is that we can use it to illuminate the many edge cases we are concerned about, and fork it to present API choices we are seriously considering. Personally, I'm having a little trouble making such a weighty decision just on the basis of code snippets. Please contribute examples, and let me know if you think this approach has value. |
I decided to take advantage of the example-repo work to demonstrate the instrumentation differences between my proposal here and what's at You can see this new By the way, @objectiser pointed out something important in the other demonstration PR... IMO it's very difficult to cleanly handle his concern without introducing most of the complexity of what's in Curious to hear what others think. |
Hey everybody. Summarized the changes and testing results of Ben's latest approach (https://github.com/bhs/opentracing-java/pull/5/files) here https://gist.github.com/carlosalberto/28ba04ae3201d87f91eff924e7d427da That will get an overall idea for those who haven't been lately following the discussion, as well as for those interested in any further changes to this design - specially the And hopefully this will get things rolling again (feedback, for a start) :) |
@carlosalberto thank you for the above, a helpful and accurate summary of things! @objectiser @pavolloffay @mabn @sjoerdtalsma @ivantopo @jakerobb @yurishkuro @malafeev @adriancole: Current status given the research done so farHere's how I see things...
Note that there is not a debate about whether Tracer impls need to implement refcounting... that can def be factored out into a If there are no major revelations...My default at this point is to chart a path towards removing Moving towards a mergeable PRI would like to try to follow the spirit of @tedsou's process doc... most importantly, before merging we should verify sanity in plentiful OT-contrib repos as well as the most visible Tracer impls (e.g., Jaeger and Brave). RFCPlease chime in with dissenting opinions about ^^^. |
Thanks @carlosalberto for your additional PR, but I think @bhs statement about removing defer/Continuation for now would be best. Unless there is a compelling usecase for it at this stage, it may be better to move to a 1.0 of the API without it, and then possibly revisit for 2.0? |
I'm all cool with leaving |
* @return an {@link ActiveSpan} that encapsulates the given {@link Span} and any other | ||
* {@link ActiveSpanSource}-specific context (e.g., the MDC context map) | ||
*/ | ||
ActiveSpan makeActive(Span span); | ||
ActiveSpan makeActive(Span span, ActiveSpan.Observer observer); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One idea (not saying it's a good one... or that I understand all the implications) for making this be more backwards compatible is to make the Observer
argument be a var arg. This would have the additional side effect of potentially allowing multiple Observer
s.
* | ||
* @see BaseSpan#context() | ||
*/ | ||
void finish(long finishMicros); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this be a good time to consider separating the units from the value? Perhaps a signature like this:
void finish(long finishTime, TimeUnit units)
I feel having implied unit magnitude often results in confusion. In the past, I've had to write code that dynamically figures out the proper magnitude assuming an incorrectly reported scale (seconds vs milliseconds) all because the api was used incorrectly.
@carlosalberto which other tracers would you like feedback from? (ex obv mine is to please remove the api) If folks aren't speaking up either they aren't watching or ignoring. You might need to contact someone directly. A lot of open source work ends up meaning actively getting an answer from folks as day jobs are distracting and threads like this are time consuming. Ex. simplify and ask "How would you feel if we removed X for possible revisit 2.0" to any advertised OpenTracing tracer and/or as an issue on any Tracer you can find implemented on github (latter is not that many) |
To summarize a lot of discussion on this PR, related PRs, Gitter, OTSC calls, etc: the community seems comfortable removing the In terms of actual concrete next-steps, I think the basic shape of the ScopeManager POC is a good compromise. There are a few things I would like to address before sending out a real / non-WIP PR against OT-Java:
I would like to volunteer to write a PR that incorporates the above (along with anything else that people want to add to the mix). That said, my calendar and email collude to prevent me from writing any actual code during the weeks these days (maybe this will change someday; I'd like to say so, but that may be naive!). But I will definitely participate in any reviewing / shepherding if others have cycles to make progress on the above in the meantime. (cc @opentracing/otsc, @tylerbenson, @tedsuo) |
Closing this PR now that 0.31 is underway: #189 |
Per opentracing/specification#80, there are concerns about the existing
ActiveSpan
/ActiveSpanSource
concepts, what they imply forTracer
owners/maintainers, and what they imply about acceptable idioms forActiveSpan
instrumentation.This PR is not intended for merge as-is (at the very least it needs more testing and documentation), but is intended to illustrate some relatively small API changes that decouple
Tracer
s from the refcounting business.There are two problems I'm hoping to address here.
makeActive
aSpan
with theActiveSpanSource
without getting into the refcounting stuff ^^^.Again, this PR is best understood in the context of opentracing/specification#80. Comments welcome, but don't worry about nitpicks/etc as this is not meant to be merged in its current state.