Skip to content
This repository has been archived by the owner on May 23, 2023. It is now read-only.

WIP: Add the notion of an "active" Span and io.opentracing.Scheduler #111

Closed
wants to merge 12 commits into from

Conversation

bhs
Copy link
Contributor

@bhs bhs commented Mar 19, 2017

This PR is a proposal to address opentracing/specification#23 for opentracing-java.

It's definitely not done (still needs better tests, better documentation, etc), but it is a sincere/credible effort to actually solve this problem at the core API level.

A few high-level notes:

  • This PR induces a major-version bump. That said, existing OpenTracing instrumentation will continue to work (and probably work a bit better, actually, since parents can now be inferred when they weren't previously known)
  • There's a strong effort to keep implementation details like MDC out of OpenTracing-Java's core API. I will send out a separate example PR showing how to build bindings between OT and MDC (for example) using this API
  • Please take a look at https://github.com/opentracing-contrib/java-spanmanager and https://github.com/uber/jaeger-client-java/tree/master/jaeger-context/src/main/java/com/uber/jaeger/context ... they both maintain explicit stacks, whereas the approach here is to make the stack implicit via linked Scheduler.Continuation instances (if there's a stack at all) (ed: @sjoerdtalsma set me straight... SpanManager doesn't require a stack, it just provides an impl that happens to have one)

I'll add some comments inline.

I'm definitely prepared for a full spectrum of responses to this... all opinions welcome!

PS: I am not really a Java programmer, so sorry about any transgressions here.

@@ -18,6 +18,82 @@ to facilitate unit-testing of OpenTracing Java instrumentation.

Packages are deployed to Maven Central under the `io.opentracing` group.

## Usage
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was basically nothing in this README for users of OT-Java. Now there's something, though it's too heavily weighted towards span scheduling :-/

[GlobalTracer](https://github.com/opentracing-contrib/java-globaltracer) repository provides a helper for singleton
access to the `Tracer` as well as `ServiceLoader` support for OpenTracing Java implementations.

### Span Scheduling and the "active" Span
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

REVIEWERS: the text from here to the end of the README summarizes the high order user-facing bits

@@ -29,8 +29,8 @@

<properties>
<main.basedir>${project.basedir}/..</main.basedir>
<main.java.version>1.6</main.java.version>
<main.signature.artifact>java16</main.signature.artifact>
<main.java.version>1.7</main.java.version>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really wanted to use try-with-resources for the Scheduler.Continuation. Can someone with more JVM ecosystem experience chime in about how realistic this 1.7 requirement is? cc @pavolloffay @yurishkuro

Copy link
Member

@yurishkuro yurishkuro Mar 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there have been explicit requests for 1.6 ¯\_(ツ)_/¯

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree @yurishkuro , 1.6 is very important for many on-line system.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though, it is "strange" (http://www.oracle.com/technetwork/java/eol-135779.html). Even JDK 7 is "end of public updates".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Due to plumbr, almost 10% of their customers have used Java 1.6 in 2016.

I think a big milestone for us framework developers will be the release of Spring 5 this year which only supports Java 8+. So my hope is that within a year or two, it will be "mainstream" to drop support for Java 6 and maybe even 7 in libraries like OpenTracing. Especially since this is a API and not an implementation it is imho important to support as many Java versions as possible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

People come to zipkins gitter all the time asking to make sure brave is 1.6 compatible. I'd say its still unfortunately a hard requirement for promoting adoption

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, makes sense. The 1.7 feature doesn't really change the fundamentals, it just allows for a friendlier syntax... so we could try to create both opentracing-api and opentracing-api-1.6 artifacts, with the former built on top of the latter "in some way." I'm waving my hands about the details, though... I can imagine it creating lots of compatibility/versioning headaches downstream.

* Like {@link References#CHILD_OF}, but inferred via {@link Scheduler#activeContext}, not explicitly
* specified by the programmer (and thus less certain).
*/
public static final String INFERRED_CHILD_OF = "inferred_child_of";
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it would be nice for Tracer impls to know that the CHILD_OF is implicit

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering that this child reference is actually established internally by the Tracer itself, do we need a public constant? It's not like the API provides a way to query for it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yurishkuro that's a great point – I created this constant early on but see no use case for it with the way this version comes together. (not to say that implicit/inferred children shouldn't be labelled as such somehow... but agree that it doesn't need to be in the public API)

* {@link Continuation}; this activation context may be used to re-activate and deactivate the
* captured Span in that other asynchronous executor and/or thread.
*/
public interface Scheduler {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NB: this should be thought of as a different way to implement SpanManager from https://github.com/opentracing-contrib/java-spanmanager

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you considered calling it "Propagator" or "Propagation" instead? Everyone calls it in-process propagation, yet the tile of this section is "Span Scheduling".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mabn I did actually think about that. There are two related problems...

  1. The name is already spoken for, albeit only for the MockTracer: https://github.com/opentracing/opentracing-java/blob/master/opentracing-mock/src/main/java/io/opentracing/mock/MockTracer.java#L86
  2. IMO "Propagation" could be easily misunderstood to refer to cross-process propagation (a la inject/extract). I like "Scheduler" because it's unambiguous in that regard, though it's perhaps misleading... after all, it doesn't actually schedule anything. /shrug

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(To be clear, I'm certainly open to naming suggestions here)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about SpanManager to follow on from java-spanmanager?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coordinator ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think Scheduler is quite confusing. It reminds me of java.util.concurrent.ScheduledExecutorService and org.springframework.scheduling.annotation.Scheduled which is quite misleading. I'd be okay with SpanManager or TraceContext (used in jaeger-context). To throw some more suggestions in the room: (Active)SpanProvider, (Active)SpanRegistry, SpanCaptor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, Scheduler will die :)

I'll think about these proposals... all have their strengths and weaknesses.

Copy link
Contributor

@felixbarny felixbarny Mar 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have one more: InProcessSpanContextPropagator or some shorter variant of that. Although I wouldn't mind a long but descriptive name as long as it does not contain too many "enterprise buzzwords". It clearly describes what it does and uses OT's "ubiquitous language".

Copy link
Contributor

@felixbarny felixbarny Mar 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although this class does not really propagate the context but rather holds it. SpanContextHolder? Maybe there is a better verb than "to hold" for that. </spam>

* @see Scheduler
* @see Tracer#scheduler()
*/
public class ThreadLocalScheduler implements Scheduler {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to force all Tracers to always have some sort of Scheduler available, so it made sense (to me) to put this basic TLS one into opentracing-api. That said, it doesn't literally need to be.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only issue with it being here would be that there aren't currently any implementations in this module other than for tags. Since we are making a major version change on the library I don't feel that it would be unexpected as a dev to have to create/find a new component

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@devinsba I see that point and don't disagree with it. That said, I'm not really concerned with the migration for existing users as much as ease-of-use for future users. I am still in favor of putting ThreadLocalScheduler in the core opentracing-api artifact, though perhaps (?) it should be in its own package. Thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could go to io.opentracing.util package with GlobalTracer, however as a separate artifact it seems too small. On the other hand API should be API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pavolloffay placing it alongside GlobalTracer (wherever it eventually lands) seems reasonable, good suggestion

}

if (threadLocalActive.get() != this) {
// This should not happen; bail out.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is as good a place as any to say that multiple calls to finish are going to be more likely if people start doing complex async work using Scheduler. I am fine with those dropping data, but it'll be increasingly important not to corrupt things, leak memory, etc in those circumstances.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add language to the spec around this? Something that has come up several times is the question of weather to guard against multiple calls to finish (or log/tag/etc occurring after finish) vs incur the overhead of introducing synchronizing points to ensure it never occurs. The need to add synchronization points, to me, is kind of like OT demanding design changes to the user's app in order to function, which feels like a violation of the "minimal impact" philosophy. It would be helpful to have these calls be allowed, rather than explicitly banned, and have what actually occurs in those states be enumerated.

Copy link
Contributor Author

@bhs bhs Mar 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could get behind an edict / spec section that forbids crashes due to multiple finish() invocations on the same Span instance.

Specifying precisely what happens seems risky to me – OT is about describing the semantics of an application, and not so much about how those semantics are represented and/or "downcasted" into other monitoring systems. (E.g., an OT-to-metrics exporter shouldn't be considered defective for ignoring logging data)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I agree we shouldn't dictate what trace consumers do with the data! What's more important to me is clarifying what harm the tracer may do to you process if you make these calls. Something like "It is safe to call any method after finish - including further calls to finish - but how tracers choose to interpret this data is not defined."

/**
* Return a new SpanBuilder for a Span with the given `operationName`.
*
* <p>If there is an active Span according to the {@link Tracer#scheduler()}'s
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one comment is the point of this entire PR. It would make the life of application (and library) instrumentors so much better+easier+etc. IMO a very exciting improvement for OT-Java and OT in general.

@@ -20,113 +20,154 @@
*/
public interface Tracer {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The indenting in this file was off, so I fixed it. Maybe I'll submit and merge that as a separate PR, as the diff here isn't nearly as large as it appears.

*
* @see Scheduler.Continuation#activate(boolean)
*/
Scheduler.Continuation startAndActivate(boolean finishOnDeactivate);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is new, and maybe should take precedence over plain-old start() for most use cases.

@bhs
Copy link
Contributor Author

bhs commented Mar 19, 2017

You can see a spike of how this might be adapted to couple with MDC here: https://github.com/bhs/opentracing-java/pull/3/files

bhs added a commit that referenced this pull request Mar 19, 2017
Parts of it were 2-space rather than 4-space.

This led to noisy diffs in
#111
@bhs bhs mentioned this pull request Mar 19, 2017
bhs added a commit that referenced this pull request Mar 19, 2017
Parts of it were 2-space rather than 4-space.

This led to noisy diffs in
#111
@bhs bhs force-pushed the bhs/issue_23_repo branch from 4f948df to 4fbf4b6 Compare March 19, 2017 05:37
@@ -23,25 +23,37 @@
/**
* Return a new SpanBuilder for a Span with the given `operationName`.
*
* <p>If there is an active Span according to the {@link Tracer#scheduler()}'s
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This semantic change is the one (huge) benefit afforded by all of the additional complexity in this PR. (I added this comment before, but reiterating since it got lost in the #112 rebase)

this.finishOnDeactivate = finishOnDeactivate;
toRestore = threadLocalActive.get();
threadLocalActive.set(this);
return null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you probably meant return span;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good 👀

btw, once we come to consensus about the API, I will push for good test coverage same.

threadLocalActive.set(toRestore);
}

private Continuation(Span span) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put constructor on top of all instance methods

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

even private constructors? I will never understand Java conventions... :) but moving nevertheless.

@@ -125,7 +141,33 @@
/** Specify a timestamp of when the Span was started, represented in microseconds since epoch. */
SpanBuilder withStartTimestamp(long microseconds);

/** Returns the started Span. */
/**
* Returns a newly started and {@linkshort Scheduler.Continuation#activate(boolean) activated}
Copy link
Contributor

@felixbarny felixbarny Mar 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no linkshort Javadoc tag. Either use link or linkplain, if 'activated' should be displayed in plain text rather than in code font.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw this in some other PR. Will change.

public void deactivate() {
if (span != null && this.finishOnDeactivate) {
span.finish();
}
Copy link
Member

@yurishkuro yurishkuro Mar 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this is meant to be used with auto-close via try(continuation). Then the following is also possible

try (continuation) {
    try (Span span = continuation.activate()) {
    }
}

which technically removes the need for auto-finish flag. But it doesn't look like we can use that due to java-6 req.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Java 6 is going to ruin the party for everyone else.

Agree that your proposed syntax works, I just think it's wordy.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get why users should be affected by this. Even if the byte code is targeted at 1.6, when used within a Java 7 code base, try-with-resources is still possible when Closeable is implemented, as Closeable extends AutoCloseable in jdk 7. So using Java 7 in OT-API would just be syntactic sugar for us but would not make a difference for users.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@felixbarny I am far from an expert in Java/JDK nuances (I wrote C++ for too many years, and now spend my time primarily writing a combination of Go and work-related email!).

That's great to hear about Closeable and AutoCloseable... I will see what I can do now that you've pointed that out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed... back to 1.6, yet the try-with-resources works as intended for 1.7+. Thanks @felixbarny. 💯

* Tracer tracer = ...
*
* // Note: if there is an {@link Scheduler#active()} Span, it will be treated as the parent of workSpan.
* Span workSpan = tracer.buildSpan("DoWork")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: merge examples into one

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's good reason to keep them separate to show how asChildOf works, but I did add a comment.

Span span = tracer.scheduler().active();
span.setTag( ... ); // etc, etc
}
}</pre>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This code block is formatted differently from the one on buildSpan

@@ -29,8 +29,8 @@

<properties>
<main.basedir>${project.basedir}/..</main.basedir>
<main.java.version>1.6</main.java.version>
<main.signature.artifact>java16</main.signature.artifact>
<main.java.version>1.7</main.java.version>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

People come to zipkins gitter all the time asking to make sure brave is 1.6 compatible. I'd say its still unfortunately a hard requirement for promoting adoption

* @see Scheduler
* @see Tracer#scheduler()
*/
public class ThreadLocalScheduler implements Scheduler {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only issue with it being here would be that there aren't currently any implementations in this module other than for tags. Since we are making a major version change on the library I don't feel that it would be unexpected as a dev to have to create/find a new component


public static class Continuation implements Scheduler.Continuation {
@Override
public Span activate(boolean finishOnDeactivate) { return null; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems it should be illegal to return null from Continuation.activate()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yurishkuro this is an interesting point... the null case comes up when the captureActive happens in an execution context with no active Span. I don't want to force the Continuation to be null in that case since it may also involve state from other subsystems like MDC.

There are alternatives, though they would involve changes to the contract for captureActive – namely that a new Span must be created if none was active, though then we need an operation name and so forth.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are alternatives, though they would involve changes to the contract for captureActive – namely that a new Span must be created if none was active, though then we need an operation name and so forth.

Why not replace null with NoSpan (not the same as noop)? In most cases it will simplify the usage there is no need to plague the app/library code with null-checks.

It has the downside that you never know whether you actually have a real span or not, and that might require adding one extra function to the Span interface to know for sure, but imho that's a fair price to pay for a simpler and safer usage experience.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(let's move this discussion to #115... I cc'd you on a comment there)

`FunctionB`.** These are the steps:

1. In the method/function that *allocates* the closure/`Runnable`/`Future`/etc, call `Scheduler#captureActive()`
to obtain a `Scheduler.Continuation`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So just to be clear, capturing the active span in this continuation detaches the span from the current execution context? Otherwise not clear how the span is detached/deactivated from the initial execution context, if it going to be finished in the subsequent execution context?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth having two separate examples, one where span is being finished in different thread, but one for a fork/join situation where span is only being referenced by one or more child threads, but then still finished in the parent execution context/thread.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@objectiser

to be clear, capturing the active span in this continuation detaches the span from the current execution context? Otherwise not clear how the span is detached/deactivated from the initial execution context, if it going to be finished in the subsequent execution context?

Actually, no, that's not what I'd intended. The Span (or, really, Continuation) is detached when it's detached... not before and not after.

It's fine/acceptable/uncomplicated for a Span to be concurrently attached to multiple execution contexts (who knows, maybe a piece of its work is parallelized).

I think you're right about the examples for different control flow scenarios... I'll try to do that in my next update to the PR.

* {@link Continuation}; this activation context may be used to re-activate and deactivate the
* captured Span in that other asynchronous executor and/or thread.
*/
public interface Scheduler {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filename does not match class/interface name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How weird -- not sure why the compiler isn't complaining. (I did some IntelliJ-mediated refactors)

Will fix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(FYI, the issue here was a combination of IntelliJ opacity around when it saves files and the mild git sorcery I'm doing to maintain this branch in relation to the demo-caller branch in a separate PR)

* Make the Span (and other execution context) encapsulated by this Continuation active and return it.
*
* <p>
* NOTE: It is an error to call activate() more than once on a single Continuation instance.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is the error defined? Should be some exception thrown?

/**
* A shorthand for <pre>{@code
* this.capture(this.active())
* }</pre>@code{}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@code{} leftover?

* Span active = this.active();
* if (active == null) return null;
* return active.context();
* }</pre>@code{}
Copy link
Member

@pavolloffay pavolloffay Mar 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@code{} leftover?

* @see Scheduler#capture(Span)
* @return the newly-activated Span
*/
Span activate(boolean finishOnDeactivate);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Span.finish() is not idempotent. This probably can lead to some errors.

* @see Tracer#scheduler()
*/
public class ThreadLocalScheduler implements Scheduler {
ThreadLocal<Continuation> threadLocalActive = new ThreadLocal<Continuation>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be made private

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(and final!)

* @see Scheduler
* @see Tracer#scheduler()
*/
public class ThreadLocalScheduler implements Scheduler {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could go to io.opentracing.util package with GlobalTracer, however as a separate artifact it seems too small. On the other hand API should be API.


public MockTracer() {
this(Propagator.PRINTER);
}

public MockTracer(Scheduler scheduler) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be also MockTracer(Scheduler scheduler, Propagator propagator)

@bensigelman
Copy link

Meta request (probably a bit late): please only review this PR in terms of interfaces / signatures / etc: it's definitely not ready for merge (bugs, no tests, lint issues).

Once we are all ok with the API I will do a deep pass on the implementation and code quality.

Thanks!!

@felixbarny
Copy link
Contributor

You should prepend "WIP:" to the title of the pull request then.

@bhs bhs changed the title Add the notion of an "active" Span and io.opentracing.Scheduler WIP: Add the notion of an "active" Span and io.opentracing.Scheduler Mar 23, 2017
@bhs
Copy link
Contributor Author

bhs commented Mar 23, 2017

@felixbarny good idea, done.

(In my defense, I did write

It's definitely not done (still needs better tests, better documentation, etc), but it is a sincere/credible effort to actually solve this problem at the core API level.

in the PR description)

@pavolloffay
Copy link
Member

Meta request (probably a bit late): please only review this PR in terms of interfaces / signatures / etc: it's definitely not ready for merge (bugs, no tests, lint issues).

I actually like that this POC compiles/works at least for MockTracer so I can test in real examples/framework integrations.

@pavolloffay
Copy link
Member

Current change introduce:

  • SpanBuilder sb = tracer.buildSpan(name) - always adds parent if there is any, but does not tell me anything if I was expecting parent span and there is not.
  • SpanBuilder.asRoot() - removes all references (hack to remove parent == force to start a new trace)
  • SpanBuilder.startAndActivate(finishOnDeactivate) - adds started span to the scheduler, need to be called explicitly.

... we are changing the semantics of Tracer.buildSpan but SpanBuilder.start stays as it is. It does not add created span to the scheduler.

Wouldn't it be better to add following methods:

ScheduledSpanBuilder ssb = tracer.buildScheduledSpan(name, finishOnDeactivate);  // could tell me that there is no parent span, but probably I would like to start new trace anyway or throw an exception?
Scheduler.Continuation cont = ssb.start();

@bhs
Copy link
Contributor Author

bhs commented Mar 23, 2017

@pavolloffay

I actually like that this POC compiles/works at least for MockTracer so I can test in real examples/framework integrations.

Yeah, that makes sense... The fact that it compiles is because I also have been testing things in various primitive ways. I just don't want to spend people's valuable time reviewing the implementation details for correctness/lintyness/etc.

@bhs
Copy link
Contributor Author

bhs commented Mar 24, 2017

@pavolloffay

Wouldn't it be better to add following methods:

ScheduledSpanBuilder ssb = tracer.buildScheduledSpan(name, finishOnDeactivate);  // could tell me that there is no parent span, but probably I would like to start new trace anyway or throw an exception?
Scheduler.Continuation cont = ssb.start();

IMO we should assume the programmer wants to inherit from the parent (if there is one) 99% of the time. The API could be even more opinionated and require the use of the Scheduler (or ActiveSpanHolder, whatever), but that would make things like log translators really awkward (since the timestamps would be in the past, etc).

I am prototyping a few other permutations here... I'd like to get rid of finishOnDeactivate without forcing programmers to call finish() before deactivate()ing.

@objectiser
Copy link
Contributor

Apologies for going back to basics, but was considering these changes from the point of view of a simple single threaded application.

In this scenario, the OT instrumentation now needs to obtain Continuations and activate them, just to be able to record the current 'active' span, so that any subsequent instrumentation (e.g. in a framework integration) can obtain a reference to the parent span and establish a childOf reference.

For this simple case, shouldn't just using the existing API be enough - i.e. start()/finish() - internally working with the Scheduler (or whatever it is called) to implicitly manage the active span and establish the childOf relationship?

If that was the case - then the only extension to the current OT API would be to specifically deal with multithreaded scenarios.

bhs added a commit to bhs/opentracing-java that referenced this pull request Mar 28, 2017
This builds on opentracing#111
and makes some improvements and changes.

There are a few important changes from the approach in that previous PR
(which I will leave around for comparison):

1) There is no more activate()-time parameter about finish() behavior.
   Instead, ActiveSpanHolder.Continuation has internal refcounting. This
   leads to more fluent code and fewer bugs: either from forgetting to
   finish(), or from not-knowing what to pass to activate(), especially
   in Runnable/Callable wrappers (which was a real problem in my
   explorations with "real" codebases).
2) Per the above, ActiveSpanHolder and ActiveSpanHolder.Continuation are
   abstract classes rather than pure interfaces, and thus the
   refcounting is actually part of OT-Java rather than OT-Java impls.
3) I also addressed some naming and minor semantic concerns from PR opentracing#111
@bhs
Copy link
Contributor Author

bhs commented Mar 28, 2017

@objectiser

For this simple case, shouldn't just using the existing API be enough - i.e. start()/finish() - internally working with the Scheduler (or whatever it is called) to implicitly manage the active span and establish the childOf relationship?

I do think it's important to provide an API that creates Spans but doesn't meddle with the Scheduler / ActiveSpanHolder / whatever. For backwards-compatibility, I thought perhaps it would be best to make that API just plain-old start+finish.

From experimentation I did in app code, it seemed advantageous to return a Continuation (rather than a Span when the Scheduler / ActiveSpanHolder is involved, btw. Saved some TLS lookups, anyway.

You can also see #115 for another iteration on all of this stuff.

@malafeev
Copy link

Will this PR solve next puzzle?
Thread 1: create client (e.g. Async MongoDB driver) and propagate client to Thread 2
Thread 2: client send request (insert into mongo collection)
Thread 3: Intercept send request. Here I create span, but how I can access parent span from Thread 1 (or Thread 2? or both?) to create chain?
Thread 4: intercept completion request. Finish span from Thread 3.

@bhs
Copy link
Contributor Author

bhs commented Mar 28, 2017

@malafeev that depends on (a) how Mongo passes a context object from Thread 2 to Thread 3, and (b) whether that mechanism supports ActiveSpanHolder (from #115) or Scheduler (from this PR, #111).

If a framework or subsystem like Mongo wants to switch threads in a way that cannot be observed, there's not much we (or anyone) can do to follow along. But if it supports, say, MDC... then we would be able to track via an MDC integration with ActiveSpanHolder/Scheduler.

Make sense?

@malafeev
Copy link

@bhs Thread 3 is a thread for intercepting of all send requests. It lives until application lives.
There is only one way to pass holder to Thread 3 - when in Thread 1 client is created.
So I think Thread 2 will be always missing.

@bhs
Copy link
Contributor Author

bhs commented Mar 28, 2017

@malafeev it seems to me that you're making a lot of assumptions about the way everything fits together... if there is indeed no way to pass state from Thread 2 to Thread 3 (via MDC, via a context object, etc) then indeed there would be no way to instrument what you describe. I don't consider that an indicator about the approach here per se, as in a situation like that it's almost provably impossible to make this work without making lots of untenable assumptions about data flow within the process.

I don't mean to be dismissive, btw, I just think it's important to "know when to fold 'em" (or however that expression goes). Do you have a solution in mind for the scenario you outlined? I would imagine that the best thing would be to change the mongo interceptor to capture a Continuation when the request is submitted (thread 2 in your example), then activate it prior to invoking the inject code in thread 3, but I'm not sure if I'm "allowed" to change the mongo interceptor code.

@malafeev
Copy link

thanks @bhs. I think to wrap Mongo driver and as you described to capture Continuation and activate it before send (in Thread 2) and inject into interceptor.
I just wanted to check if there is a way to avoid wrapping driver by utilising only interceptor.

bhs added a commit to bhs/opentracing-java that referenced this pull request Apr 7, 2017
This builds on opentracing#111
and makes some improvements and changes.

There are a few important changes from the approach in that previous PR
(which I will leave around for comparison):

1) There is no more activate()-time parameter about finish() behavior.
   Instead, ActiveSpanHolder.Continuation has internal refcounting. This
   leads to more fluent code and fewer bugs: either from forgetting to
   finish(), or from not-knowing what to pass to activate(), especially
   in Runnable/Callable wrappers (which was a real problem in my
   explorations with "real" codebases).
2) Per the above, ActiveSpanHolder and ActiveSpanHolder.Continuation are
   abstract classes rather than pure interfaces, and thus the
   refcounting is actually part of OT-Java rather than OT-Java impls.
3) I also addressed some naming and minor semantic concerns from PR opentracing#111
bhs added a commit to bhs/opentracing-java that referenced this pull request Apr 15, 2017
This builds on opentracing#111
and makes some improvements and changes.

There are a few important changes from the approach in that previous PR
(which I will leave around for comparison):

1) There is no more activate()-time parameter about finish() behavior.
   Instead, ActiveSpanHolder.Continuation has internal refcounting. This
   leads to more fluent code and fewer bugs: either from forgetting to
   finish(), or from not-knowing what to pass to activate(), especially
   in Runnable/Callable wrappers (which was a real problem in my
   explorations with "real" codebases).
2) Per the above, ActiveSpanHolder and ActiveSpanHolder.Continuation are
   abstract classes rather than pure interfaces, and thus the
   refcounting is actually part of OT-Java rather than OT-Java impls.
3) I also addressed some naming and minor semantic concerns from PR opentracing#111
bhs added a commit to bhs/opentracing-java that referenced this pull request Apr 15, 2017
This builds on opentracing#111
and makes some improvements and changes.

There are a few important changes from the approach in that previous PR
(which I will leave around for comparison):

1) There is no more activate()-time parameter about finish() behavior.
   Instead, ActiveSpanHolder.Continuation has internal refcounting. This
   leads to more fluent code and fewer bugs: either from forgetting to
   finish(), or from not-knowing what to pass to activate(), especially
   in Runnable/Callable wrappers (which was a real problem in my
   explorations with "real" codebases).
2) Per the above, ActiveSpanHolder and ActiveSpanHolder.Continuation are
   abstract classes rather than pure interfaces, and thus the
   refcounting is actually part of OT-Java rather than OT-Java impls.
3) I also addressed some naming and minor semantic concerns from PR opentracing#111
bhs added a commit to bhs/opentracing-java that referenced this pull request Apr 16, 2017
This builds on opentracing#111
and makes some improvements and changes.

There are a few important changes from the approach in that previous PR
(which I will leave around for comparison):

1) There is no more activate()-time parameter about finish() behavior.
   Instead, ActiveSpanHolder.Continuation has internal refcounting. This
   leads to more fluent code and fewer bugs: either from forgetting to
   finish(), or from not-knowing what to pass to activate(), especially
   in Runnable/Callable wrappers (which was a real problem in my
   explorations with "real" codebases).
2) Per the above, ActiveSpanHolder and ActiveSpanHolder.Continuation are
   abstract classes rather than pure interfaces, and thus the
   refcounting is actually part of OT-Java rather than OT-Java impls.
3) I also addressed some naming and minor semantic concerns from PR opentracing#111
@bhs
Copy link
Contributor Author

bhs commented Apr 26, 2017

(Closing to remove ambiguity about whether this or #115 is more likely to merge... def #115!)

@bhs bhs closed this Apr 26, 2017
bhs added a commit that referenced this pull request May 11, 2017
This builds on #111
and makes some improvements and changes.

There are a few important changes from the approach in that previous PR
(which I will leave around for comparison):

1) There is no more activate()-time parameter about finish() behavior.
   Instead, ActiveSpanHolder.Continuation has internal refcounting. This
   leads to more fluent code and fewer bugs: either from forgetting to
   finish(), or from not-knowing what to pass to activate(), especially
   in Runnable/Callable wrappers (which was a real problem in my
   explorations with "real" codebases).
2) Per the above, ActiveSpanHolder and ActiveSpanHolder.Continuation are
   abstract classes rather than pure interfaces, and thus the
   refcounting is actually part of OT-Java rather than OT-Java impls.
3) I also addressed some naming and minor semantic concerns from PR #111
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.