-
Notifications
You must be signed in to change notification settings - Fork 854
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Makes Export methods async #1428
Conversation
|
...rters/inmemory/src/main/java/io/opentelemetry/exporters/inmemory/InMemoryMetricExporter.java
Outdated
Show resolved
Hide resolved
...s/inmemory/src/test/java/io/opentelemetry/exporters/inmemory/InMemoryMetricExporterTest.java
Outdated
Show resolved
Hide resolved
exporters/logging/src/main/java/io/opentelemetry/exporters/logging/LoggingMetricExporter.java
Outdated
Show resolved
Hide resolved
exporters/logging/src/main/java/io/opentelemetry/exporters/logging/LoggingSpanExporter.java
Outdated
Show resolved
Hide resolved
Codecov Report
@@ Coverage Diff @@
## master #1428 +/- ##
============================================
+ Coverage 86.56% 86.64% +0.07%
- Complexity 1367 1378 +11
============================================
Files 162 161 -1
Lines 5231 5292 +61
Branches 490 503 +13
============================================
+ Hits 4528 4585 +57
Misses 524 524
- Partials 179 183 +4
Continue to review full report at Codecov.
|
I honestly don't see how this PR improves anything for anyone. This makes a breaking change to the API for exporters, but with no actual improvement to any functionality. And, the introduction of a guava dependency on a number of exporters seems like a very large downside. Can you explain what your end-goal is with making this change? Just adding "async"-ness for the sake of adding it doesn't seem like a change that has any value for any actual exporter author or end user. |
@jkwatson We had a detailed discussion for this PR’s motivation in #1422. I asked you if you wanted me to raise a PR. In summary, this PR will assist exporters from blocking a thread in order to return a status. The PR introduces an API change now, while we can still get away with API changes. There is no implementation benefit right now, but the opportunity is there in the future. The Go and .NET APIs also avoid blocking. |
I guess I had expected there to be some consumption of the Future/Callback in the SDK itself, so that there might actually be a benefit to this change. As things are today, nothing stops an exporter from doing all the work async and returning one of the enum values immediately. So, I guess my question still stands: how do you see the SDK actually using this future? The Future API as it is also doesn't stop an exporter from making the whole call block and just returning an Future at the end of the blocking process. In fact, I believe that's what you implemented with this PR. If we're going to make this change, let's make it meaningful, and not just pro-forma. Does that make sense? |
By returning a reply for export, whether async or not, the SDK can control whether to release memory for metric and span data. If the export reply remains synchronous then an async exporter must also decide whether to release any buffered data as a result of network failure. Do we want this memory management logic in both places? |
I get all that. What I don't want, though, is to have this PR land, and then not have the next steps implemented, leaving us with a half-implemented idea. Would you be willing to put together a PR that demonstrates all of this, with changes in the SDK that take advantage of this change? If we like how it all looks, we can circle back to this PR and get it merged as the first step in that larger change. Also, if you're interested in taking on this scope of work, it would be great to have you join our weekly SIG meeting where we can talk through these plans in real time. Sound good? |
Unfortunately, having spent a number of hours already, and due to other work commitments, I’m unable to justify any advancement of this PR. Feel free to close this PR and it’s associated issue. |
Hmm. I'm sorry to hear that. I would really like to get working on modernizing the codebase (as much as we can given the java 7 requirement). I think these ideas will mesh well with other work-in-progress like #1419 and #1418 . We don't have a good story on export error reporting at the moment, leaving it up to the exporter authors to handle that on their own. I think that I'd personally prefer a callback, given the cruftiness of the |
I don’t see callbacks as being contemporary. If you like, I can spend some time over the weekend to look at dropping the Guava dependency and convert the export status enum to represent an async value. This should be straightforward. If there’s no appetite for that, I understand. Sorry that I can’t spend more time than this right now. |
I'm not sure what you mean by "convert the export status enum to represent an async value. " Can you elaborate? I don't think callbacks are up-to-date, or contemporary, but they don't tie the APIs to what is definitely an antiquated API. At least with callbacks, we completely control the API, soup to nuts. @anuraaga What do you think? |
Perhaps you guys can save me some time by sketching out the interfaces you’d like to see. Personally, I still see Future as being correct given that it describes the behaviour we are wanting, it isn’t deprecated and it forms the interface of the Java 8 CompletableFuture. Runtime checks can be made for handling specific types of Future. Btw I would like to learn more about why Java 8 isn’t the baseline. It’s causing a bit of a headache here. Have you got a ref to the decision making around this? |
@jkwatson I'm personally OK with either a callback or a custom implementation of public interface AsyncCallback<T> {
void onComplete(T result);
void onError(Throwable t);
}
public void export(Collection<SpanData> spans) {
export(spans, AsyncCallback.NO_OP);
}
public void export(Collection<SpanData> spans, AsyncCallback<ResultCode> callback); or public io.opentelemetry.api.Future export(Collection<SpanData> spans); @huntc Java 7 is mostly for Android where it's still common to target older OSes for device compatibility. Not sure about this SDK repo, but the instrumentation repo has some discussion here open-telemetry/opentelemetry-java-instrumentation#599 |
Thanks. So, given the decision to drop support for Java 7 let’s sit on this PR until we can use CompletableFuture. Sound good? I’m also assuming that they’ll be no GM release given the decision to migrate from Java 7. |
@huntc The decision is to drop support for Java 8 in most instrumentation, but not ones that are common in Android. But the SDK will have to stick to Java 7 regardless, or that would effectively require all instrumentation to be Java 8. |
Hi @jkwatson and @anuraaga, I've been playing around with an approach that should address all of the concerns to-date:
So far, I've just coded-up a Here's the /**
* The implementation of Export operations are often asynchronous in nature, hence the need to
* convey a result at a later time. CompletableResultCode facilitates this.
*
* <p>This class models JDK 8's CompletableFuture to afford migration should Open Telemetry's SDK
* select JDK 8 or greater as a baseline, and also to offer familiarity to developers.
*/
public class CompletableResultCode {
public CompletableResultCode() {}
private volatile boolean succeeded = false;
private boolean completed = false;
private Runnable action = null;
private Executor actionExecutor = null;
/** The export operation finished successfully. */
public synchronized void success() {
if (!completed) {
if (!succeeded) {
succeeded = true;
}
if (action != null) {
actionExecutor.execute(action);
}
completed = true;
}
}
/** The export operation finished with an error. */
public synchronized void failure() {
if (!completed) {
if (action != null) {
actionExecutor.execute(action);
}
completed = true;
}
}
/**
* Obtain the current state of completion. Generally call once completion is achieved via the
* thenRun method.
*
* @return the current state of completion
*/
public boolean isSuccess() {
return succeeded;
}
/**
* Perform an action on completion. Actions are guaranteed to be called only once.
*
* <p>There should only be one action for this class instance.
*
* @param action the action to perform
* @param executor the executor to run the action on
*/
public synchronized void thenRunAsync(Runnable action, Executor executor) {
if (this.action == null) {
if (completed) {
executor.execute(action);
}
this.action = action;
this.actionExecutor = executor;
}
}
} ... which is then used by /**
* Exports the collection of given {@link MetricData}.
*
* @param metrics the collection of {@link MetricData} to be exported.
* @return the result of the export, which is often an asynchronous operation.
* @since 0.1.0
*/
CompletableResultCode export(Collection<MetricData> metrics);
/**
* Exports the collection of {@link MetricData} that have not yet been exported.
*
* @return the result of the flush, which is often an asynchronous operation.
* @since 0.4.0
*/
CompletableResultCode flush(); The final CompletableResultCode result =
internalState.getMetricExporter().export(Collections.unmodifiableList(metricsList));
result.thenRunAsync(
new Runnable() {
@Override
public void run() {
if (!result.isSuccess()) {
logger.log(Level.WARNING, "Metric Exporter failed");
}
}
},
exportExecutor); My approach bears similarity to Java 8's Thoughts? |
Hey @huntc
Yes (at least we usually try to go that way :) )
I see the Disruptor processor has been deemed as a thing to avoid (for some cases, I'm suspecting), but what about using On the .Net/Go mention, I'd definitely put Go out of the comparison as it's a different animal IMHO. But I'm curious about the .Net take on this and will check it out. Also, I'd love to hear @bogdandrutu's opinion, as he was the one originally designing the processor/exporter layer, and he might have something to say. EDIT: Trying to go over the different comments, it seems mostly cancelation is why this is needed? Would love to see also a code example that makes use of the new proposed API. |
I'm a fan of the simple span processor because there's no thread pool other than any required by the exporter(s). No thread pool == more efficient memory usage. Threads are expensive. The downside of the simple span processor is that if an exporter blocks, so will my program. Exporters should thus never block their calling threads. Again, getting the method signature right for the exporter is my primary concern. The current design fails to convey that exporter operations are long-lived. Types are important. |
Oh, that's very interesting. Sounds reasonable, yes, and worth trying out.
True, but I'd also love to see this change validated with some actual prototype that takes full advantage. (Hope @bogdandrutu can chime in real soon ;) ) |
I'm unsure what you mean by "advantage". It's more a question of correctness. If you're looking for an async exporter prototype though then perhaps the following test will be illustrative: opentelemetry-java/sdk/src/test/java/io/opentelemetry/sdk/trace/export/BatchSpanProcessorTest.java Line 314 in b7cfcab
Note that prior to this PR, canceling exports via interrupting them is unsafe. So, perhaps that's another good reason then. :-) But again, my motivation here is mostly around the subject of correctness. |
exporters/otlp/src/main/java/io/opentelemetry/exporters/otlp/OtlpGrpcMetricExporter.java
Show resolved
Hide resolved
ResultCode currentResultCode = SUCCESS; | ||
public CompletableResultCode flush() { | ||
final CompletableResultCode compositeResultCode = new CompletableResultCode(); | ||
final CountDownLatch completionsToProcess = new CountDownLatch(spanExporters.length); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If not waiting on it, may as well use AtomicInteger
to clarify that it's just for counting.
spanExporter.export(spans); | ||
} catch (Throwable e) { | ||
logger.log(Level.WARNING, "Exception thrown by the export.", e); | ||
if (exportAvailable.compareAndSet(true, false)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For SimpleSpanProcessor
I think I'd expect it to be simple and just call export in all cases (if there is a backup in the exporter, that is a reason to use a more complicated span processor like batch span processor).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would then require exporters to become re-entrant. My assumption at present, as per the existing blocking API, is that only one export is called at a time (which is probably something we should document - it can simplify the authoring of exporters greatly - wdyt?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what you mean by "re-entrant" here. Do you just mean having more than one export running at a time, or are you actuallyreferring to sending the same spans more than once and having the exporter be able to handle de-duping them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Meanwhile, I've added some API docs to clearly state the position on re-entrancy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what you mean by "re-entrant" here. Do you just mean having more than one export running at a time, or are you actuallyreferring to sending the same spans more than once and having the exporter be able to handle de-duping them?
I was referring to the number of export
or flush
calls in flight at one time, which is what this code is associated with. We only want one of each of these calls in flight. In fact, we shouldn't really call export
and flush
at the same time either. Keeps the programming model simpler for exporters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having said that, the IntervalMetricReader
has no such back-pressure mechanism, and never did.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, should I re-instate the ability to only call export
one at a time for the IntervalMetricReader
and BatchSpanProcessor
, and have them drop any new telemetry in the case where an export
is in-flight? We can leave the SimpleSpanProcessor
as-is though i.e. without the "in-flight" check. The SimpleSpanProcessor
will be perfect for those more sophisticated exporters, such as what can be written with Akka-streams.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This SGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've now pushed a commit with updated documentation for the exporters, and the reinstatement of backpressure handling for the IntervalMetricReader
and BatchSpanProcessor
.
BTW Is there an equivalent of SimpleSpanProcessor
for metrics?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Metrics have a somewhat different pipeline structure. The closest equivalent is probably the IntervalMetricReader
class for push-based exporters. Pull-based, of course, manage all of that themselves.
FYI All that remains is for me to write a test for the |
exporters/otlp/src/main/java/io/opentelemetry/exporters/otlp/OtlpGrpcMetricExporter.java
Show resolved
Hide resolved
@carlosalberto Can we please have some resolution on this? I'll happily complete the PR over the weekend, but not if it isn't destined for greatness. :-) |
I'll put this on the agenda for tomorrow's SIG meeting, so it doesn't get lost. |
Thanks. Any resolution? |
The decision (which is not a resolution, unfortunately) is that Carlos is going to find some time for a deeper dive into this PR. Meanwhile, the discussion on the original issue (#1422) has some further thoughts from other folks (@Oberon00 in particular) |
@carlosalberto Is it possible to get a commitment to that deep dive over the next few days? I understand you're busy, but this has had many weeks for a closer look and I don't think it's fair to keep it stuck. |
Let's go with it. 👍 |
@huntc We're a go for this. Can you get it rebased? Thanks so much for your patience. You work is appreciated, and sorry that it took a long time to get this landed. |
This commit recognises that the export and flush methods of span and trace exporters can be, and often are, implemented with long-lived operations over networks. We therefore see that these method return types are represented using Java's Future type to account for this common behaviour.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for the patience! Just nits but LGTM
for (Handler handler : logger.getHandlers()) { | ||
try { | ||
handler.flush(); | ||
} catch (Throwable t) { | ||
resultCode = ResultCode.FAILURE; | ||
resultCode.fail(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we early return here? Similar comment elsewhere
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it doesn't matter in this particular case, since we might want to flush all the handlers, even if one fails. The call to succeed() below is a no-op if it's been previously failed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add a return to make it clear but, as John mentioned, it makes no diff.
import javax.annotation.concurrent.GuardedBy; | ||
|
||
/** | ||
* The implementation of Export operations are often asynchronous in nature, hence the need to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be second sentence, first sentences should describe the class. e.g. A result which will be completed at a later time asynchronously.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sentences swapped, although my personal preference is to lead with the rationale as the type name is good enough.
* select JDK 8 or greater as a baseline, and also to offer familiarity to developers. | ||
*/ | ||
public class CompletableResultCode { | ||
/** A convenience for declaring success. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Returns a {@link CompletableResultCode} that has been completed successfully.
Ditto for below with unsuccessfully
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Realized we can probably just copy paste javadoc from CompletableFuture
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
||
private final Object lock = new Object(); | ||
|
||
/** The export operation finished successfully. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Completed this {@link CompletableResultCode} successfully.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -71,15 +74,25 @@ public boolean isStartRequired() { | |||
} | |||
|
|||
@Override | |||
@SuppressWarnings("BooleanParameter") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is needed on this method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hangover from the past. Removed.
/** The export operation finished successfully. */ | ||
public CompletableResultCode succeed() { | ||
synchronized (lock) { | ||
if (succeeded == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This behavior should be documented, since I think it's non-obvious that it's first-in-wins with succeed/fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, unit tests to verify it would also be very good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. New test added also.
private Boolean succeeded = null; | ||
|
||
@GuardedBy("lock") | ||
private final ArrayList<Runnable> completionActions = new ArrayList<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be declared as a simple List
on the LHS, correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
I have just a couple of minor issues, mostly just that we should document the first-in-wins behavior of the result code, and write unit tests to verify that behavior. |
PR feedback addressed and is in as a separate commit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again! Let's
This PR recognises that the export and flush methods of span and trace exporters can be, and often are, implemented with long-lived operations over networks. We, therefore, see that these method return types are represented using a
CompletableResultCode
type that can be completed asynchronously for either failure or success scenarios. In either instance, the export code can now perform actions on failure and success. In the case of failure, which can also constitute a cancellation by an upstream component (such as a batch span processor), an exporter can now interrupt its threads or perform any other action it needs to. This approach of handling failure is also safer than what now exists, which is for an upstream component to interrupt a thread.The PR approach is focused on changing the signatures within the export methods now, while we can still introduce API-breaking changes. The implementation of the existing exporters mostly assumes a blocking interaction as before. The remaining internal exporter behaviour can be modified in the future if required and without affecting the API. One exception to this is the
OtlpGrpcMetricExporter
where @anuraaga kindly updated it to become async.For further background information on this change, see ##1422, which also refers to spec clarification via open-telemetry/opentelemetry-specification#707.