Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce new TrackWork API #644

Closed
lmolkova opened this issue Nov 1, 2017 · 10 comments
Closed

Introduce new TrackWork API #644

lmolkova opened this issue Nov 1, 2017 · 10 comments
Assignees

Comments

@lmolkova
Copy link
Member

lmolkova commented Nov 1, 2017

We now have StartOperation and StopOperation TelemetryClient extension methods to track custom operations.

While it's pretty useful and flexible, typical scenario requires customer to write something like

    var operation = telemetryClient.StartOperation<RequestTelemetry>("name");

    try
    {
        await DoWorkAsync();
        operation.Telemetry.Success = true;
    }
    catch (Exception e)
    {
        telemetryClient.TrackException(e);
        operation.Telemetry.Success = false;
        throw;
    }
    finally
    {
        telemetryClient.StopOperation(operation);
    }

This approach potentially has an issue with execution context: it may not be preserved between Start/Stop, so the operation may not be tracked at all.
All above code except DoWorkAsync has nothing to do with user code.

Proposal

Introduce new TelemetryClient extensions methods:

  • async Task TrackWorkAsync( Func<Task> work, Activity activity=null)

  • async Task TrackWorkAsync( Func<Task> work, OperationTelemetry telemetry)

  • other necessary overloads

These methods would do the same as above example does.

Real world scenario

I'm working on the queues instrumentation and majority of queue clients (Azure Storage Queue, EventHub, certain ServiceBus usage scenarios) have API to receive a message(s) from the queue but do not include API to process a message.

I. e. if dependency tracing is implemented, AppInsights then would track 'receive' call to the queue, but message processing should be instrumented manually.

Queues will provide method to extract Activity from the messages, so customers are expected to write code

var message = queue.Receive();
var activity = message.ExtractActivity();

telemetryClient.TrackWork( () => DoWorkAsync(), activity );

@SergeyKanzhelev @Dmitry-Matveev @TomMilos thoughts?

@lmolkova lmolkova self-assigned this Nov 3, 2017
@lmolkova
Copy link
Member Author

lmolkova commented Nov 3, 2017

@zakimaksyutov
Copy link
Member

@lmolkova, is TrackWork a syntax sugar for above example? Should it take operation name as parameter as well then? Or does it deduce it somehow? If yes then can you please elaborate on how it does it?

@SergeyKanzhelev
Copy link
Contributor

@lmolkova, who will be who's parent when you extracted an Activity from the message? Proposed wrapper API is quite opinionated in this regards. It also works for this special case of async invocation. What if developer need to wrap DoWork in custom error handling logic? Or wait on it with specific timeout? Using notion of Track forces you to not return a Task that may allow to implement scenarios like this.

Why do you think this helper method should be in base, not in special queue's library? Or even simply implemented in every application? Typically an application works with a single queue so there will be no code reuse even in a single app. Or you envision auto-injecting of this call somehow?

@pharring
Copy link
Member

pharring commented Nov 5, 2017

This approach potentially has an issue with execution context: it may not be preserved between Start/Stop, so the operation may not be tracked at all.

@lmolkova Can you elaborate? Under what circumstances is the execution context not preserved?

@lmolkova
Copy link
Member Author

lmolkova commented Nov 5, 2017

@zakimaksyutov @SergeyKanzhelev
It will be perhaps long a non-direct answer to your questions.

We use Activity as a holder of the correlation context. I.e. when dependency call is made library injects Activity (Id and Baggage) into the HTTP request, queue message, ServiceFabric remoting and any other protocol. The same way Activity needs to be extracted when incoming call is received.

So the Activity is a holder of the tracing context and only particular library knows how to extract it.

This could apply to queues, Http frameworks other than ASP.NET (Anna, Nancy, Owin self-host, etc).
All those cases require some custom instrumentation with AI (that's first reason why I think this should be done in Base, rather than some queue package).

With Start/StopOperation users need to feed AI with tracing context. It could be done through the operation name, id, other telemetry properties, the whole telemetry object, but this requires user to transform activity to telemetry that is pretty easy but still requires documentation and samples (that's second reason why I think this should be done in Base, rather than some queue package).

I propose to create a pattern (rather than sugar): libs implement Extract method that returns Activity from the data and AI can track it along with the work. It allows AI not to create a special package for every new need and lib to decouple from tracing system needs. I'm implementing this pattern in ServiceBus now.

Should it take operation name as parameter as well then?

Activity has OperationName

After thinking a bit more about it I even think the method should allow to alter created telemetry like

async Task TrackWorkAsync<T>( Func<T, Task> work, Activity activity=null) where T : OperationTelemetry

to give existing flexibility with Start/StopOperation.

who will be who's parent when you extracted an Activity from the message?

In a normal scenario, this activity should not even be started, otherwise there is no reason to pass it to AI (it already knows Activity.Current).
The parent of this activity is expected to be external - the message that it was transferred with has been just received from the queue/SF/Http Request and processing is about to be started.

What if developer need to wrap DoWork in custom error handling logic? Or wait on it with specific timeout? Using notion of Track forces you to not return a Task that may allow to implement scenarios like this.

I don't see why they can't wrap await TrackWorkAsync() with their error handling/timeout handling/etc... They can still pass cancellation tokens into their DoWork lambda and handle exceptions around TrackWork.
If some rare corner case scenarios cannot work with it, users are still free to user Start/Operation. I'd help them with ActivityToTelemetry helper, but it could just be also done through documentation.

I hope it answers your questions, otherwise let me know, I'll schedule a meeting to go through it and some demo.

@lmolkova
Copy link
Member Author

lmolkova commented Nov 5, 2017

@pharring

E.g. when you deal with pre-async/await Begin/End callback pattern (e.g. in HttpWebRequest) - ExecutionContext is not preserved at End callback

there are other cases, e.g. one I mentioned here

var firstOperation = telemetryClient.StartOperation<DependencyTelemetry>("task 1");
var firstTask = RunMyTaskAsync();

var secondOperation = telemetryClient.StartOperation<DependencyTelemetry>("task 2");
var secondTask = RunMyTaskAsync();

await firstTask;

// This will do nothing and will not report telemetry for the first operation
// as currently secondOperation is active.
telemetryClient.StopOperation(firstOperation); 

await secondTask;

@lmolkova
Copy link
Member Author

lmolkova commented Nov 9, 2017

As per offline discussion with @SergeyKanzhelev following API may provide better perception and value:

var message = queue.Receive();
var activity = message.ExtractActivity();

using (telemetryClient.StartOperation<RequestTelemetry>(activity))
{
    await DoWorkAsync();
}

There are two concerns (that are common with other StartOperation overloads):

  • Concurrent calls to StartOperation and execution context: e.g. 10 messages are extracted and processed in parallel.

  • we expect that in the typical case there will be no Activity.Current when the telemetryClient.StartOperation<RequestTelemetry>(activity) is called.

  • However if there is one, activity is detached from it - it was extracted from the message and came from the consumer. The problem is that when activity is stopped, it may lead to losing 'outer' activity as well.

We should reflect it as a part of StartOperation documentation and provide samples how track sync/ async operations, start operations in parallel and preserve parent context.

@lmolkova
Copy link
Member Author

lmolkova commented Dec 6, 2017

@SergeyKanzhelev

Note that introducing this API very unlikely is a breaking change for powershell scripts that use ApplicationInsights.dll alone AND use StartOperation.

Previously we expected users to use AI.dll alone (without other dependencies) in scripts for e.g. deployment tasks and we have protected them: if DiagnosticsSource.dll is not found, AI falls back to internal correlation context.

So, if powershell script (does not repro with C#) tries to call StartOperation, after such change, it will break failing to find DiagnosticSource. It happens because there is an overload with Activity and even though, it's not called, other StartOperation method resolution fails

However, it's very unlikely that powershell scripts use StartOperation as it's generic method and requires some reflection magic to be called.

TelemetryClient.Track* APIs are not affected and still work as expected.

@dennis-yemelyanov2
Copy link

@lmolkova @SergeyKanzhelev I tried running the above example with firstOperation and secondOperation, and for me telemetry for task1 is still logged (after logging an error "Operation to stop does not match the current operation"). Looking at the code for OperationHolder.Dispose() it seems like the return statement on line 86 only returns from ActivityExtensions.TryRun(), but not from Dispose(). isActivityAvailable is true, so the second return is never called either, and the operation gets logged successfully. My understanding from this thread is it shouldn't be logged in this case? Can you please confirm if current behavior is by design or a bug?

@lmolkova
Copy link
Member Author

lmolkova commented Dec 27, 2017

@dennis-yemelyanov2

Indeed, I would say it's a bug.
Even if telemetry is reported, some of its properties may be initialized from the System.Diagnostics.Activity (.NET tracing context that is used by AppInsights) during Track() by SDK or custom TelemetryIntitializer. As the Activity.Current is no longer correct and represents another operation, so reported telemetry under some conditions will be incorrect.

I created a bug to track it: #686

TimothyMothra pushed a commit that referenced this issue Oct 25, 2019
new script: GenerateReleaseMetadata
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants