Design Notes on Correlation and Timings in DiagnosticSource events #46

rynowak · 2016-04-01T17:54:39Z

I met with @avanderhoorn yesterday to discuss some of the issues raised by aspnet/Hosting#678 as the general desire to be able to correlate events across multiple DiagnosticSource listeners post-mortem.

Foreword

DiagnosticSource and the DiagnosticSourceAdapter are intended to provide a degree of separation between product code (producer) and diagnostics code like Glimpse (consumer). Any strategy that requires boilerplate on the part of the producer or requires repeated asks to "enhance" events to handle cross-cutting concerns is a non-solution.

Additionally, dictating that producers place non-fundamental data such as a timestamp or a correlationId into DiagnosticSource transforms it from a notification system into a protocol. Currently DiagnosticSource produces send objects that are specific to the business domain (HTTP, SQL) or state being signaled. It is right that the framework (producer) own, specify, and evolve the shape of the objects sent in events on their own terms. If HttpContext's definition must be changed, it will be changed because ASP.Net requires that it should change, not because the consumers of events require a change.

Data related to cross-cutting concerns would form a similar contract, but is "owned" by the consumer. If ASP.Net and HttpClient decide to include timestamps that aren't exactly compatible then this cross-cutting concern can't be handled in a cross-cutting way.

Issues

1). Where ASP.Net shows a duration in logs, the exact timing information captured by the framework is not made available via DiagnosticSource. In the case of showing logs and DiagnosticSource data side-by-side, small differences in timings would be visible.

2). The problem described in 1.) above can also manifest when multiple listeners are attached to DiagnosticSource assuming that each listener derives timings independently. Each listener will be called in sequence and so will generate a slightly different timestamp that includes the processing time of previous listeners.

3). Correlating events across multiple diagnostic source listeners can be challenging. This is for scenarios where data streams produced by multiple listeners are combined. Filtering redundant information would require some complex logic as there's no way to unify multiple 'views' of the same event. As highlighted in 2). simply adding timestamps to correlation doesn't work unless they are using the same source as each other.

Proposals

It seems like the best way to address 2). and 3). would be to add this information by convention to the diagnostic source adapter. The adapter already uses code generation (IL emit) to operate on event streams by convention, it makes sense to also address cross-cutting concerns in a conventional way in the adapter.

For instance, suppose the following listener code as a starting point:

public class MyListener
{
    [DiagnosticName("Microsoft.AspNetCore.Hosting.BeginRequest")]
    public void OnBeginRequest(HttpContext context)
    {
    }

    [DiagnosticName("Microsoft.AspNetCore.Hosting.EndRequest")]
    public void OnEndRequest(HttpContext context)
    {
    }
}

Adding manual tracking of a duration for the request would looks something like this:

public class MyListener
{
    private readonly AsyncLocal<long> _beginRequestTimestamp = new AsyncLocal<long>();

    [DiagnosticName("Microsoft.AspNetCore.Hosting.BeginRequest")]
    public void OnBeginRequest(HttpContext context)
    {
        _beginRequestTimestamp.Value = StopWatch.GetCurrentTimestamp();
    }

    [DiagnosticName("Microsoft.AspNetCore.Hosting.EndRequest")]
    public void OnEndRequest(HttpContext context)
    {
        var start = _beginRequestTimestamp.Value;
        var end = StopWatch.GetCurrentTimestamp();

        var duration = TimeSpan.FromTicks(end - start);
        ...
    }
}

Whereas listening to a duration sent in the payload for "EndRequest" would looks something like this:

public class MyListener
{
    [DiagnosticName("Microsoft.AspNetCore.Hosting.EndRequest")]
    public void OnEndRequest(HttpContext context, TimeSpan duration)
    {
    }
}

Therefore listening to a duration added to a "virtual payload" would look something like this:

public class MyListener
{
    [BeginDiagnosticName("Microsoft.AspNetCore.Hosting.BeginRequest")]
    [DiagnosticName("Microsoft.AspNetCore.Hosting.EndRequest")]
    public void OnEndRequest(HttpContext context, TimeSpan duration)
    {
    }
}

In this case code inside the diagnostic source adapter can perform the the same task performed by the "manual" state tracking example. Multiple listeners need to agree on the same definition of [BeginDiagnosticName(...)] to see the same timestamp.

We could extend this further and have the listener generate a correlationId:

public class MyListener
{
    [BeginDiagnosticName("Microsoft.AspNetCore.Hosting.BeginRequest")]
    [DiagnosticName("Microsoft.AspNetCore.Hosting.EndRequest")]
    public void OnEndRequest(HttpContext context, TimeSpan duration, Guid correlationId)
    {
    }
}

To address 1). we can allow an event payload containing a duration or correlationId to supercede these values as-generated by the adapter.

Remaining Concerns

The remaining concern then is to standardize on conventions then since this is all by convention.

Proposals:

correlationId - Guid identifying both events of a begin/end pair
duration - TimeSpan identifying the elapsed time between begin/end pair as observed by the adapter (before calling into listeners)

In either case these 'enhanced' payload items can be overridden by the provided payload.

The text was updated successfully, but these errors were encountered:

rynowak · 2016-04-01T17:59:52Z

Note that duration is specifically called out here and not timestamps.

Timestamps are hard. Our best method of obtaining them StopWatch.GetTimestamp() might give you the opaque value of a system performance counter. Ideally all server-side time data would be stored as UTC ticks, but that isn't this. So, you can safely use the value returned from GetTimestamp() to compute a duration, but not to universally indicate a point in time.

Additionally, timestamps have the same issues of skew/inconsistency with logging system as durations. Logging systems will implement their own timestamps, and there's no real way to make all of these observers align.

davidfowl · 2016-04-02T05:59:06Z

The async local thing looks pretty cool actually. Very low tech functional approach. Any chance we can build on top of that?

rynowak · 2016-04-04T16:11:04Z

It's super easy code to write that anyone can write and doesn't require any new features.

aspnet-hello · 2018-01-01T07:23:05Z

This issue was moved to aspnet/Home#2326

Eilon added this to the Backlog milestone Apr 22, 2016

aspnet locked and limited conversation to collaborators Jan 1, 2018

aspnet-hello removed this from the Backlog milestone Jan 1, 2018

aspnet-hello closed this as completed Jan 1, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Design Notes on Correlation and Timings in DiagnosticSource events #46

Design Notes on Correlation and Timings in DiagnosticSource events #46

rynowak commented Apr 1, 2016

rynowak commented Apr 1, 2016

davidfowl commented Apr 2, 2016

rynowak commented Apr 4, 2016

aspnet-hello commented Jan 1, 2018

Design Notes on Correlation and Timings in DiagnosticSource events #46

Design Notes on Correlation and Timings in DiagnosticSource events #46

Comments

rynowak commented Apr 1, 2016

Foreword

Issues

Proposals

Remaining Concerns

rynowak commented Apr 1, 2016

davidfowl commented Apr 2, 2016

rynowak commented Apr 4, 2016

aspnet-hello commented Jan 1, 2018