Skip to content
This repository has been archived by the owner on Oct 26, 2018. It is now read-only.

Design Notes on Correlation and Timings in DiagnosticSource events #46

Closed
rynowak opened this issue Apr 1, 2016 · 4 comments
Closed

Comments

@rynowak
Copy link
Member

rynowak commented Apr 1, 2016

I met with @avanderhoorn yesterday to discuss some of the issues raised by aspnet/Hosting#678 as the general desire to be able to correlate events across multiple DiagnosticSource listeners post-mortem.

Foreword

DiagnosticSource and the DiagnosticSourceAdapter are intended to provide a degree of separation between product code (producer) and diagnostics code like Glimpse (consumer). Any strategy that requires boilerplate on the part of the producer or requires repeated asks to "enhance" events to handle cross-cutting concerns is a non-solution.

Additionally, dictating that producers place non-fundamental data such as a timestamp or a correlationId into DiagnosticSource transforms it from a notification system into a protocol. Currently DiagnosticSource produces send objects that are specific to the business domain (HTTP, SQL) or state being signaled. It is right that the framework (producer) own, specify, and evolve the shape of the objects sent in events on their own terms. If HttpContext's definition must be changed, it will be changed because ASP.Net requires that it should change, not because the consumers of events require a change.

Data related to cross-cutting concerns would form a similar contract, but is "owned" by the consumer. If ASP.Net and HttpClient decide to include timestamps that aren't exactly compatible then this cross-cutting concern can't be handled in a cross-cutting way.

Issues

1). Where ASP.Net shows a duration in logs, the exact timing information captured by the framework is not made available via DiagnosticSource. In the case of showing logs and DiagnosticSource data side-by-side, small differences in timings would be visible.

2). The problem described in 1.) above can also manifest when multiple listeners are attached to DiagnosticSource assuming that each listener derives timings independently. Each listener will be called in sequence and so will generate a slightly different timestamp that includes the processing time of previous listeners.

3). Correlating events across multiple diagnostic source listeners can be challenging. This is for scenarios where data streams produced by multiple listeners are combined. Filtering redundant information would require some complex logic as there's no way to unify multiple 'views' of the same event. As highlighted in 2). simply adding timestamps to correlation doesn't work unless they are using the same source as each other.

Proposals

It seems like the best way to address 2). and 3). would be to add this information by convention to the diagnostic source adapter. The adapter already uses code generation (IL emit) to operate on event streams by convention, it makes sense to also address cross-cutting concerns in a conventional way in the adapter.

For instance, suppose the following listener code as a starting point:

public class MyListener
{
    [DiagnosticName("Microsoft.AspNetCore.Hosting.BeginRequest")]
    public void OnBeginRequest(HttpContext context)
    {
    }

    [DiagnosticName("Microsoft.AspNetCore.Hosting.EndRequest")]
    public void OnEndRequest(HttpContext context)
    {
    }
}

Adding manual tracking of a duration for the request would looks something like this:

public class MyListener
{
    private readonly AsyncLocal<long> _beginRequestTimestamp = new AsyncLocal<long>();

    [DiagnosticName("Microsoft.AspNetCore.Hosting.BeginRequest")]
    public void OnBeginRequest(HttpContext context)
    {
        _beginRequestTimestamp.Value = StopWatch.GetCurrentTimestamp();
    }

    [DiagnosticName("Microsoft.AspNetCore.Hosting.EndRequest")]
    public void OnEndRequest(HttpContext context)
    {
        var start = _beginRequestTimestamp.Value;
        var end = StopWatch.GetCurrentTimestamp();

        var duration = TimeSpan.FromTicks(end - start);
        ...
    }
}

Whereas listening to a duration sent in the payload for "EndRequest" would looks something like this:

public class MyListener
{
    [DiagnosticName("Microsoft.AspNetCore.Hosting.EndRequest")]
    public void OnEndRequest(HttpContext context, TimeSpan duration)
    {
    }
}

Therefore listening to a duration added to a "virtual payload" would look something like this:

public class MyListener
{
    [BeginDiagnosticName("Microsoft.AspNetCore.Hosting.BeginRequest")]
    [DiagnosticName("Microsoft.AspNetCore.Hosting.EndRequest")]
    public void OnEndRequest(HttpContext context, TimeSpan duration)
    {
    }
}

In this case code inside the diagnostic source adapter can perform the the same task performed by the "manual" state tracking example. Multiple listeners need to agree on the same definition of [BeginDiagnosticName(...)] to see the same timestamp.

We could extend this further and have the listener generate a correlationId:

public class MyListener
{
    [BeginDiagnosticName("Microsoft.AspNetCore.Hosting.BeginRequest")]
    [DiagnosticName("Microsoft.AspNetCore.Hosting.EndRequest")]
    public void OnEndRequest(HttpContext context, TimeSpan duration, Guid correlationId)
    {
    }
}

To address 1). we can allow an event payload containing a duration or correlationId to supercede these values as-generated by the adapter.

Remaining Concerns

The remaining concern then is to standardize on conventions then since this is all by convention.

Proposals:

  • correlationId - Guid identifying both events of a begin/end pair
  • duration - TimeSpan identifying the elapsed time between begin/end pair as observed by the adapter (before calling into listeners)

In either case these 'enhanced' payload items can be overridden by the provided payload.

@rynowak
Copy link
Member Author

rynowak commented Apr 1, 2016

Note that duration is specifically called out here and not timestamps.

Timestamps are hard. Our best method of obtaining them StopWatch.GetTimestamp() might give you the opaque value of a system performance counter. Ideally all server-side time data would be stored as UTC ticks, but that isn't this. So, you can safely use the value returned from GetTimestamp() to compute a duration, but not to universally indicate a point in time.

Additionally, timestamps have the same issues of skew/inconsistency with logging system as durations. Logging systems will implement their own timestamps, and there's no real way to make all of these observers align.

@davidfowl
Copy link
Member

The async local thing looks pretty cool actually. Very low tech functional approach. Any chance we can build on top of that?

@rynowak
Copy link
Member Author

rynowak commented Apr 4, 2016

It's super easy code to write that anyone can write and doesn't require any new features.

@Eilon Eilon added this to the Backlog milestone Apr 22, 2016
@aspnet-hello
Copy link

This issue was moved to aspnet/Home#2326

@aspnet aspnet locked and limited conversation to collaborators Jan 1, 2018
@aspnet-hello aspnet-hello removed this from the Backlog milestone Jan 1, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants