Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Telemetry Consumption package #571

Merged
merged 15 commits into from
Jan 25, 2021

Conversation

MihaZupan
Copy link
Member

Opening for feedback on the design and implementation.

Includes APIs for 6 EventSources:

  • Proxy (YARP)
  • Kestrel
  • Http
  • Sockets
  • NetSecurity
  • NameResolution

For each of those, providing 4 components:

  • I{Name}TelemetryConsumer - interface with methods mirroring EventSource's Events that are relevant within the scope of an HTTP request
  • {Name}Metrics - Immutable POCO containing properties mirroring EventSource's EventCounters
  • I{Name}MetricsConsumer - interface with a single method void On{Name}Metrics(Metrics oldMetrics, Metrics newMetrics)
  • Internal {Name}EventListenerService - internal infrastructure for dispatching events and metrics, exposed as an extension method on IReverseProxyBuilder: Add{Name}TelemetryListener

The added tests are not exhaustive for every event and parameter.

return;
}

foreach (var consumer in _serviceProvider.GetServices<IProxyMetricsConsumer>())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we consider for consuming in parallel?

Copy link
Member Author

@MihaZupan MihaZupan Nov 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These callbacks execute inline within the request, so the expected scenario is to have very few (just 1) consumer and for it to be very cheap and execute quickly.
I don't believe we would gain anything from parallelizing consumers (and the cost of it would likely outweigh any benefit).

If you were only referring to the metrics part, we could consider pushing that work onto the ThreadPool instead

Copy link
Contributor

@alnikola alnikola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments.


const int TotalEventCounters = 10;

if (++_eventCountersCount == TotalEventCounters)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please move the increment out of if expression? It seems a bit unusual.

return;
}

foreach (var consumer in _serviceProvider.GetServices<IKestrelMetricsConsumer>())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understood it correctly, IKestrelMetricsConsumer will be resolved for each EventWrittenEventArgs which seems a lot. Would it be better to resolve it once and store on a field?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the metrics, we're resolving the consumers on each EventCounters update - so once a second, which doesn't seem problematic.

For events, we are resolving it for each event.
Since we're relying on DI for lifetime management, we have to do it each time. If we documented that we disallow transient consumer services (since they don't make much sense to use for events anyway), we could avoid the cost on each event and instead use an AsyncLocal to cache the resolved services.

@MihaZupan
Copy link
Member Author

@davidni Could you please look at this as well to make sure the public APIs are matching your expectations/needs

@Tratcher Tratcher requested review from davidni and shirhatti December 3, 2020 23:08
}
}

private sealed class MetricsConsumer :
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How closely do these Telemetry and Metrics tests reflect the actual expected usage? If this isn't how we expect it to be used then please add a sample that shows the expected usage.

Copy link
Member

@ManickaP ManickaP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I commented only on the first set of consumer classes but the comments apply to all of them.

@@ -0,0 +1,257 @@
// Copyright (c) Microsoft Corporation.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're adding a new library ReverseProxy.TelemetryConsumption shouldn't the tests be also in their own test project?

src/ReverseProxy.TelemetryConsumption/MetricsOptions.cs Outdated Show resolved Hide resolved

protected override void OnEventWritten(EventWrittenEventArgs eventData)
{
if (eventData.EventId < 1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we throw an exception from OnEventWritten method, it'll bubble up into whatever place generated the event, e.g. the call of SendAsync in HttpProxy. Shouldn't the whole method body be in try catch block then? Or do we want the errors from events consumption affect the proxy and thus get noticed more easily?


var expected = new[]
{
"OnRequestStart-Kestrel",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these be created with nameof(...) as well? Or are they like this on purpose?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of these could be nameof, but some are explicitly different since some names repeat.

Specifically both Runtime Http and Kestrel include RequestStart and RequestStop events.


namespace Microsoft.ReverseProxy.Common
{
public class TestEnvironment
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for extracting this.

}
}

protected override void OnEventWritten(EventWrittenEventArgs eventData)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OnEventWritten can run before the derived constructor on an EventListener runs.- dotnet/runtime#38804 (comment)

You may need to guard against it

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added more checks to avoid this in d11d635.

From what I can tell, the linked example has a race condition:
After setting _initialized but before looping over the queue in the ctor, new EvnetSources may still be added to the queue and enabled in OnEventSourceCreated, and will thus be enabled twice (don't know if EventSource internals have any checks guarding against this).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EnableEvents is safe to call twice.

@@ -3,7 +3,7 @@

namespace Microsoft.ReverseProxy.Telemetry
{
internal enum ProxyStage : int
public enum ProxyStage : int
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought everything was supposed to be decoupled to the point where you didn't even need a dependency on the core package? Is this type the only dependency? Would duplicating it in the telemetry package work?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, technically this is the only type we depend on.
As long as the values are kept in sync, we can duplicate it in the Telemetry package and remove a dependency on the core package entirely.

The only other thing the telemetry package depends on is that consumers of the library target a preview >= 7, otherwise they will not observe proxy events (wouldn't break anything tho).

@MihaZupan
Copy link
Member Author

We're currently relying on the package reference in 3 places:

  • ProxyStage enum (currently internal in core package)
  • ProxyError enum (public)
  • IReverseProxyBuilder interface for the AddXTelemetryListener extension methods

The extension methods could be swapped to use IServiceCollection instead and the ProxyStage enum could be duplicated and only exposed in the Consumption package.

But we would have to duplicate the already public ProxyError enum.

I think we should:

  • keep the package reference to the core package
  • use the IReverseProxyBuilder interface
  • duplicate and expose the ProxyStage enum in the consumption package only
  • use the core package's ProxyError

@Tratcher thoughts?

Copy link
Member

@Tratcher Tratcher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That plan makes sense, but see my comment below about using IReverseProxyBuilder.

return builder;
}

public static IReverseProxyBuilder AddKestrelTelemetryListener(this IReverseProxyBuilder builder)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kestrel, HttpClient, DNS, etc. specific extensions don't belong on IReverseProxyBuilder, move these to IServiceCollection. At that point I'm not sure if it makes sense to still have the others on IReverseProxyBuilder.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed all to use IServiceCollection instead.

@MihaZupan MihaZupan marked this pull request as ready for review January 18, 2021 17:16
@MihaZupan
Copy link
Member Author

@Tratcher is there anything else that should be done here before merging the PR?

@MihaZupan MihaZupan mentioned this pull request Jan 19, 2021
Copy link
Member

@Tratcher Tratcher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MihaZupan
Copy link
Member Author

I added a sample telemetry & metrics consumer to Code.Sample. More examples can be added as part of #676.

I also wrapped the metrics consumers in a try/catch, passing any exceptions to ILogger instead of letting them propagate to the CounterGroup thread and crashing the process.

@MihaZupan MihaZupan added this to the YARP 1.0.0-preview9 milestone Jan 25, 2021
@MihaZupan MihaZupan merged commit 33a175b into microsoft:master Jan 25, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants