Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filter activites proposals. #809

Closed
cijothomas opened this issue Jul 14, 2020 · 10 comments
Closed

Filter activites proposals. #809

cijothomas opened this issue Jul 14, 2020 · 10 comments
Labels
enhancement New feature or request

Comments

@cijothomas
Copy link
Member

cijothomas commented Jul 14, 2020

(copy-pasted from meeting agenda, not formatted correctly. Will edit later.)

Filtering of dependencies from httpclient. 4 approaches to compare and discuss.
1 - Use Activity processor - OT spec approach, not so performant as allocations are still done.
2 - Use Sampler (special one wrapping existing samplers) - OT spec supports this, and this is better than above, as some allocations are saved.
3. Custom solution for HttpClientInstrumentation - can use the filter to DiagnosticSource.IsEnabled itself, ensuring the unwanted activity is blocked from ever being created. This will be a custom .NET solution, not derived from OT spec.
This already exists, but not exposed as public. #741 is proposing to make this public.

[From Mike B on options API: If we decide to modify the options, I would like to also expose a callback action for changing the DisplayName of the Http Activity. We used to have some of the Url on there, and then we changed it to just HttpMethod. The spec says we should use method but may allow for a hook to change that (ref). Sergey brought up on PR 741 possibly adding a callback to allow users to customize the tags. I combined those two things into one callback. Proposal:

/// <summary>
/// Gets or sets an optional callback method for filtering HttpClient requests that are sent through the instrumentation.
/// </summary>
public Func<HttpRequestMessage, bool> FilterFunc { get; set; }
 
/// <summary>
/// Gets or sets an optional callback method that will be invoked after HttpClient Activity objects are started.
/// Useful for adding non-standard tags for collection or changing the <see cref="Activity.DisplayName"/> to include the request url.
/// </summary>
public Action<HttpRequestMessage, Activity> StartAction { get; set; }

]
4. Terminal Activity concept - this should solve the issue for a broader scope, not just http. For eg: If an exporter is simply writing to SQLServer, we don’t want to capture that as telemetry. Solution tied to httpclient won’t fix this.

[Note from Mike B on the above:
#4 is a slightly different solve than what is on the PR. Terminal activity (SpanKind.Terminal, ActivityKind.Terminal, etc) would be set by an exporter to flag its activity as internal to the execution of the OT mission, meaning, not-for-collection. For example, Zipkin exporter should be able to flag its Http transmission of span data as ignored from collection. Today these calls are filtered out in the logic found in HttpClientInstrumentationOptions (same logic being modified on the PR). This will be an on-going issue until we have a proper solve. For example, if you are using Stackdriver exporter, it transmits over gRPC using Google’s library. No issues today. But if someone comes along and instruments that gRPC library, then the transmit of spans creates a new span, which is transmitted, putting us in an infinite (potentially expensive) loop. You could of course build an ActivityProcessor to filter them, but the idea is to have some built-in way to handle this case automatically. Ideas have been to use SpanKind/ActivityKind or some kind of ambient context (like an AsyncLocal) running over the exporter activity that can be checked for exclusion by the instrumentation.
What the PR is seeking to do is give users the ability to filter out anything they don’t want captured, not just internal stuff. It’s a value-add, more or less. In my case, it is some appliance IT runs on our servers which emits Http calls to its agent out of our process. ActivityProcessor & ActivitySampler would work as solutions to this, but with the perf hit of creating an Activity you know you are going to drop/filter. Since there is already a filter executing in HttpClientInstrumentationOptions, it seemed like a good spot (easy to implement, non-perf impacting, perf-improving for filter case) to allow users to participate in the decision. ActivityProcessor or ActivitySampler would also work, but more effort to build than simply setting some options and you are hit with the perf of creating the Activity you don’t want.]

@cijothomas cijothomas added the enhancement New feature or request label Jul 14, 2020
@reyang
Copy link
Member

reyang commented Jul 15, 2020

I think we need to clarify the scenario first, here are the scenarios that I can see:

  1. As a customer using OpenTelemetry, I want to have a way to opt out certain instrumentation based on my business rule. For example, I'd like the HTTP client to give me traces only for my real customer traffic, not for the requests coming from crawlers and bots. In this way, I would expect something like a blacklist which could be protocol specific (e.g. don't collect traces if a certain HTTP header exists). It might be good if we can avoid creating such activity in the first place, but I won't bother having a filter solution (where we create the activity and drop it on the floor).
  2. As the OpenTelemetry contributor/approver/maintainer, I need a mechanism to avoid live loop or different components shooting each other. For example, I don't want the HTTP request coming from my exporter to be automatically instrumented and turned into another Activity and feedback to my exporter again. I expect a general contract that different OpenTelemetry components will follow so we don't step on others toes.

For 1), OpenTelemetry Python have specific rules for different instrumentation/extension, for example https://github.com/open-telemetry/opentelemetry-python/blob/2a952b3d4540fa767974c48afdb781d9e9340edd/ext/opentelemetry-ext-flask/src/opentelemetry/ext/flask/__init__.py#L68

For 2) OpenTelemetry Python has a context flag suppress_instrumentation.

@CodeBlanch
Copy link
Member

CodeBlanch commented Jul 15, 2020

Sorry accidentally closed the issue. Well-put @reyang! Agree those are the 2 scenarios at play here.

@alanwest
Copy link
Member

alanwest commented Jul 15, 2020

I'd like the HTTP client to give me traces only for my real customer traffic, not for the requests coming from crawlers and bots.

This is not really about a client side request, right? This seems more about filtering requests server side. If we consider the proposed API then, for ASP.NET core for example, I think we'd want something like:

public Func<HttpContext, bool> FilterFunc { get; set; }

or maybe

public Func<HttpRequest, bool> FilterFunc { get; set; }

Perhaps we can either think of a way to generalize the idea of the FilterFunc to accommodate arbitrary payloads based on instrumentation. Or maybe it just makes sense to focus on the more narrow use case and just have a list of URLs as it appears python has done with get_excluded_urls() - which I assume could be used either by client-side or server-side instrumentation.

@reyang
Copy link
Member

reyang commented Jul 15, 2020

This is not really about a client side request, right? This seems more about filtering requests server side.

It is just one example, the scenario could cover both client/outgoing request and server/incoming request.

Perhaps we can either think of a way to generalize the idea of the FilterFunc to accommodate arbitrary payloads based on instrumentation. Or maybe it just makes sense to focus on the more narrow use case and just have a list of URLs as it appears python has done with get_excluded_urls() - which I assume could be used either by client-side or server-side instrumentation.

I'm fine with either way as long as perf is reasonable (e.g. we don't do reflections and sacrifice performance in order to be generic).

@CodeBlanch
Copy link
Member

I found a few examples (I think) in other OTel repos:

Python
https://opentelemetry-python.readthedocs.io/en/latest/ext/aiohttp_client/aiohttp_client.html#opentelemetry.ext.aiohttp_client.create_trace_config

Java
https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/9f1ffbe38c9aaba9bd4886b10ccdd98c7af34ec2/instrumentation-core/spring/spring-webmvc-3.1/src/main/java/io/opentelemetry/instrumentation/springwebmvc/WebMVCTracingFilter.java#L46

Go
https://github.com/open-telemetry/opentelemetry-go/blob/b2b23e15e5724d2fdb3107ee8aa2d022de3af508/instrumentation/othttp/config.go#L108

I would like to move ahead with the below additions (on #741) for Http outgoing calls, we can tackle incoming calls after that? The perf will be more or less what it is today for the filter, because they already run through a filter, it's just not public. For the "StartAction" the perf hit when not used will be a null check. When in use, perf will be up to the user as they are driving it.

Public API additions to HttpClientInstrumentationOptions:

/// <summary>
/// Gets or sets an optional callback method for filtering HttpClient requests that are sent through the instrumentation.
/// </summary>
public Func<HttpRequestMessage, bool> FilterFunc { get; set; }
 
/// <summary>
/// Gets or sets an optional callback method that will be invoked after HttpClient Activity objects are started.
/// Useful for adding non-standard tags for collection or changing the <see cref="Activity.DisplayName"/> to include the request url.
/// </summary>
public Action<HttpRequestMessage, Activity> StartAction { get; set; }

Same will be added for HttpWebRequestInstrumentationOptions (.NET Framework) but will pass HttpWebRequest instead of HttpRequestMessage.

@reyang
Copy link
Member

reyang commented Jul 18, 2020

I would like to move ahead with the below additions (on #741) for Http outgoing calls, we can tackle incoming calls after that? The perf will be more or less what it is today for the filter, because they already run through a filter, it's just not public. For the "StartAction" the perf hit when not used will be a null check. When in use, perf will be up to the user as they are driving it.

I support you @CodeBlanch.

@cijothomas
Copy link
Member Author

public Action<HttpRequestMessage, Activity> StartAction { get; set; } - This one is a separate topic and we can discuss it separately before adding that.

public Func<HttpRequestMessage, bool> FilterFunc { get; set; } - Agree with this and equivalent with HttpWebRequest

@ejsmith
Copy link
Contributor

ejsmith commented Jul 23, 2020

How would I as the author of an instrumentation that uses HttpClient internally be able to filter out the HttpClient activities without having to tell my users to manually add a request filter to their HttpClient instrumentation options?

@cijothomas
Copy link
Member Author

How would I as the author of an instrumentation that uses HttpClient internally be able to filter out the HttpClient activities without having to tell my users to manually add a request filter to their HttpClient instrumentation options?

This proposal won't help your scenario. Lets discuss this in the other issue, and find a solution.

@reyang
Copy link
Member

reyang commented Sep 11, 2020

I think both filtering (at instrumentation level) and suppression (based on the runtime context, could happen anywhere including exporter, instrumentation, processor) are solved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

5 participants