Filter activites proposals. #809

cijothomas · 2020-07-14T05:51:39Z

(copy-pasted from meeting agenda, not formatted correctly. Will edit later.)

Filtering of dependencies from httpclient. 4 approaches to compare and discuss.
1 - Use Activity processor - OT spec approach, not so performant as allocations are still done.
2 - Use Sampler (special one wrapping existing samplers) - OT spec supports this, and this is better than above, as some allocations are saved.
3. Custom solution for HttpClientInstrumentation - can use the filter to DiagnosticSource.IsEnabled itself, ensuring the unwanted activity is blocked from ever being created. This will be a custom .NET solution, not derived from OT spec.
This already exists, but not exposed as public. #741 is proposing to make this public.

[From Mike B on options API: If we decide to modify the options, I would like to also expose a callback action for changing the DisplayName of the Http Activity. We used to have some of the Url on there, and then we changed it to just HttpMethod. The spec says we should use method but may allow for a hook to change that (ref). Sergey brought up on PR 741 possibly adding a callback to allow users to customize the tags. I combined those two things into one callback. Proposal:

/// <summary>
/// Gets or sets an optional callback method for filtering HttpClient requests that are sent through the instrumentation.
/// </summary>
public Func<HttpRequestMessage, bool> FilterFunc { get; set; }
 
/// <summary>
/// Gets or sets an optional callback method that will be invoked after HttpClient Activity objects are started.
/// Useful for adding non-standard tags for collection or changing the <see cref="Activity.DisplayName"/> to include the request url.
/// </summary>
public Action<HttpRequestMessage, Activity> StartAction { get; set; }

]
4. Terminal Activity concept - this should solve the issue for a broader scope, not just http. For eg: If an exporter is simply writing to SQLServer, we don’t want to capture that as telemetry. Solution tied to httpclient won’t fix this.

[Note from Mike B on the above:
#4 is a slightly different solve than what is on the PR. Terminal activity (SpanKind.Terminal, ActivityKind.Terminal, etc) would be set by an exporter to flag its activity as internal to the execution of the OT mission, meaning, not-for-collection. For example, Zipkin exporter should be able to flag its Http transmission of span data as ignored from collection. Today these calls are filtered out in the logic found in HttpClientInstrumentationOptions (same logic being modified on the PR). This will be an on-going issue until we have a proper solve. For example, if you are using Stackdriver exporter, it transmits over gRPC using Google’s library. No issues today. But if someone comes along and instruments that gRPC library, then the transmit of spans creates a new span, which is transmitted, putting us in an infinite (potentially expensive) loop. You could of course build an ActivityProcessor to filter them, but the idea is to have some built-in way to handle this case automatically. Ideas have been to use SpanKind/ActivityKind or some kind of ambient context (like an AsyncLocal) running over the exporter activity that can be checked for exclusion by the instrumentation.
What the PR is seeking to do is give users the ability to filter out anything they don’t want captured, not just internal stuff. It’s a value-add, more or less. In my case, it is some appliance IT runs on our servers which emits Http calls to its agent out of our process. ActivityProcessor & ActivitySampler would work as solutions to this, but with the perf hit of creating an Activity you know you are going to drop/filter. Since there is already a filter executing in HttpClientInstrumentationOptions, it seemed like a good spot (easy to implement, non-perf impacting, perf-improving for filter case) to allow users to participate in the decision. ActivityProcessor or ActivitySampler would also work, but more effort to build than simply setting some options and you are hit with the perf of creating the Activity you don’t want.]

reyang · 2020-07-15T20:42:24Z

I think we need to clarify the scenario first, here are the scenarios that I can see:

As a customer using OpenTelemetry, I want to have a way to opt out certain instrumentation based on my business rule. For example, I'd like the HTTP client to give me traces only for my real customer traffic, not for the requests coming from crawlers and bots. In this way, I would expect something like a blacklist which could be protocol specific (e.g. don't collect traces if a certain HTTP header exists). It might be good if we can avoid creating such activity in the first place, but I won't bother having a filter solution (where we create the activity and drop it on the floor).
As the OpenTelemetry contributor/approver/maintainer, I need a mechanism to avoid live loop or different components shooting each other. For example, I don't want the HTTP request coming from my exporter to be automatically instrumented and turned into another Activity and feedback to my exporter again. I expect a general contract that different OpenTelemetry components will follow so we don't step on others toes.

For 1), OpenTelemetry Python have specific rules for different instrumentation/extension, for example https://github.com/open-telemetry/opentelemetry-python/blob/2a952b3d4540fa767974c48afdb781d9e9340edd/ext/opentelemetry-ext-flask/src/opentelemetry/ext/flask/__init__.py#L68

For 2) OpenTelemetry Python has a context flag suppress_instrumentation.

CodeBlanch · 2020-07-15T20:47:32Z

Sorry accidentally closed the issue. Well-put @reyang! Agree those are the 2 scenarios at play here.

alanwest · 2020-07-15T21:39:00Z

I'd like the HTTP client to give me traces only for my real customer traffic, not for the requests coming from crawlers and bots.

This is not really about a client side request, right? This seems more about filtering requests server side. If we consider the proposed API then, for ASP.NET core for example, I think we'd want something like:

public Func<HttpContext, bool> FilterFunc { get; set; }

or maybe

public Func<HttpRequest, bool> FilterFunc { get; set; }

Perhaps we can either think of a way to generalize the idea of the FilterFunc to accommodate arbitrary payloads based on instrumentation. Or maybe it just makes sense to focus on the more narrow use case and just have a list of URLs as it appears python has done with get_excluded_urls() - which I assume could be used either by client-side or server-side instrumentation.

reyang · 2020-07-15T22:15:29Z

This is not really about a client side request, right? This seems more about filtering requests server side.

It is just one example, the scenario could cover both client/outgoing request and server/incoming request.

Perhaps we can either think of a way to generalize the idea of the FilterFunc to accommodate arbitrary payloads based on instrumentation. Or maybe it just makes sense to focus on the more narrow use case and just have a list of URLs as it appears python has done with get_excluded_urls() - which I assume could be used either by client-side or server-side instrumentation.

I'm fine with either way as long as perf is reasonable (e.g. we don't do reflections and sacrifice performance in order to be generic).

CodeBlanch · 2020-07-18T04:06:52Z

I found a few examples (I think) in other OTel repos:

Python
https://opentelemetry-python.readthedocs.io/en/latest/ext/aiohttp_client/aiohttp_client.html#opentelemetry.ext.aiohttp_client.create_trace_config

Java
https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/9f1ffbe38c9aaba9bd4886b10ccdd98c7af34ec2/instrumentation-core/spring/spring-webmvc-3.1/src/main/java/io/opentelemetry/instrumentation/springwebmvc/WebMVCTracingFilter.java#L46

Go
https://github.com/open-telemetry/opentelemetry-go/blob/b2b23e15e5724d2fdb3107ee8aa2d022de3af508/instrumentation/othttp/config.go#L108

I would like to move ahead with the below additions (on #741) for Http outgoing calls, we can tackle incoming calls after that? The perf will be more or less what it is today for the filter, because they already run through a filter, it's just not public. For the "StartAction" the perf hit when not used will be a null check. When in use, perf will be up to the user as they are driving it.

Public API additions to HttpClientInstrumentationOptions:

/// <summary>
/// Gets or sets an optional callback method for filtering HttpClient requests that are sent through the instrumentation.
/// </summary>
public Func<HttpRequestMessage, bool> FilterFunc { get; set; }
 
/// <summary>
/// Gets or sets an optional callback method that will be invoked after HttpClient Activity objects are started.
/// Useful for adding non-standard tags for collection or changing the <see cref="Activity.DisplayName"/> to include the request url.
/// </summary>
public Action<HttpRequestMessage, Activity> StartAction { get; set; }

Same will be added for HttpWebRequestInstrumentationOptions (.NET Framework) but will pass HttpWebRequest instead of HttpRequestMessage.

reyang · 2020-07-18T05:30:00Z

I would like to move ahead with the below additions (on #741) for Http outgoing calls, we can tackle incoming calls after that? The perf will be more or less what it is today for the filter, because they already run through a filter, it's just not public. For the "StartAction" the perf hit when not used will be a null check. When in use, perf will be up to the user as they are driving it.

I support you @CodeBlanch.

cijothomas · 2020-07-18T15:07:24Z

public Action<HttpRequestMessage, Activity> StartAction { get; set; } - This one is a separate topic and we can discuss it separately before adding that.

public Func<HttpRequestMessage, bool> FilterFunc { get; set; } - Agree with this and equivalent with HttpWebRequest

ejsmith · 2020-07-23T13:11:07Z

How would I as the author of an instrumentation that uses HttpClient internally be able to filter out the HttpClient activities without having to tell my users to manually add a request filter to their HttpClient instrumentation options?

cijothomas · 2020-07-23T14:37:14Z

How would I as the author of an instrumentation that uses HttpClient internally be able to filter out the HttpClient activities without having to tell my users to manually add a request filter to their HttpClient instrumentation options?

This proposal won't help your scenario. Lets discuss this in the other issue, and find a solution.

reyang · 2020-09-11T05:39:00Z

I think both filtering (at instrumentation level) and suppression (based on the runtime context, could happen anywhere including exporter, instrumentation, processor) are solved.

cijothomas added the enhancement New feature or request label Jul 14, 2020

CodeBlanch closed this as completed Jul 15, 2020

CodeBlanch reopened this Jul 15, 2020

CodeBlanch mentioned this issue Jul 18, 2020

Added options to HttpWebRequest instrumentation. #741

Merged

3 tasks

reyang mentioned this issue Jul 23, 2020

Don't record child activities #893

Closed

This was referenced Jul 30, 2020

Runtime context API #948

Merged

Add Sdk.SuppressInstrumentation #960

Merged

CodeBlanch mentioned this issue Aug 1, 2020

EnrichmentScope & EnrichingActivityProcessor #969

Closed

5 tasks

reyang closed this as completed Sep 11, 2020

reyang mentioned this issue Jan 17, 2023

add SuppressTracing flag open-telemetry/opentelemetry-specification#3103

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Filter activites proposals. #809

Filter activites proposals. #809

cijothomas commented Jul 14, 2020 •

edited by CodeBlanch

Loading

reyang commented Jul 15, 2020 •

edited

Loading

CodeBlanch commented Jul 15, 2020 •

edited

Loading

alanwest commented Jul 15, 2020 •

edited

Loading

reyang commented Jul 15, 2020

CodeBlanch commented Jul 18, 2020

reyang commented Jul 18, 2020

cijothomas commented Jul 18, 2020

ejsmith commented Jul 23, 2020

cijothomas commented Jul 23, 2020

reyang commented Sep 11, 2020

Filter activites proposals. #809

Filter activites proposals. #809

Comments

cijothomas commented Jul 14, 2020 • edited by CodeBlanch Loading

reyang commented Jul 15, 2020 • edited Loading

CodeBlanch commented Jul 15, 2020 • edited Loading

alanwest commented Jul 15, 2020 • edited Loading

reyang commented Jul 15, 2020

CodeBlanch commented Jul 18, 2020

reyang commented Jul 18, 2020

cijothomas commented Jul 18, 2020

ejsmith commented Jul 23, 2020

cijothomas commented Jul 23, 2020

reyang commented Sep 11, 2020

cijothomas commented Jul 14, 2020 •

edited by CodeBlanch

Loading

reyang commented Jul 15, 2020 •

edited

Loading

CodeBlanch commented Jul 15, 2020 •

edited

Loading

alanwest commented Jul 15, 2020 •

edited

Loading