Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial POC version of url-based sampler #66

Closed
wants to merge 7 commits into from
Closed

Initial POC version of url-based sampler #66

wants to merge 7 commits into from

Conversation

iNikem
Copy link
Contributor

@iNikem iNikem commented Aug 5, 2021

Closes #65

Asking for the following feedback.

First, which semantic attribute should we match against? http.url or http.route. The latter is not always available, the first has some problems, see below.

Second, should we match the whole url in http.url or only path portion? If I configure SDK manually inside my application's code, then I don't know the hostname that my application will respond to. Thus it seems that we cannot match full url, only path portion. But this means that on every call to shouldSample we have to parse http.url attribute and extract path. This may be very expensive.

Third, should narrow this sampler only to SERVER spans? The original request described in the issue seems to assume this.

Fourth, if want to provide something more than just exact match, which pattern language should we use? Always assume startsWith? ant-style path matching? Regex?

@jkwatson
Copy link
Contributor

jkwatson commented Aug 5, 2021

First, which semantic attribute should we match against? http.url or http.route. The latter is not always available, the first has some problems, see below.

I think we have to use http.url, at least as a fallback. It might be faster to check the route first, then fall back to the url. if something has already parsed out that route, we can skip re-parsing it.

Second, should we match the whole url in http.url or only path portion? If I configure SDK manually inside my application's code, then I don't know the hostname that my application will respond to. Thus it seems that we cannot match full url, only path portion. But this means that on every call to shouldSample we have to parse http.url attribute and extract path. This may be very expensive.

Has to be just the path portion, because the host/port will be all over the place. It might be worth benchmark parsing of the URL vs. just using regex on the full url, to see which is faster/fewer allocations.

Third, should narrow this sampler only to SERVER spans? The original request described in the issue seems to assume this.

I think it should, yes. It will be a very fast enum comparison to check against it. Digging into attributes will be slower.

Fourth, if want to provide something more than just exact match, which pattern language should we use? Always assume startsWith? ant-style path matching? Regex?

My gut tells me regex, but I'd like to know how fast we can make the ant-style path patching, compared to precompiled regexes.

@anuraaga
Copy link
Contributor

anuraaga commented Aug 6, 2021

First, which semantic attribute should we match against? http.url or http.route. The latter is not always available, the first has some problems, see below.

I feel that we can avoid route entirely given we're configuring regexes in the rules anyways. We already have enough issues with the URL vs target, etc issue to add the sometimes but not always available route to the mix :P

Second, should we match the whole url in http.url or only path portion? If I configure SDK manually inside my application's code, then I don't know the hostname that my application will respond to. Thus it seems that we cannot match full url, only path portion. But this means that on every call to shouldSample we have to parse http.url attribute and extract path. This may be very expensive.

Parsing out an entire URL is definitely expensive, I think just finding the target can be made very cheap, just need to count slashes basically.

As for URL vs path, URL is more useful I guess since someone may need to match against host (sample all requests to some external API in a particular way would need the host I guess) - in practice it's rare though so only path could be OK (it solves the highly demanded health check sampling). I guess we could either

  1. Support full URL. Reconstruct URL from triplet if triplet is all available and url isn't. If triplet is only partially filled, ignore

  2. Support path. Check for target, if not available check for url.

  3. is simpler logic but not as flexible as allowing a URL to match against URL.

We have similar discussion in the instrumentation repo and it really seems like supporting both URL and triplet causes a lot of complexity. Perhaps this sampler is also a motivation to revisit the spec and consider dropping one of these - it's basically spaghetti within the spec, and we may have anticipated it only extending to spaghetti in backends, but I'm starting to see it affect instrumentation and SDK plugins like this one too, potential performance wins from having the choice of a triplet seems to not be worth it potentially.

My gut tells me regex, but I'd like to know how fast we can make the ant-style path patching, compared to precompiled regexes.

I've seen ant-style path matching in libraries like Spring implemented by translating to regex - so that's what I did too since X-Ray uses glob patterns

https://github.com/open-telemetry/opentelemetry-java-contrib/blob/main/aws-xray/src/main/java/io/opentelemetry/contrib/awsxray/SamplingRuleApplier.java#L344

From what I understand a version of Java improved regex implementation such that the regex equivalent of a glob pattern performs about the same as dynamic programming instead of potentially going exponential. The advantage of the glob pattern is probably more that it prevents being able to define patterns with random regex syntax like captures / backreferences which can hurt.

Third, should narrow this sampler only to SERVER spans? The original request described in the issue seems to assume this.

While having a configuration to limit to kinds would be fine, I don't think the default to limit to any kind. There's enough use case to filter on client (external API requests, client-side-load-balancing clients which issue health checks).

@iNikem
Copy link
Contributor Author

iNikem commented Aug 6, 2021

While having a configuration to limit to kinds would be fine, I don't think the default to limit to any kind. There's enough use case to filter on client (external API requests, client-side-load-balancing clients which issue health checks).

Even in the case when any given deployment wants to have both server- and client-side calls sampled I would expect them to have different patterns. Thus every given instance of url-based sampler will probably work with only one kind.

@anuraaga
Copy link
Contributor

anuraaga commented Aug 6, 2021

Thus every given instance of url-based sampler will probably work with only one kind.

I don't know how common it is, but there is a pattern of using client side health checks with load balancing. It means that filtering on the standard path like /healthcheck will work great across all servers and clients. I think semantics-wise applying a path filter on both client and server is correct - we see server side filtering of health check being common because k8s isn't traced with Otel but if it was, it would also filter the same. All kinds are "supposed to" behave the same for this special path. So I'd generally expect our default to go broad and if users want it as a performance optimization they could specify a kind to filter on.

@iNikem
Copy link
Contributor Author

iNikem commented Aug 10, 2021

@anuraaga I am still thinking about filtering on span kind. But your examples for client side sampling is still about path, not host+path. So to be sure: do you want to match full url or just path part of it?

@anuraaga
Copy link
Contributor

@iNikem I think path is much more common - but I still think URL is more useful in general since I can think of plenty of URLs where the host part does have meaning since the host often indicates a type of API.

Though realized, why do we make it a url-based sampler anyways - how about AttributeSampler which accepts a map from attribute key to patterns? Even for the path, or url, use case, it doesn't seem much harder to use than one specialized to URLs. The X-Ray sampler supports this if it's a useful reference point (it also natively supports some of the attributes only because X-Ray data model has native fields for them and does not populate them to attributes - but I don't think ours needs that).

https://github.com/open-telemetry/opentelemetry-java-contrib/blob/main/aws-xray/src/main/java/io/opentelemetry/contrib/awsxray/SamplingRuleApplier.java#L97

@iNikem
Copy link
Contributor Author

iNikem commented Aug 10, 2021

Though realized, why do we make it a url-based sampler anyways - how about AttributeSampler which accepts a map from attribute key to patterns?

This does not change my question in any way. What I am trying to understand now is this: do we expect an end-user to provide "/healthcheck" or "http://example.com/healtcheck" or ".*/healtcheck" as the configuration for this sampler? Path or full url?

@anuraaga
Copy link
Contributor

It means that both something like .deny(SemanticAttributes.TARGET, "/healthcheck") and .deny(SemanticAttributes.URL, ".*/healthcheck") would work fine. To add to our documentation in the FAQ of "how to disable health checks", we would need to resolve open-telemetry/opentelemetry-java-instrumentation#3700 but it becomes decoupled from the sampler implementation, which is able to satisfy this use case, or any other attribute based use case. Does that work?

@iNikem
Copy link
Contributor Author

iNikem commented Aug 11, 2021

It means that both something like .deny(SemanticAttributes.TARGET, "/healthcheck") and .deny(SemanticAttributes.URL, ".*/healthcheck") would work fine. To add to our documentation in the FAQ of "how to disable health checks", we would need to resolve open-telemetry/opentelemetry-java-instrumentation#3700 but it becomes decoupled from the sampler implementation, which is able to satisfy this use case, or any other attribute based use case. Does that work?

I will try this out, thanks for the suggestion :)

@iNikem
Copy link
Contributor Author

iNikem commented Aug 11, 2021

@anuraaga PTAL

Several open questions:

  • Should we do something to ignore query string and fragment? Or should we match the whole value of HTTP_URL?
  • In case of HTTP_URL should .*/healthcheck match http://healthcheck?
  • In case of HTTP_URL should .*/actuator match http://example.com/context/actuator?

@anuraaga
Copy link
Contributor

I would suggest the same for all of them - don't do anything special, so the pattern is always just applied to the value of the key. It's easy to add .* where something like query should be skipped and I think it's natural. And while I do see the trickiness of accidentally matching http://healthcheck in the second example if the intention is path, at the same time it seems to still be the most straight forward behavior I think.

@iNikem
Copy link
Contributor Author

iNikem commented Aug 11, 2021

The only remaining concern of mine is that I really would like to preserve p -> p.endsWith("$") ? p : (p + ".*"). So that users don't have to always add ".*" at the end of their patterns. WDYT @anuraaga @jkwatson

@iNikem iNikem marked this pull request as ready for review August 11, 2021 16:21
@iNikem iNikem requested a review from a team August 11, 2021 16:21
@iNikem
Copy link
Contributor Author

iNikem commented Aug 11, 2021

Also would love to hear suggestions for module name.

private final SpanKind kind;
private final Sampler delegate;

public StringAttributeSampler(Map<AttributeKey<String>, ? extends Collection<String>> patterns, SpanKind kind, Sampler delegate) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we may as well keep the delegate sampler together with the pattern, UX should be straight forward with a builder StringAttributeSamplerBuilder.addRule(key, pattern, delegate). Easy and opens up easy setting of different sampling rates for different endpoints.

Also careful with Map since order is important, both to determine precedence and to allow a user to optimize performance by (if they want) by having common rules up front. List<SamplingRule> for example could be better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we may as well keep the delegate sampler together with the pattern, UX should be straight forward with a builder StringAttributeSamplerBuilder.addRule(key, pattern, delegate). Easy and opens up easy setting of different sampling rates for different endpoints.

You are making it more and more generic/complex :) I usually prefer to start with as simple as possible and then iterate based on the user feedback.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is more generic, but is it really more complex? I think "setting a delegate Sampler for a pattern" is about as simple as it gets conceptually, it's almost like just populating a key/value set at that point. And while healthcheck is probably the #1 ask, it is common for systems to have APIs with vastly different QPS, so we can expect the new #1 ask to immediately want to be able to tweak their sampling rate :) I suspect it is worth working on this expectation, it doesn't seem to me to make things overly complex.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait. I realised that your suggestion is totally different from my original idea. I wanted to have "deny list": if attribute matches one of the patterns, then drop, otherwise delegate to the "main" sampler. E.g. my main sampler for the whole application is ParentBased, but I want to drop "/healthcheck". Deny list.

Your idea is more like "router": match some patterns to samplers. Which means that I have to provides AlwaysOff sampler for "/healthcheck" pattern and I have to remember to provide "catchAll" pattern to direct to ParentBased sampler. Router.

We probably may want both samplers, one simpler (including simpler to think about), one more powerful.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, doesn't it sort of make sense in this case to start with the more powerful one? I feel it will be a more productive discussion when bringing it to the spec, to talk about the generic sampling router vs the specific use case of denying URLs.

This is also because methods on SamplerBuilder seem like they can make up the difference between the two, e.g. deny(key, pattern) { addRule(key, pattern, Sampler.alwaysOff()) } is a simple shortcut that wouldn't require a totally separate Sampler implementation.

@@ -22,3 +22,4 @@ include(":aws-xray")
include(":dependencyManagement")
include(":example")
include(":jmx-metrics")
include(":url-sampler")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since package is samplers this should also be samplers I guess

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My initial idea was to have every single contrib sampler in a separate module. They then can have separate lifecycles and maintainers.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm - need to imagine more what other samplers there could be. Presumably you mean different modules even for samplers that aren't vendor-specific. Perhaps a rate-limiting sampler. Realistically, would these sampler primitives really have different maintainers? I figured this is prototyping with the intention of moving into the SDK samplers package eventually as these seem pretty "core" for lack of a better term for me. Though then I guess it brings the idea of just using the tracing-incubator too - https://github.com/open-telemetry/opentelemetry-java/tree/main/sdk-extensions/tracing-incubator.

I'm -1 to the idea of this or similar important Samplers being in contrib long term - they seem like tier 1 components of the SDK to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jkwatson WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed this at SIG and agree that it's very likely this sort of sampler will end up in the core SDK, after this is a PoC, then spec discussion. So tracing-incubator is also a fine place to put this sampler and a great use case for that package.

To the initial point of the thread, though, I think we can assume the maintainers for this sampler are the Java maintainers, and that this artifact is temporary as a PoC, so we wouldn't need to generalize our approach to "every single contrib sampler", as we should think of this not as a "contrib sampler" but as a PoC for a "core sampler". @iNikem I guess you can pick whether you think it's easier to work on this in tracing-incubator or a new artifact in this contrib repo.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once this is officially in the spec, then it should absolutely be in the core repo. Until then, tracing-incubator is acceptable if contrib doesn't work for you.

I'd like to think about having a formal decision-making process on this kind of thing.

Perhaps:

  1. If we think a new component should be a core component, create a spec issue for it, and do prototyping in the tracing-incubator (eventually metrics-incubator, or some other clearly -alpha component in the core repo). If/when the spec issue gets approved, then it can be moved to a normal "extension" or "sdk-extension" artifact, as appropriate.
  2. If we don't think it's going to be a core component, then put it into contrib and make sure it has an owner.

The other question is...how do we prune out failed incubator experiments, or unloved contrib components?

url-sampler/build.gradle.kts Show resolved Hide resolved
}

tasks {
shadowJar {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to shadow at least at this layer

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not? If I want to grab this sampler and add it as an extension to my agent, then I need a full shadowed jar, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well we definitely don't want to shade in the SDK as the agent distro will be bringing it in. This sampler doesn't seem to have any dependencies that we want to shade.

Anyways in your issue you also mention "At this moment I don't propose nor plan any auto-configuration support for this sampler." :P Let's just implement this as we would a normal Java library for now.

url-sampler/build.gradle.kts Show resolved Hide resolved
url-sampler/build.gradle.kts Show resolved Hide resolved
private final SpanKind kind;
private final Sampler delegate;

public RuleBasedRoutingSampler(List<SamplingRule> rules, SpanKind kind, Sampler defaultDelegate) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make this package private and add builder(SpanKind, Sampler) static method

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we want to hide this constructor? There is no actual requirement to use builder, isn't there?

Copy link
Contributor

@anuraaga anuraaga Aug 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry missed commenting on this. For libraries we expose as little API as possible to reduce surface. It there is a useful shortcut we could expose it but this doesn't seem like a shortcut compared to the builder. Instead, we can even hide SamplingRule itself from the API too if only having the builder.

private final SpanKind kind;
private final Sampler defaultDelegate;

public RuleBasedRoutingSamplerBuilder(SpanKind kind, Sampler defaultDelegate) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make package private

/**
* @see RuleBasedRoutingSampler
*/
public class SamplingRule {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like at least for now doesn't need to be public

@iNikem
Copy link
Contributor Author

iNikem commented Aug 26, 2021

Replaced by #70

@iNikem iNikem closed this Aug 26, 2021
@anuraaga anuraaga deleted the url-sampler branch January 17, 2022 07:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

http.url based Sampler
3 participants