-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[processor/tailsamplingprocessor] and
support for string invert match
#9768
Conversation
@@ -45,7 +45,7 @@ func (c *And) Evaluate(traceID pcommon.TraceID, trace *TraceData) (Decision, err | |||
if err != nil { | |||
return Unspecified, err | |||
} | |||
if decision == NotSampled { | |||
if decision == NotSampled || decision == InvertNotSampled { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice find, now I'm wondering if other places which evaluate decision should be updated as well. Quickly looking at the code, this seems to be the case (though to be fair I haven't looked at tailsamplingprocessor code in a while and might be missing something)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I admit I haven't dived deep into the string matching, but it did make me wonder why it matters if it's Sampled
or InvertSampled
. It wasn't immediately obvious why a regular decision and an invert decision would need to be treated differently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems it's the case as well, I tested it and it failed. I am opening second PR right now :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tillig I don't fully understand how the invert works but it looks like there's something related to the InvertSampled
decision in the makeDecision function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, but even that seems really sort of weird to me.
// InvertNotSampled takes precedence over any other decision
if samplingDecision[sampling.InvertNotSampled] {
finalDecision = sampling.NotSampled
} else if samplingDecision[sampling.Sampled] {
finalDecision = sampling.Sampled
} else if samplingDecision[sampling.InvertSampled] && !samplingDecision[sampling.NotSampled] {
finalDecision = sampling.Sampled
}
- All the policies get executed before a decision is made. That's weird, and sounds like it's more of a composite thing than it should be. It feels like the policies should be executed in order and the first one that says "sample this thing" would allow sampling through. That'd perform better, too.
InvertNotSampled
has the highest precedence, butInvertSampled
has the lowest precedence and has to be specifically checked to see if it's also not contradicted byNotSampled
.
I dunno, I'm coming at it late, I just don't see any doc on it and the special treatment there is kind of confusing. It gets even more confusing when I think about how this affects the way I might get an unexpected result from just a series of policies now. If I put an invert match policy in the mix and it says "no," it'll countermand any other policy I put in there. Even if it's not in an and
policy. 🤔 Am I reading that wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree it's strange and basing on my understand how tail-sampling processor is supposed to work, we should prioritise positive sampling decisions, since ultimately the policies are "or"-ed (unless and
or composite
operator/policy is used)
BTW, we had several discussion on how tail-sampling should be arranged. I was proposing in the past the adoption of the sampler I made (renamed it to cascadingfilterprocessor). It does a couple more things and has (I believe) a much simpler configuration. I am considering refreshing the idea of upstreaming it again. Would you find it useful @tillig ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be nice to get those design discussions on tail sampling codified somehow into the README or as inline code comments to explain why this does what it does, not just how (which is all the code explains). Or even just links to the discussion threads so folks like me could catch up.
I think cascadingfilterprocessor looks interesting. It shows doing exactly what I'm trying to do, which is:
- Keep long running ops
- Keep errors
- Drop all noise (health, metrics)
- Keep a small percentage of the remaining traces
But... that container doesn't have the dynatrace
exporter in it, which I need, so it's a non-starter; and at the moment I'm mostly in POC-land trying to show that OpenTelemetry works (yeah, that's another issue entirely) so I don't have the wherewithal to build a custom Frankenstein container to make it happen. If the cascadingfilterprocessor was in here, I'd be all over trying it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure thing, I was curious if you would find that useful. As for trying it out, it should be quite easy to include it via a builder
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, thanks. If/when that time comes, I'll have to actually start learning what "include it via a builder" means. I'm not a go dev and there's a long tail of things to learn. 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I agree with that, the behavior of the tailsamplingprocessor
is not what i would expect.
the readme should definitely be updated to reflect that.
I know there was talk about deprecating the tailsamplingprocessor
so I like the idea of upstreaming cascadingfilterprocessor I think it can be a great fit to replace it.
It appears some sort of ballast load test failed here, but I'm unclear how adding this one "if" check would have caused that. Is there something different I should be doing? |
The load tests have been very flaky recently. The issue is likely not caused by your change. @djaglowski is working on a change to help fix the tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I have prepared a PR with a similar fix to composite policy: #9793
This PR was marked stale due to lack of activity. It will be closed in 14 days. |
The PR here is still very much of interest to me. It looks like the unit tests failed with some things unrelated to my change (errors about trying to use a plugin that's not installed?) and the load tests have been noted as flaky and getting a fix. If there's something I need to do here, please do let me know. I have to admit I'm somewhat new to Go so I'm not sure how to help fix the unrelated issues, but maybe with a little hint to get me going in the right direction? |
@tillig I've removed the Stale label and restarted unit tests. Assuming no persistent issues there, we'll get this merged. |
It appears the tests failing have to do with this test in the |
Merged main in here. Someone will have to kick off the build again to see if those tests start passing. |
Looks like unrelated link checks were failing due to the README in unrelated modules. Brought in additional changes from main in the hopes it'll get me over the hump. (Should I stop bringing in main?) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. Seems worth noting in the changelog though.
@djaglowski Added. Thanks! |
…ch (open-telemetry#9768) * [processor/tailsamplingprocessor] and support for string invert * Add open-telemetry#9553 to changelog.
This fixes the handling of AND policies that contain a sub-policy with invert_match=true. Previously if the decision from a policy evaluation was NotSampled or InvertNotSampled it would return a NotSampled decision regardless, effectively downgrading the result. This was breaking the documented behaviour that inverted decisions should take precedence over all others. This is related to the changes made in open-telemetry#9768 that introduced support for using invert_match within and sub policies.
…hen inside and sub policy (#33671) **Description:** This fixes the handling of AND policies that contain a sub-policy with invert_match=true. Previously if the decision from a policy evaluation was `NotSampled` or `InvertNotSampled` it would return a `NotSampled` decision regardless, effectively downgrading the result. This was breaking the documented behaviour that inverted decisions should take precedence over all others. This is related to the changes made in #9768 that introduced support for using invert_match within and sub policies. **Link to tracking Issue:** #33656 **Testing:** I tested manually that this fixes the issue described in #33656 and also updated the tests. If you have any suggestions for more tests we could add let me know. **Documentation:**
Description: Fix #9553: Adds support for tail sampling so the
and
policy can do string invert match.Link to tracking Issue: #9553
Testing: Added unit tests to verify string attribute invert could both allow and disallow match.
Documentation: No documentation added. Documentation already notes string invert match usage should work, it just hadn't been implemented with
and
.I didn't refactor the unit tests to be parameterized like those in
string_tag_filter_test.go
; unclear if that's required here.