Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[processor/tailsamplingprocessor] and support for string invert match #9768

Merged
merged 5 commits into from
May 26, 2022

Conversation

tillig
Copy link
Contributor

@tillig tillig commented May 6, 2022

Description: Fix #9553: Adds support for tail sampling so the and policy can do string invert match.

Link to tracking Issue: #9553

Testing: Added unit tests to verify string attribute invert could both allow and disallow match.

Documentation: No documentation added. Documentation already notes string invert match usage should work, it just hadn't been implemented with and.

I didn't refactor the unit tests to be parameterized like those in string_tag_filter_test.go; unclear if that's required here.

@pmm-sumo
Copy link
Contributor

pmm-sumo commented May 8, 2022

@mottibec @fabiovn since it involves the and policy and invert matches, could you take a look as well?

@@ -45,7 +45,7 @@ func (c *And) Evaluate(traceID pcommon.TraceID, trace *TraceData) (Decision, err
if err != nil {
return Unspecified, err
}
if decision == NotSampled {
if decision == NotSampled || decision == InvertNotSampled {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice find, now I'm wondering if other places which evaluate decision should be updated as well. Quickly looking at the code, this seems to be the case (though to be fair I haven't looked at tailsamplingprocessor code in a while and might be missing something)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I admit I haven't dived deep into the string matching, but it did make me wonder why it matters if it's Sampled or InvertSampled. It wasn't immediately obvious why a regular decision and an invert decision would need to be treated differently.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems it's the case as well, I tested it and it failed. I am opening second PR right now :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tillig I don't fully understand how the invert works but it looks like there's something related to the InvertSampled decision in the makeDecision function

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, but even that seems really sort of weird to me.

	// InvertNotSampled takes precedence over any other decision
	if samplingDecision[sampling.InvertNotSampled] {
		finalDecision = sampling.NotSampled
	} else if samplingDecision[sampling.Sampled] {
		finalDecision = sampling.Sampled
	} else if samplingDecision[sampling.InvertSampled] && !samplingDecision[sampling.NotSampled] {
		finalDecision = sampling.Sampled
	}
  • All the policies get executed before a decision is made. That's weird, and sounds like it's more of a composite thing than it should be. It feels like the policies should be executed in order and the first one that says "sample this thing" would allow sampling through. That'd perform better, too.
  • InvertNotSampled has the highest precedence, but InvertSampled has the lowest precedence and has to be specifically checked to see if it's also not contradicted by NotSampled.

I dunno, I'm coming at it late, I just don't see any doc on it and the special treatment there is kind of confusing. It gets even more confusing when I think about how this affects the way I might get an unexpected result from just a series of policies now. If I put an invert match policy in the mix and it says "no," it'll countermand any other policy I put in there. Even if it's not in an and policy. 🤔 Am I reading that wrong?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it's strange and basing on my understand how tail-sampling processor is supposed to work, we should prioritise positive sampling decisions, since ultimately the policies are "or"-ed (unless and or composite operator/policy is used)

BTW, we had several discussion on how tail-sampling should be arranged. I was proposing in the past the adoption of the sampler I made (renamed it to cascadingfilterprocessor). It does a couple more things and has (I believe) a much simpler configuration. I am considering refreshing the idea of upstreaming it again. Would you find it useful @tillig ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be nice to get those design discussions on tail sampling codified somehow into the README or as inline code comments to explain why this does what it does, not just how (which is all the code explains). Or even just links to the discussion threads so folks like me could catch up.

I think cascadingfilterprocessor looks interesting. It shows doing exactly what I'm trying to do, which is:

  • Keep long running ops
  • Keep errors
  • Drop all noise (health, metrics)
  • Keep a small percentage of the remaining traces

But... that container doesn't have the dynatrace exporter in it, which I need, so it's a non-starter; and at the moment I'm mostly in POC-land trying to show that OpenTelemetry works (yeah, that's another issue entirely) so I don't have the wherewithal to build a custom Frankenstein container to make it happen. If the cascadingfilterprocessor was in here, I'd be all over trying it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing, I was curious if you would find that useful. As for trying it out, it should be quite easy to include it via a builder

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, thanks. If/when that time comes, I'll have to actually start learning what "include it via a builder" means. I'm not a go dev and there's a long tail of things to learn. 😄

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree with that, the behavior of the tailsamplingprocessor is not what i would expect.
the readme should definitely be updated to reflect that.
I know there was talk about deprecating the tailsamplingprocessor so I like the idea of upstreaming cascadingfilterprocessor I think it can be a great fit to replace it.

@tillig
Copy link
Contributor Author

tillig commented May 8, 2022

It appears some sort of ballast load test failed here, but I'm unclear how adding this one "if" check would have caused that. Is there something different I should be doing?

@TylerHelmuth
Copy link
Member

It appears some sort of ballast load test failed here, but I'm unclear how adding this one "if" check would have caused that. Is there something different I should be doing?

The load tests have been very flaky recently. The issue is likely not caused by your change. @djaglowski is working on a change to help fix the tests.

Copy link
Contributor

@pmm-sumo pmm-sumo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

I have prepared a PR with a similar fix to composite policy: #9793

@github-actions
Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label May 26, 2022
@tillig
Copy link
Contributor Author

tillig commented May 26, 2022

The PR here is still very much of interest to me. It looks like the unit tests failed with some things unrelated to my change (errors about trying to use a plugin that's not installed?) and the load tests have been noted as flaky and getting a fix.

If there's something I need to do here, please do let me know. I have to admit I'm somewhat new to Go so I'm not sure how to help fix the unrelated issues, but maybe with a little hint to get me going in the right direction?

@djaglowski djaglowski removed the Stale label May 26, 2022
@djaglowski
Copy link
Member

@tillig I've removed the Stale label and restarted unit tests. Assuming no persistent issues there, we'll get this merged.

@tillig
Copy link
Contributor Author

tillig commented May 26, 2022

It appears the tests failing have to do with this test in the logtransformprocessor which isn't something I messed with. Let me see if I can pull changes from main to see if it's fixed there.

@tillig
Copy link
Contributor Author

tillig commented May 26, 2022

Merged main in here. Someone will have to kick off the build again to see if those tests start passing.

@tillig
Copy link
Contributor Author

tillig commented May 26, 2022

Looks like unrelated link checks were failing due to the README in unrelated modules. Brought in additional changes from main in the hopes it'll get me over the hump. (Should I stop bringing in main?)

Copy link
Member

@djaglowski djaglowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Seems worth noting in the changelog though.

@tillig
Copy link
Contributor Author

tillig commented May 26, 2022

@djaglowski Added. Thanks!

@djaglowski djaglowski merged commit d9e0d00 into open-telemetry:main May 26, 2022
@tillig tillig deleted the issue-9553 branch May 26, 2022 20:20
kentquirk pushed a commit to McSick/opentelemetry-collector-contrib that referenced this pull request Jun 14, 2022
…ch (open-telemetry#9768)

* [processor/tailsamplingprocessor] and support for string invert

* Add open-telemetry#9553 to changelog.
jamesrwhite added a commit to jamesrwhite/opentelemetry-collector-contrib that referenced this pull request Jun 18, 2024
This fixes the handling of AND policies that contain a sub-policy
with invert_match=true. Previously if the decision from a policy
evaluation was NotSampled or InvertNotSampled it would return a
NotSampled decision regardless, effectively downgrading the result.

This was breaking the documented behaviour that inverted decisions
should take precedence over all others.

This is related to the changes made in open-telemetry#9768
that introduced support for using invert_match within and sub policies.
jpkrohling pushed a commit that referenced this pull request Jun 20, 2024
…hen inside and sub policy (#33671)

**Description:**

This fixes the handling of AND policies that contain a sub-policy with
invert_match=true. Previously if the decision from a policy evaluation
was `NotSampled` or `InvertNotSampled` it would return a `NotSampled`
decision regardless, effectively downgrading the result.

This was breaking the documented behaviour that inverted decisions
should take precedence over all others.

This is related to the changes made in
#9768 that introduced
support for using invert_match within and sub policies.

**Link to tracking Issue:**
#33656

**Testing:**

I tested manually that this fixes the issue described in
#33656
and also updated the tests. If you have any suggestions for more tests
we could add let me know.

**Documentation:**
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[processor/tailsamplingprocessor] and policy appears to ignore string_attribute match results
6 participants