Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add 'meta.refinery.original_sample_rate' #508

Merged
merged 5 commits into from
Sep 9, 2022

Conversation

epvanhouten
Copy link
Contributor

@epvanhouten epvanhouten commented Sep 8, 2022

Which problem is this PR solving?

Refinery re-writing the sample rate was causing some concern. Writing down the sample rate sent to refinery prior to updating the sample rate should allow for more/better debug.

This is implementing the suggestion of @robbkidd in #490 to add annotations to the spans of their original sample rate. This will allow for better understanding of what is happening when there are multiple tiers of sampling happening in the telemetry processing.

Short description of the changes

At the places where the Span SampleRate was combined with the Trace SampleRate, the original span SampleRate is copied down into a meta.refinery annotation.

Refinery re-writing the sample rate was causing some concern. Writing
down the sample rate sent to refinery prior to updating the sample rate
should allow for more/better debug.
@epvanhouten epvanhouten requested review from a team and robbkidd September 8, 2022 03:53
Wrap accessing transmission.Events for its count in Mux.Lock/Unlock.
Copy link
Contributor

@MikeGoldsmith MikeGoldsmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @epvanhouten 👍🏻

I think we still need to update the original sample rate if unset or negative, otherwise looks good 👍🏻

@MikeGoldsmith MikeGoldsmith self-assigned this Sep 8, 2022
@MikeGoldsmith MikeGoldsmith added type: enhancement New feature or request version: bump minor A PR that adds behavior, but is backwards-compatible. labels Sep 8, 2022
i.Transmission.EnqueueSpan(sp)
return
}
i.Logger.Debug().WithField("trace_id", sp.TraceID).Logf("Dropping span because of previous decision to drop trace")
}

func mergeTraceAndSpanSampleRates(sp *types.Span, traceSampleRate uint) {
if traceSampleRate != 1 {
Copy link
Contributor

@MikeGoldsmith MikeGoldsmith Sep 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missed adding it initial review, this is where I think we should default sample rate if < 1.

Suggested change
if traceSampleRate != 1 {
if sp.SampleRate < 1 {
sp.SampleRate = 1
}
sp.SampleRate *= sampleRate if traceSampleRate != 1 {
if traceSampleRate != 1 {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated mergeTraceAndSpanSampleRates to set to 1 when something less than 1 and added a unit test to capture that behavior. I very briefly tried to find mention of the range of allowed values for SampleRate in documentation for OpenTelemetry but came up empty. Is there a source for that requirement so we can leave a breadcrumb of why?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honeycomb's use and interpretation of SampleRate predates OTel. OTel uses a p-value, which is effectively the mathematical inverse of Honeycomb's SampleRate.

Honeycomb's backend data system interprets a missing value for SampleRate as a value of 0, so therefore converts those to 1 automatically so the math works correclty. We have been trying to push that idea upstream where we can; it benefits everyone to have it more explicit.

@MikeGoldsmith MikeGoldsmith added the status: revision needed Waiting for response to changes requested. label Sep 8, 2022
Add unit test to capture forcing SampleRate to be at least 1.
@epvanhouten
Copy link
Contributor Author

epvanhouten commented Sep 8, 2022 via email

transmission.Mux.RUnlock()
}

// Where is the documentation that this is needed behavior?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's ... basically how Honeycomb works. When doing telemetry math in "normal" mode (as opposed to "usage" mode) we use SampleRate to generate graphs that try to reconstruct the original data. SampleRate of 0 or missing is interpreted as 1 in our backend in various ways that can sometimes be surprising, so we have found that it's better to be explicit on input if possible.

@kentquirk
Copy link
Contributor

kentquirk commented Sep 8, 2022

Yes, what you said is basically correct.

Honeycomb's sample rate is an indication that this entity is considered to be representative of a SampleRate number of entities that are "similar to" it based on the particular sampling method(s) in use.

When multiple samplers are involved, the best way to reconstruct the original shape of the data is to multiply SampleRate values (provided that there aren't any zeroes in the mix, which is why we promote things to 1 before the multiplication).

@epvanhouten epvanhouten requested review from MikeGoldsmith and removed request for robbkidd September 9, 2022 00:28
Copy link
Contributor

@kentquirk kentquirk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is excellent. Thank you for your contribution!

@kentquirk kentquirk merged commit a9ddbc2 into honeycombio:main Sep 9, 2022
ghost pushed a commit to opsramp/tracing-proxy that referenced this pull request Jul 5, 2024
## Which problem is this PR solving?

Refinery re-writing the sample rate was causing some concern. Writing down the sample rate sent to refinery prior to updating the sample rate should allow for more/better debug.

This is implementing the suggestion of @robbkidd in honeycombio#490 to add annotations to the spans of their original sample rate. This will allow for better understanding of what is happening when there are multiple tiers of sampling happening in the telemetry processing.

## Short description of the changes
At the places where the Span SampleRate was combined with the Trace SampleRate, the original span SampleRate is copied down into a meta.refinery annotation.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status: revision needed Waiting for response to changes requested. type: enhancement New feature or request version: bump minor A PR that adds behavior, but is backwards-compatible.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants