-
Notifications
You must be signed in to change notification settings - Fork 97
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add 'meta.refinery.original_sample_rate' #508
Conversation
Refinery re-writing the sample rate was causing some concern. Writing down the sample rate sent to refinery prior to updating the sample rate should allow for more/better debug.
Wrap accessing transmission.Events for its count in Mux.Lock/Unlock.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @epvanhouten 👍🏻
I think we still need to update the original sample rate if unset or negative, otherwise looks good 👍🏻
i.Transmission.EnqueueSpan(sp) | ||
return | ||
} | ||
i.Logger.Debug().WithField("trace_id", sp.TraceID).Logf("Dropping span because of previous decision to drop trace") | ||
} | ||
|
||
func mergeTraceAndSpanSampleRates(sp *types.Span, traceSampleRate uint) { | ||
if traceSampleRate != 1 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missed adding it initial review, this is where I think we should default sample rate if < 1.
if traceSampleRate != 1 { | |
if sp.SampleRate < 1 { | |
sp.SampleRate = 1 | |
} | |
sp.SampleRate *= sampleRate if traceSampleRate != 1 { | |
if traceSampleRate != 1 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated mergeTraceAndSpanSampleRates
to set to 1 when something less than 1 and added a unit test to capture that behavior. I very briefly tried to find mention of the range of allowed values for SampleRate in documentation for OpenTelemetry but came up empty. Is there a source for that requirement so we can leave a breadcrumb of why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honeycomb's use and interpretation of SampleRate predates OTel. OTel uses a p-value, which is effectively the mathematical inverse of Honeycomb's SampleRate.
Honeycomb's backend data system interprets a missing value for SampleRate as a value of 0, so therefore converts those to 1 automatically so the math works correclty. We have been trying to push that idea upstream where we can; it benefits everyone to have it more explicit.
Add unit test to capture forcing SampleRate to be at least 1.
Gave me enough breadcrumbs to find...
https://opentelemetry.io/docs/reference/specification/trace/tracestate-probability-sampling/
https://docs.honeycomb.io/getting-data-in/opentelemetry/dotnet-distro/#sampling
So SampleRate is explicitly not an OTEL construct, but honeycombs
SampleRate which says probability of sample = 1/SampleRate where SampleRate
is 1 to max(uint)?
…On Thu, Sep 8, 2022 at 11:34 AM Kent Quirk ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In collect/collect.go
<#508 (comment)>:
> i.Transmission.EnqueueSpan(sp)
return
}
i.Logger.Debug().WithField("trace_id", sp.TraceID).Logf("Dropping span because of previous decision to drop trace")
}
+func mergeTraceAndSpanSampleRates(sp *types.Span, traceSampleRate uint) {
+ if traceSampleRate != 1 {
Honeycomb's use and interpretation of SampleRate predates OTel. OTel uses
a p-value, which is effectively the mathematical inverse of Honeycomb's
SampleRate.
Honeycomb's backend data system interprets a missing value for SampleRate
as a value of 0, so therefore converts those to 1 automatically so the math
works correclty. We have been trying to push that idea upstream where we
can; it benefits everyone to have it more explicit.
—
Reply to this email directly, view it on GitHub
<#508 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABL2DNYJVTSAHL7FHDZQONTV5IIQZANCNFSM6AAAAAAQHKIOFQ>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
collect/collect_test.go
Outdated
transmission.Mux.RUnlock() | ||
} | ||
|
||
// Where is the documentation that this is needed behavior? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's ... basically how Honeycomb works. When doing telemetry math in "normal" mode (as opposed to "usage" mode) we use SampleRate to generate graphs that try to reconstruct the original data. SampleRate of 0 or missing is interpreted as 1 in our backend in various ways that can sometimes be surprising, so we have found that it's better to be explicit on input if possible.
Yes, what you said is basically correct. Honeycomb's sample rate is an indication that this entity is considered to be representative of a SampleRate number of entities that are "similar to" it based on the particular sampling method(s) in use. When multiple samplers are involved, the best way to reconstruct the original shape of the data is to multiply SampleRate values (provided that there aren't any zeroes in the mix, which is why we promote things to 1 before the multiplication). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is excellent. Thank you for your contribution!
## Which problem is this PR solving? Refinery re-writing the sample rate was causing some concern. Writing down the sample rate sent to refinery prior to updating the sample rate should allow for more/better debug. This is implementing the suggestion of @robbkidd in honeycombio#490 to add annotations to the spans of their original sample rate. This will allow for better understanding of what is happening when there are multiple tiers of sampling happening in the telemetry processing. ## Short description of the changes At the places where the Span SampleRate was combined with the Trace SampleRate, the original span SampleRate is copied down into a meta.refinery annotation.
Which problem is this PR solving?
Refinery re-writing the sample rate was causing some concern. Writing down the sample rate sent to refinery prior to updating the sample rate should allow for more/better debug.
This is implementing the suggestion of @robbkidd in #490 to add annotations to the spans of their original sample rate. This will allow for better understanding of what is happening when there are multiple tiers of sampling happening in the telemetry processing.
Short description of the changes
At the places where the Span SampleRate was combined with the Trace SampleRate, the original span SampleRate is copied down into a meta.refinery annotation.