Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Probability sampling specification #1899

Closed
wants to merge 9 commits into from

Conversation

jmacd
Copy link
Contributor

@jmacd jmacd commented Aug 30, 2021

This proposes changes in the specification implied by OTEPs 168, 170, and 175:

open-telemetry/oteps#168
open-telemetry/oteps#170
open-telemetry/oteps#175

The justification for these changes is given in those two OTEPs. Our goal is to ensure that Span-to-Metrics pipelines can be built to statistically count spans after they have been sampled and before they have been assembled into traces.

The two OTEPs are not receiving enough attention. This PR is meant to consolidate the implied changes of those two OTEPs combined in a way that makes it easier to see what is being specified (WHAT) without all the related justification (WHY). If you approve these changes, please also review and approve those OTEPs first.

Fixes #1413.
Part of #1819.

This cannot be merged before OTEPs 168 and 170 are merged, but this is also not a draft, it is a complete proposal. This has been prototyped twice, once in Java and once in Go. See OTEP 170 for more detail about these prototypes.

@jmacd jmacd requested review from a team August 30, 2021 23:29
Copy link
Member

@yurishkuro yurishkuro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

specification/trace/sdk.md Outdated Show resolved Hide resolved
specification/trace/sdk.md Outdated Show resolved Hide resolved
specification/trace/sdk.md Outdated Show resolved Hide resolved
Although sampling can be carried out in multiple stages, OpenTelemetry
specifies a dedicated field in the Span data model for representing
probability at the "head" of the distributed trace, where it describes the
probability the `Sampled` flag was set in the Span's initial sampling
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probability the Sampled

probability when the Sampled?

specification/trace/sdk.md Outdated Show resolved Hide resolved
specification/trace/sdk.md Outdated Show resolved Hide resolved
Copy link
Contributor

@tedsuo tedsuo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once Yuri's suggestions are responded to, this LGTM

specification/trace/sdk.md Outdated Show resolved Hide resolved
specification/trace/sdk.md Show resolved Hide resolved
specification/trace/sdk.md Show resolved Hide resolved
specification/trace/sdk.md Outdated Show resolved Hide resolved
specification/trace/sdk.md Outdated Show resolved Hide resolved
specification/trace/sdk.md Outdated Show resolved Hide resolved
Co-authored-by: Yuri Shkuro <yurishkuro@users.noreply.github.com>
@jmacd
Copy link
Contributor Author

jmacd commented Sep 9, 2021

Reviewers, please consider another look at open-telemetry/oteps#168 which now covers a number of corner cases and is generally finished. Once that merges we can continue working on this PR. The questions that I propose we defer for "future debate" are summarized here in this comment open-telemetry/oteps#168 (comment) and I propose that we debate them here afterhttps://github.com/open-telemetry/oteps/pull/168 merges.

@jmacd
Copy link
Contributor Author

jmacd commented Sep 14, 2021

FYI open-telemetry/oteps#175 adds a glossary of sampling terms used that would be used to flesh out a tracing datamodel.md document. Unlike Metrics and Logs, Tracing does not have one of these.

Copy link
Member

@bogdandrutu bogdandrutu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jsuereth I thought that we don't immediately define specs for OTEPs, especially when we are in a new unknown area. Also this changes a "stable" document without having a prototype which we should not correct?

base-2 logarithm in a small number of bits.

The OpenTelemetry Span field for encoding adjusted count is named
`log_head_adjusted_count`, with the default value zero representing the case
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this defined in the data-model? Why is this needed if the tracestate is already present?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Remember this PR is a draft until all the OTEPs merge.)

The field log_head_adjusted_count is described in https://github.com/open-telemetry/oteps/blob/main/text/trace/0170-sampling-probability.md.

You make a valid point about tracestate. On the other hand, it's less clearly part of the data model if the user is required to parse the tracestate (twice, as there are two syntaxes) to recover the information. Besides, tracestate is not an ideal solution in the long term. Our best long term solution is to modify the W3C traceparent to include a few extra bytes. Then, the rationale of storing log_head_adjusted_count as an independent field is that we may have other means of determining log_head_adjusted_count in the future.

For now, I would suggest that OTel erases its own tracestate record from the span data and specify fields for anything that's part of the data model. As it stands, the r value from OTEP 168 is not being recorded and as far as I know, no one has asked for that.

specification/trace/sdk.md Outdated Show resolved Hide resolved
specification/trace/sdk.md Outdated Show resolved Hide resolved
Co-authored-by: Reiley Yang <reyang@microsoft.com>
@github-actions
Copy link

github-actions bot commented Oct 1, 2021

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Oct 1, 2021
@github-actions
Copy link

github-actions bot commented Oct 8, 2021

Closed as inactive. Feel free to reopen if this PR is still being worked on.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Review approach & specify algorithm for TraceIdRatioBasedSampler (ProbabilitySampler)
7 participants