Probability sampling specification #1899

Closed
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -12,6 +12,7 @@ release.
### Traces

- Adding SDK configuration for Jaeger remote sampler ([#1791](https://github.com/open-telemetry/opentelemetry-specification/pull/1791))
- Adds specification for probability sampling, updates the definition of the built-in Samplers.

### Metrics

244 changes: 203 additions & 41 deletions specification/trace/sdk.md
@@ -119,15 +119,18 @@ Thus, the SDK specification defines sets of possible requirements for

## Sampling

Sampling is a mechanism to control the noise and overhead introduced by
OpenTelemetry by reducing the number of samples of traces collected and sent to
the backend.
Sampling is a mechanism to control the cost of OpenTelemetry tracing by
reducing the number of spans collected and sent to the backend.
OpenTelemetry gives participants in a distributed trace the option to sample
based on whether their context was sampled or to make the decision
independently.

Sampling may be implemented on different stages of a trace collection. The
earliest sampling could happen before the trace is actually created, and the
latest sampling could happen on the Collector which is out of process.
Sampling may be implemented in different stages of trace collection. The
earliest sampling happens inside the trace SDK, but later sampling decisions
can happen outside the process, for example in an OpenTelemetry collector.

The OpenTelemetry API has two properties responsible for the data collection:
The OpenTelemetry API has two properties associated with Spans that allow the
user to become aware of the sampling decision:

* `IsRecording` field of a `Span`. If `false` the current `Span` discards all
tracing data (attributes, events, status, etc.). Users can use this property
@@ -165,6 +168,59 @@ The following table summarizes the expected behavior for each combination of
The SDK defines the interface [`Sampler`](#sampler) as well as a set of
[built-in samplers](#built-in-samplers) and associates a `Sampler` with each [`TracerProvider`].

### Probability sampling

OpenTelemetry specifies a mechanism for conveying sampling probability, when
known, to enable accurate statistical counting of the population of spans
using only the portion that were sampled and collected.

Sampling probability is conveyed in a form known as "adjusted count", which
is the number of spans in the population accurately represented by the
individual. In common terms, a "1-in-N" sampling scheme produces spans with
adjusted count N, where every sample span represents N in the general
population. Adjusted count is the inverse of sampling probability except
when the probability is zero, which is an important special case.
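
To make the arithmetic concrete, here is a minimal sketch of how adjusted counts could be summed to estimate the size of the original span population. The function names `adjustedCount` and `estimatePopulation` and the sample probabilities are hypothetical and are not part of this specification.

```go
package main

import "fmt"

// adjustedCount returns the adjusted count for a span that was sampled with
// the given probability: the inverse of the probability, except that a
// probability of zero maps to an adjusted count of zero.
func adjustedCount(probability float64) float64 {
	if probability == 0 {
		return 0
	}
	return 1 / probability
}

// estimatePopulation sums the adjusted counts of the collected spans, which
// estimates how many spans existed before sampling.
func estimatePopulation(probabilities []float64) float64 {
	total := 0.0
	for _, p := range probabilities {
		total += adjustedCount(p)
	}
	return total
}

func main() {
	// Hypothetical input: three spans collected under 1-in-4 sampling and
	// one span collected under 1-in-2 sampling.
	collected := []float64{0.25, 0.25, 0.25, 0.5}
	fmt.Println(estimatePopulation(collected))
}
```

Each 1-in-4 span stands for 4 spans and the 1-in-2 span stands for 2, so the four collected spans represent an estimated 14 spans in the population.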

Although sampling can be carried out in multiple stages, OpenTelemetry
specifies a dedicated field in the Span data model for representing
probability at the "head" of the distributed trace, where it describes the
probability the `Sampled` flag was set in the Span's initial sampling
decision.

Review comment (Member), on "probability the Sampled": "probability when the Sampled?"

To reduce the cost of conveying sampling information through propagators,
OpenTelemetry limits head sampling probabilities to powers of two or zero.
Adjusted counts are likewise limited to powers of two or zero. As not all
Sampler implementations will be probabilistic in nature, a special value for
the "unknown" adjusted count is included in the data model. Aside from the
zero and unknown cases, adjusted count values can be encoded using their
base-2 logarithm in a small number of bits.

The OpenTelemetry Span field for encoding adjusted count is named
`log_head_adjusted_count`, with the default value zero representing the case
of unknown adjusted count. Values 1 through 62 encode one plus the base-2
logarithm of adjusted count (i.e., adjusted count equals
`2**(encoded_value-1)`), and zero adjusted count is encoded by value 63.
Thus, it takes 6 bits to encode the `log_head_adjusted_count` field, with
interpretation specified in the table below.

Review comment (Member): Where is this defined in the data-model? Why is this needed if the tracestate is already present?

Reply (Contributor Author): (Remember this PR is a draft until all the OTEPs merge.)

The field `log_head_adjusted_count` is described in https://github.com/open-telemetry/oteps/blob/main/text/trace/0170-sampling-probability.md.

You make a valid point about tracestate. On the other hand, it's less clearly part of the data model if the user is required to parse the tracestate (twice, as there are two syntaxes) to recover the information. Besides, tracestate is not an ideal solution in the long term. Our best long term solution is to modify the W3C traceparent to include a few extra bytes. Then, the rationale of storing `log_head_adjusted_count` as an independent field is that we may have other means of determining `log_head_adjusted_count` in the future.

For now, I would suggest that OTel erases its own tracestate record from the span data and specify fields for anything that's part of the data model. As it stands, the `r` value from OTEP 168 is not being recorded and as far as I know, no one has asked for that.

<a name="log-head-adjusted-count-table"></a>

| Encoded Value | Head adjusted count | Head sampling probability |
| ------------- | ------------------- | ------------------------- |
| 0 | Unknown | Unknown |
| 1 | 1 | 1 |
| 2 | 2 | 1/2 |
| 3 | 4 | 1/4 |
| 4 | 8 | 1/8 |
| ... | ... | ... |
| N | 2**(N-1) | 2**(-N+1) |
| ... | ... | ... |
| 62 | 2**61 | 2**-61 |
| 63 | 0 | 0 |
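
The table can be read as a small decoding function. The sketch below is illustrative only: the name `decodeAdjustedCount` and its constants are hypothetical, and treating values above 63 as unknown is an assumption, since a 6-bit field cannot hold them.

```go
package main

import (
	"fmt"
	"math"
)

const (
	encodedUnknown = 0  // encoded value meaning "unknown adjusted count"
	encodedZero    = 63 // encoded value meaning "zero adjusted count"
)

// decodeAdjustedCount interprets an encoded log_head_adjusted_count value.
// It returns the head adjusted count and whether that count is known.
func decodeAdjustedCount(encoded uint8) (count float64, known bool) {
	switch {
	case encoded == encodedUnknown || encoded > encodedZero:
		// Assumption: out-of-range values are treated like "unknown".
		return 0, false
	case encoded == encodedZero:
		return 0, true
	default:
		// Values 1 through 62 encode 2**(encoded-1).
		return math.Ldexp(1, int(encoded)-1), true
	}
}

func main() {
	for _, encoded := range []uint8{0, 1, 2, 3, 4, 62, 63} {
		count, known := decodeAdjustedCount(encoded)
		fmt.Printf("encoded=%d known=%v adjusted_count=%g\n", encoded, known, count)
	}
}
```

For a known non-zero adjusted count, the head sampling probability is its inverse; for encoded value 63 both the adjusted count and the probability are zero.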

The built-in Samplers are defined so that when this specification is
followed, all spans will have known head sampling probability.

### SDK Span creation

When asked to create a Span, the SDK MUST act as if doing the following in order:
@@ -175,7 +231,7 @@ When asked to create a Span, the SDK MUST act as if doing the following in order
2. Query the `Sampler`'s [`ShouldSample`](#shouldsample) method
(Note that the [built-in `ParentBasedSampler`](#parentbased) can be used to
use the sampling decision of the parent,
translating a set SampledFlag to RECORD and an unset one to DROP).
translating a set SampledFlag to `RECORD_AND_SAMPLE` and an unset one to `DROP`).
3. Generate a new span ID for the `Span`, independently of the sampling decision.
This is done so other components (such as logs or exception handling) can rely on
a unique span ID, even if the `Span` is a non-recording instance.
@@ -227,12 +283,22 @@ object must be immutable (multiple calls may return different immutable objects)
If the sampler returns an empty `Tracestate` here, the `Tracestate` will be cleared,
so samplers SHOULD normally return the passed-in `Tracestate` if they do not intend
to change it.
* The value of `log_head_adjusted_count` for the Span data model, reflecting
known and unknown values for adjusted count defined above. The following guidance
is given for user-defined Samplers:
* If the Sampler can be implemented as a choice of built-in Sampler behavior
made at runtime, the resulting _composite_ sampler gives the correct
behavior.
* `RECORD_ONLY` and `DROP` sampling decisions indicate zero adjusted count. It is
an error to return a `RECORD_ONLY` or `DROP` decision with non-zero adjusted count.
* `RECORD_AND_SAMPLE` sampling decisions that cannot be defined in terms of
a built-in Sampler SHOULD use unknown adjusted count.
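
As a rough illustration of the second rule, a user-defined Sampler could check its own result before returning it. The `Decision` type, its constants, and `checkResult` below are hypothetical names for this sketch and do not correspond to any particular SDK. The sketch rejects only known non-zero values (1 through 62); how to treat the unknown value together with `RECORD_ONLY` or `DROP` is left open here.

```go
package main

import (
	"errors"
	"fmt"
)

// Decision is a hypothetical stand-in for the sampling decision enum.
type Decision int

const (
	Drop Decision = iota
	RecordOnly
	RecordAndSample
)

// errNonZeroCount reports a RECORD_ONLY or DROP decision paired with a
// known non-zero adjusted count.
var errNonZeroCount = errors.New("RECORD_ONLY and DROP decisions require zero adjusted count")

// checkResult validates the pairing of a decision with an encoded
// log_head_adjusted_count value. Encoded values 1 through 62 are the known
// non-zero adjusted counts.
func checkResult(d Decision, encoded uint8) error {
	knownNonZero := encoded >= 1 && encoded <= 62
	if d != RecordAndSample && knownNonZero {
		return errNonZeroCount
	}
	return nil
}

func main() {
	fmt.Println(checkResult(Drop, 63))           // zero adjusted count
	fmt.Println(checkResult(RecordOnly, 2))      // known non-zero adjusted count
	fmt.Println(checkResult(RecordAndSample, 0)) // unknown adjusted count
}
```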

#### GetDescription

Returns the sampler name or short description with the configuration. This may
be displayed on debug pages or in the logs. Example:
`"TraceIdRatioBased{0.000100}"`.
`"TraceIdRatioBased{0.000122}"`.

Description MUST NOT change over time and caller can cache the returned value.

@@ -245,48 +311,138 @@ The default sampler is `ParentBased(root=AlwaysOn)`.

* Returns `RECORD_AND_SAMPLE` always.
* Description MUST be `AlwaysOnSampler`.
* Behavior of `ShouldSample` is identical to `TraceIdRatioBased{1}`

#### AlwaysOff

* Returns `DROP` always.
* Description MUST be `AlwaysOffSampler`.
* Behavior of `ShouldSample` is identical to `TraceIdRatioBased{0}`

#### TraceIdRatioBased

* The `TraceIdRatioBased` MUST ignore the parent `SampledFlag`. To respect the
This Sampler supports making independent sampling decisions at each Span in
a trace, making it possible for each SDK to independently control the rate
at which it samples spans. The decision is defined to be consistent, such that for
any two spans `T` and `U` of the same trace configured with
`TraceIdRatioBased` ratios `ratio_T` and `ratio_U`, respectively, if
`ratio_T <= ratio_U` and span `T` was sampled, span `U` MUST also be
sampled.

The constructor for `TraceIdRatioBased` MUST accept ratios in the range 0
through 1, inclusive. When the input ratio is not exactly a power of two,
the Sampler MUST round the ratio down to the nearest power of two.
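
One possible way to perform this rounding, shown as a sketch rather than a required implementation, is to read the binary exponent of the ratio with `math.Frexp`. The function `roundedExponent` and the sentinel `neverSample` are hypothetical names; the function returns the exponent `s` such that `2**(-s)` is the largest power of two not exceeding the ratio. The sketch does not clamp very small ratios to the range representable by the encoding described earlier.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// neverSample is a hypothetical sentinel returned for a ratio of zero,
// which has no finite power-of-two exponent.
const neverSample = -1

// roundedExponent returns s such that 2**(-s) is the ratio rounded down to
// the nearest power of two. Ratios outside [0, 1] are rejected.
func roundedExponent(ratio float64) (int, error) {
	if math.IsNaN(ratio) || ratio < 0 || ratio > 1 {
		return 0, errors.New("ratio must be in the range 0 through 1")
	}
	if ratio == 0 {
		return neverSample, nil
	}
	// Frexp returns frac in [0.5, 1) and exp with ratio = frac * 2**exp,
	// so the largest power of two not exceeding ratio is 2**(exp-1).
	_, exp := math.Frexp(ratio)
	return 1 - exp, nil
}

func main() {
	for _, ratio := range []float64{1, 0.5, 0.3, 0.0001, 0} {
		s, err := roundedExponent(ratio)
		fmt.Printf("ratio=%g s=%d err=%v\n", ratio, s, err)
	}
}
```

For example, a ratio of 0.3 lies between 1/4 and 1/2, so it rounds down to `2**(-2)` and the function returns 2.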

The `TraceIdRatioBased` Sampler MUST ignore the parent `SampledFlag`. To respect the
parent `SampledFlag`, the `TraceIdRatioBased` should be used as a delegate of
the `ParentBased` sampler specified below.
* Description MUST return a string of the form `"TraceIdRatioBased{RATIO}"`
with `RATIO` replaced with the Sampler instance's trace sampling ratio
represented as a decimal number. The precision of the number SHOULD follow
implementation language standards and SHOULD be high enough to identify when
Samplers have different ratios. For example, if a TraceIdRatioBased Sampler
had a sampling ratio of 1 to every 10,000 spans it COULD return
`"TraceIdRatioBased{0.000100}"` as its description.

TODO: Add details about how the `TraceIdRatioBased` is implemented as a function
of the `TraceID`. [#1413](https://github.com/open-telemetry/opentelemetry-specification/issues/1413)

##### Requirements for `TraceIdRatioBased` sampler algorithm

* The sampling algorithm MUST be deterministic. A trace identified by a given
`TraceId` is sampled or not independent of language, time, etc. To achieve this,
implementations MUST use a deterministic hash of the `TraceId` when computing
the sampling decision. By ensuring this, running the sampler on any child `Span`
will produce the same decision.
* A `TraceIdRatioBased` sampler with a given sampling rate MUST also sample all
traces that any `TraceIdRatioBased` sampler with a lower sampling rate would
sample. This is important when a backend system may want to run with a higher
sampling rate than the frontend system, this way all frontend traces will
still be sampled and extra traces will be sampled on the backend only.
* **WARNING:** Since the exact algorithm is not specified yet (see TODO above),
there will probably be changes to it in any language SDK once it is, which
would break code that relies on the algorithm results.
Only the configuration and creation APIs can be considered stable.
It is recommended to use this sampler algorithm only for root spans
(in combination with [`ParentBased`](#parentbased)) because different language
SDKs or even different versions of the same language SDKs may produce inconsistent
results for the same input.

The description MUST return a string of the form `"TraceIdRatioBased{RATIO}"`
with `RATIO` replaced with the Sampler instance's trace sampling ratio
represented as a decimal number. The precision of the number SHOULD follow
implementation language standards and SHOULD be high enough to identify when
Samplers have different ratios. For example, if a TraceIdRatioBased Sampler
had a sampling ratio of 1-in-(2**13) spans it COULD return
`"TraceIdRatioBased{0.000122}"` as its description.

##### `TraceIdRatioBased` sampler behavior

Given the restriction to power-of-two sampling probabilities stated above
and a uniform random distribution of `TraceId` bits, it would be possible to
define the `TraceIdRatioBased` Sampler behavior in terms of the number of
leading zeros of the TraceID. If this were the case, sampling with
probability `2**(-k)` means sampling all traces with `TraceId` having at
least `k` leading zeros.

OpenTelemetry follows the W3C specification for Trace Context, which does
not require a uniform random distribution of `TraceId` bits, therefore the
built-in OpenTelemetry Samplers are defined to propagate additional bits of
randomness to facilitate consistent sampling.

To convey the head sampling probability, so that child `ParentBased`
Samplers can accurately record their adjusted count, the `TraceIdRatioBased`
Sampler is required to propagate its adjusted count via W3C `tracestate`
when that form of trace context propagation is used. OpenTelemetry
specifies the following syntax for propagating two values known as `p` for
probability and `r` for randomness, for example:

```
tracestate: otel=p:PP;r:RR
```

where `PP` are two base16 encoded bytes conveying 6 bits of probability and
`RR` are two base16 encoded bytes conveying 6 bits of randomness. [TODO:
elsewhere and in a different PR, write the syntax and rules for interpreting
OpenTelemetry `tracestate`. In both cases, values for `p` and `r` are
restricted to the range 0 through 63 inclusive.](TODO)
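
Since the exact `tracestate` syntax is still marked TODO, the following parser is only a sketch under stated assumptions: it assumes the `otel` entry value has the form `p:PP;r:RR`, that `PP` and `RR` are each exactly two base16 digits, and that unknown keys are ignored. The names `otelState`, `parseHex6`, and `parseOtelState` are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// otelState holds the optional p and r values of the "otel" tracestate entry.
type otelState struct {
	P, R       int
	HasP, HasR bool
}

// parseHex6 parses exactly two base16 digits (an assumption about the
// encoding) and checks that the value is in the range 0 through 63.
func parseHex6(s string) (int, error) {
	if len(s) != 2 {
		return 0, fmt.Errorf("expected two base16 digits, got %q", s)
	}
	v, err := strconv.ParseUint(s, 16, 8)
	if err != nil {
		return 0, err
	}
	if v > 63 {
		return 0, fmt.Errorf("value %d is outside the range 0 through 63", v)
	}
	return int(v), nil
}

// parseOtelState parses a value such as "p:0d;r:11". Keys other than p and r
// are ignored in this sketch.
func parseOtelState(value string) (otelState, error) {
	var st otelState
	for _, field := range strings.Split(value, ";") {
		kv := strings.SplitN(field, ":", 2)
		if len(kv) != 2 {
			return otelState{}, fmt.Errorf("malformed field %q", field)
		}
		switch kv[0] {
		case "p", "r":
			v, err := parseHex6(kv[1])
			if err != nil {
				return otelState{}, err
			}
			if kv[0] == "p" {
				st.P, st.HasP = v, true
			} else {
				st.R, st.HasR = v, true
			}
		}
	}
	return st, nil
}

func main() {
	st, err := parseOtelState("p:0d;r:11")
	fmt.Println(st, err)
}
```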

The value for `p` corresponds exactly with the encoded value for
`log_head_adjusted_count` of the parent span, with values 1 through 62
representing known non-zero adjusted counts.

The value for `r` MUST be generated at the root of the trace, which MUST use
a `TraceIdRatioBased` sampler so that its child `TraceIdRatioBased` Samplers
know their adjusted counts.

The `ShouldSample()` implementation considers two cases, root and non-root
decisions.

###### TraceIdRatioBased: Root case

The value for `r` MUST be drawn from the following discrete probability
distribution:

| `r` Value | Likelihood |
| --------- | --------- |
| 0 | 1/2 |
| 1 | 1/4 |
| 2 | 1/8 |
| 3 | 1/16 |
| ... | ... |
| N | 2**(-N-1) |
| ... | ... |
| 62 | 2**(-63) |
| 63 | Reject |

This value can be computed by counting the number of leading zeros in a string of
random bits and rejecting any count greater than 62. Here is example code to
compute a value for `r`:

```golang
import "math/rand"

// nextRandomBit returns a uniformly distributed random bit.
func nextRandomBit() bool {
	return rand.Int63()&1 == 1
}

// nextRandomness returns a value in the range 0 through 62 by counting the
// zero bits drawn before the first one bit.
func nextRandomness() int {
	// Repeat until a valid result is produced.
	for {
		R := 0
		for {
			if nextRandomBit() {
				break
			}
			R++
		}
		if R < 63 {
			return R
		}
		// Reject, try again.
	}
}
```

Given a randomness value `r` and a power-of-two `TraceIdRatioBased` sampling
ratio expressed as `2**(-s)`, the `TraceIdRatioBased` Sampler MUST make the
sampling decision `RECORD_AND_SAMPLE` if and only if `s <= r`.
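
The decision rule is a single comparison, sketched below with the hypothetical function name `shouldSample`. Under the distribution for `r` given above, `r >= s` occurs with probability approximately `2**(-s)`, and because every Sampler in the trace compares against the same `r`, any Sampler with a larger ratio (smaller `s`) also samples whenever a Sampler with a smaller ratio does.

```go
package main

import "fmt"

// shouldSample reports whether a Sampler with ratio 2**(-s) makes the
// RECORD_AND_SAMPLE decision for a trace with randomness value r.
func shouldSample(s, r int) bool {
	return s <= r
}

func main() {
	r := 3 // hypothetical randomness value drawn once at the root
	for s := 0; s <= 5; s++ {
		fmt.Printf("ratio=2**(-%d) sampled=%v\n", s, shouldSample(s, r))
	}
}
```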

###### TraceIdRatioBased: Non-root case

The `r` value should have been generated by the root `Sampler`. If no `r`
value was provided in the context, the Sampler SHOULD return an unknown
adjusted count as the `log_head_adjusted_count`. In this case, no `p` value
should be propagated to child contexts.

When `r` was properly generated by a root Sampler, the decision is the same
as for the root case. Given a randomness value `r` and a power-of-two
`TraceIdRatioBased` sampling ratio expressed as `2**(-s)`, the
`TraceIdRatioBased` Sampler MUST make the sampling decision
`RECORD_AND_SAMPLE` if and only if `s <= r`.

#### ParentBased

Expand Down Expand Up @@ -317,6 +473,12 @@ Optional parameters:
|present|false|true|`localParentSampled()`|
|present|false|false|`localParentNotSampled()`|

The `ParentBased` Sampler computes the `log_head_adjusted_count` field as
follows:

* If a `p` value was generated by the parent Sampler or one of its ancestors, use it as the `log_head_adjusted_count`.
* If no `p` value was generated by the parent Sampler or one of its ancestors, return an unknown `log_head_adjusted_count`.
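
These two rules amount to passing the propagated `p` value through, or falling back to the encoded value for unknown (zero). A minimal sketch, with the hypothetical function name `parentBasedLogHeadAdjustedCount` and an explicit `hasP` flag standing in for however an SDK represents an absent `p`:

```go
package main

import "fmt"

// unknownAdjustedCount is the encoded value for an unknown adjusted count.
const unknownAdjustedCount = 0

// parentBasedLogHeadAdjustedCount returns the propagated p value when one is
// present in the parent context, and the unknown value otherwise.
func parentBasedLogHeadAdjustedCount(p int, hasP bool) int {
	if hasP {
		return p
	}
	return unknownAdjustedCount
}

func main() {
	fmt.Println(parentBasedLogHeadAdjustedCount(3, true))  // p was propagated
	fmt.Println(parentBasedLogHeadAdjustedCount(0, false)) // no p in the context
}
```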

## Span Limits

Erroneous code can add unintended attributes, events, and links to a span. If