open-telemetry · jmacd · Aug 30, 2021 · Aug 30, 2021 · Aug 30, 2021 · Sep 2, 2021
@@ -12,6 +12,7 @@ release.
 ### Traces
 
 - Adding SDK configuration for Jaeger remote sampler ([#1791](https://github.com/open-telemetry/opentelemetry-specification/pull/1791))
+- Adds specification for probability sampling, updates the definition of the built-in Samplers.
 
 ### Metrics
 

@@ -119,15 +119,18 @@ Thus, the SDK specification defines sets of possible requirements for
 
 ## Sampling
 
-Sampling is a mechanism to control the noise and overhead introduced by
-OpenTelemetry by reducing the number of samples of traces collected and sent to
-the backend.
+Sampling is a mechanism to control the cost of OpenTelemetry tracing by
+reducing the number of spans collected and sent to the backend.
+OpenTelemetry gives participants in a distributed trace the option to sample
+based on whether their context was sampled or to make the decision
+independently.
 
-Sampling may be implemented on different stages of a trace collection. The
-earliest sampling could happen before the trace is actually created, and the
-latest sampling could happen on the Collector which is out of process.
+Sampling may be implemented in different stages of trace collection. The
+earliest sampling happens inside the trace SDK, but later sampling decisions
+can happen outside the process, for example in an OpenTelemetry collector.
 
-The OpenTelemetry API has two properties responsible for the data collection:
+The OpenTelemetry API has two properties associated with Spans that allow the
+user to become aware of the sampling decision:
 
 * `IsRecording` field of a `Span`. If `false` the current `Span` discards all
   tracing data (attributes, events, status, etc.). Users can use this property
@@ -165,6 +168,59 @@ The following table summarizes the expected behavior for each combination of
 The SDK defines the interface [`Sampler`](#sampler) as well as a set of
 [built-in samplers](#built-in-samplers) and associates a `Sampler` with each [`TracerProvider`].
 
+### Probability sampling
+
+OpenTelemetry specifies a mechanism for conveying sampling probability, when
+known, to enable accurate statistical counting of the population of spans
+using only the portion that were sampled and collected.
+
+Sampling probability is conveyed in a form known as "adjusted count", which
+is the number of spans in the population accurately represented by an
+individual sampled span. In common terms, a "1-in-N" sampling scheme produces
+spans with adjusted count N, where every sampled span represents N spans in
+the general population. Adjusted count is the inverse of sampling probability
+except when the probability is zero, which is an important special case: the
+adjusted count is then also zero.
+
+Although sampling can be carried out in multiple stages, OpenTelemetry
+specifies a dedicated field in the Span data model for representing
+probability at the "head" of the distributed trace, where it describes the
+probability the `Sampled` flag was set in the Span's initial sampling
+decision.
+
+To reduce the cost of conveying sampling information through propagators,
+OpenTelemetry limits head sampling probabilities to powers of two or zero.
+Adjusted counts are likewise limited to powers of two or zero.  As not all
+Sampler implementations will be probabilistic in nature, a special value for
+the "unknown" adjusted count is included in the data model.  Aside from the
+zero and unknown cases, adjusted count values can be encoded using their
+base-2 logarithm in a small number of bits.
+
+The OpenTelemetry Span field for encoding adjusted count is named
+`log_head_adjusted_count`, with the default value zero representing the case
+of unknown adjusted count.  Values 1 through 62 encode one plus the base-2
+logarithm of adjusted count (i.e., adjusted count equals
+`2**(encoded_value-1)`), and zero adjusted count is encoded by value 63.
+Thus, it takes 6 bits to encode the `log_head_adjusted_count` field, with
+interpretation specified in the table below.
+
+<a name="log-head-adjusted-count-table"></a>
+
+| Encoded Value | Head adjusted count | Head sampling probability |
+| ------------- | ------------------- | ------------------------- |
+| 0             | Unknown             | Unknown                   |
+| 1             | 1                   | 1                         |
+| 2             | 2                   | 1/2                       |
+| 3             | 4                   | 1/4                       |
+| 4             | 8                   | 1/8                       |
+| ...           | ...                 | ...                       |
+| N             | 2**(N-1)            | 2**(-N+1)                 |
+| ...           | ...                 | ...                       |
+| 62            | 2**61               | 2**-61                    |
+| 63            | 0                   | 0                         |
+
+The built-in Samplers are defined so that when this specification is
+followed, all spans will have known head sampling probability.
+
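As an illustration of the encoding table above, here is a minimal sketch in Go of converting between a head adjusted count and its 6-bit encoded value. The function and constant names are hypothetical and are not defined by this specification; treating out-of-range encoded values as unknown is likewise an assumption.

```go
package main

import (
	"errors"
	"math"
	"math/bits"
)

// Encoded values of log_head_adjusted_count taken from the table above.
const (
	encodedUnknown = 0  // adjusted count is unknown
	encodedZero    = 63 // adjusted count is zero
)

// EncodeAdjustedCount maps a head adjusted count to its encoded value.
// Only zero and the powers of two from 2**0 through 2**61 are encodable.
// (Hypothetical helper; the name is not part of the specification.)
func EncodeAdjustedCount(count uint64) (uint8, error) {
	if count == 0 {
		return encodedZero, nil
	}
	if bits.OnesCount64(count) != 1 || count > 1<<61 {
		return encodedUnknown, errors.New("adjusted count must be zero or a power of two up to 2**61")
	}
	// One plus the base-2 logarithm of the adjusted count.
	return uint8(bits.TrailingZeros64(count)) + 1, nil
}

// DecodeAdjustedCount returns the adjusted count and sampling probability
// for an encoded value. known is false for the "unknown" encoding (0) and,
// as an assumption, for values outside the 6-bit range.
func DecodeAdjustedCount(encoded uint8) (count uint64, probability float64, known bool) {
	switch {
	case encoded == encodedUnknown || encoded > encodedZero:
		return 0, 0, false
	case encoded == encodedZero:
		return 0, 0, true
	default:
		return 1 << (encoded - 1), math.Ldexp(1, -int(encoded-1)), true
	}
}
```

For example, an adjusted count of 8 encodes to 4, and decoding 4 yields adjusted count 8 with probability 1/8.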
 ### SDK Span creation
 
 When asked to create a Span, the SDK MUST act as if doing the following in order:
@@ -175,7 +231,7 @@ When asked to create a Span, the SDK MUST act as if doing the following in order
 2. Query the `Sampler`'s [`ShouldSample`](#shouldsample) method
    (Note that the [built-in `ParentBasedSampler`](#parentbased) can be used to
    use the sampling decision of the parent,
-   translating a set SampledFlag to RECORD and an unset one to DROP).
+   translating a set SampledFlag to `RECORD_AND_SAMPLE` and an unset one to `DROP`).
 3. Generate a new span ID for the `Span`, independently of the sampling decision.
    This is done so other components (such as logs or exception handling) can rely on
    a unique span ID, even if the `Span` is a non-recording instance.
@@ -227,12 +283,22 @@ object must be immutable (multiple calls may return different immutable objects)
   If the sampler returns an empty `Tracestate` here, the `Tracestate` will be cleared,
   so samplers SHOULD normally return the passed-in `Tracestate` if they do not intend
   to change it.
+* The value of `log_head_adjusted_count` for the Span data model, reflecting
+  known and unknown values for adjusted count defined above.  The following guidance 
+  is given for user-defined Samplers:
+  * If the Sampler can be implemented as a choice of built-in Sampler behavior
+    made at runtime, the resulting _composite_ sampler gives the correct
+    behavior.
+  * `RECORD_ONLY` and `DROP` sampling decisions indicate zero adjusted count.  It is
+    an error to return a `RECORD_ONLY` or `DROP` decision with non-zero adjusted count.
+  * `RECORD_AND_SAMPLE` sampling decisions that cannot be defined in terms of
+    a built-in Sampler SHOULD use unknown adjusted count.
 
 #### GetDescription
 
 Returns the sampler name or short description with the configuration. This may
 be displayed on debug pages or in the logs. Example:
-`"TraceIdRatioBased{0.000100}"`.
+`"TraceIdRatioBased{0.000122}"`.
 
 Description MUST NOT change over time and caller can cache the returned value.
 
@@ -245,48 +311,138 @@ The default sampler is `ParentBased(root=AlwaysOn)`.
 
 * Returns `RECORD_AND_SAMPLE` always.
 * Description MUST be `AlwaysOnSampler`.
+* Behavior of `ShouldSample` is identical to `TraceIdRatioBased{1}`.
 
 #### AlwaysOff
 
 * Returns `DROP` always.
 * Description MUST be `AlwaysOffSampler`.
+* Behavior of `ShouldSample` is identical to `TraceIdRatioBased{0}`.
 
 #### TraceIdRatioBased
 
-* The `TraceIdRatioBased` MUST ignore the parent `SampledFlag`. To respect the
+This Sampler supports making independent sampling decisions at each Span in
+a trace, making it possible for each SDK to independently control its rate
+of sampled spans. The decision is defined to be consistent, such that for
+any two spans `T` and `U` of the same trace configured with
+`TraceIdRatioBased` ratios `ratio_T` and `ratio_U`, respectively, if
+`ratio_T <= ratio_U` and span `T` was sampled, span `U` MUST also be
+sampled.
+
+The constructor for `TraceIdRatioBased` MUST accept ratios in the range 0
+through 1, inclusive. When the input ratio is not exactly a power of two,
+the Sampler MUST round the ratio down to the nearest power of two.
+
+The `TraceIdRatioBased` Sampler MUST ignore the parent `SampledFlag`. To respect the
 parent `SampledFlag`, the `TraceIdRatioBased` should be used as a delegate of
 the `ParentBased` sampler specified below.
-* Description MUST return a string of the form `"TraceIdRatioBased{RATIO}"`
-  with `RATIO` replaced with the Sampler instance's trace sampling ratio
-  represented as a decimal number. The precision of the number SHOULD follow
-  implementation language standards and SHOULD be high enough to identify when
-  Samplers have different ratios. For example, if a TraceIdRatioBased Sampler
-  had a sampling ratio of 1 to every 10,000 spans it COULD return
-  `"TraceIdRatioBased{0.000100}"` as its description.
-
-TODO: Add details about how the `TraceIdRatioBased` is implemented as a function
-of the `TraceID`. [#1413](https://github.com/open-telemetry/opentelemetry-specification/issues/1413)
-
-##### Requirements for `TraceIdRatioBased` sampler algorithm
-
-* The sampling algorithm MUST be deterministic. A trace identified by a given
-  `TraceId` is sampled or not independent of language, time, etc. To achieve this,
-  implementations MUST use a deterministic hash of the `TraceId` when computing
-  the sampling decision. By ensuring this, running the sampler on any child `Span`
-  will produce the same decision.
-* A `TraceIdRatioBased` sampler with a given sampling rate MUST also sample all
-  traces that any `TraceIdRatioBased` sampler with a lower sampling rate would
-  sample. This is important when a backend system may want to run with a higher
-  sampling rate than the frontend system, this way all frontend traces will
-  still be sampled and extra traces will be sampled on the backend only.
-* **WARNING:** Since the exact algorithm is not specified yet (see TODO above),
-  there will probably be changes to it in any language SDK once it is, which
-  would break code that relies on the algorithm results.
-  Only the configuration and creation APIs can be considered stable.
-  It is recommended to use this sampler algorithm only for root spans
-  (in combination with [`ParentBased`](#parentbased)) because different language
-  SDKs or even different versions of the same language SDKs may produce inconsistent
-  results for the same input.
+
+The description MUST return a string of the form `"TraceIdRatioBased{RATIO}"`
+with `RATIO` replaced with the Sampler instance's trace sampling ratio
+represented as a decimal number. The precision of the number SHOULD follow
+implementation language standards and SHOULD be high enough to identify when
+Samplers have different ratios. For example, if a TraceIdRatioBased Sampler
+had a sampling ratio of 1-in-(2**13) spans it COULD return
+`"TraceIdRatioBased{0.000122}"` as its description.
+
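The rounding rule and the description format above can be sketched in Go as follows. This is an illustration only: the function names are hypothetical, and treating ratios below `2**-61` (the smallest encodable probability) as zero is an assumption that the text above does not state.

```go
package main

import (
	"fmt"
	"math"
)

// maxExponent is the largest s for which 2**(-s) has an encoded value
// (encoded value 62 in the log_head_adjusted_count table).
const maxExponent = 61

// RoundDownExponent returns s such that 2**(-s) is the largest power of two
// not exceeding ratio. nonZero is false when the result is probability zero.
// Assumption: ratios below 2**-61 are rounded down to zero.
func RoundDownExponent(ratio float64) (s int, nonZero bool, err error) {
	if math.IsNaN(ratio) || ratio < 0 || ratio > 1 {
		return 0, false, fmt.Errorf("ratio %v is outside the range [0, 1]", ratio)
	}
	if ratio == 0 {
		return 0, false, nil
	}
	// Frexp returns ratio == frac * 2**exp with frac in [0.5, 1), so the
	// largest power of two not exceeding ratio is 2**(exp-1).
	_, exp := math.Frexp(ratio)
	s = 1 - exp
	if s > maxExponent {
		return 0, false, nil
	}
	return s, true, nil
}

// Description formats the rounded ratio 2**(-s) in the form shown above.
func Description(s int) string {
	return fmt.Sprintf("TraceIdRatioBased{%f}", math.Ldexp(1, -s))
}
```

Under this sketch a configured ratio of 0.0001 rounds down to `2**-14`, because `2**-13` is larger than 0.0001, and `Description(13)` returns `"TraceIdRatioBased{0.000122}"`.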
+##### `TraceIdRatioBased` sampler behavior
+
+Given the restriction to power-of-two sampling probabilities stated above
+and a uniform random distribution of `TraceId` bits, it would be possible to
+define the `TraceIdRatioBased` Sampler behavior in terms of the number of
+leading zeros of the `TraceId`. If this were the case, sampling with
+probability `2**(-k)` would mean sampling all traces whose `TraceId` has at
+least `k` leading zeros.
+
+OpenTelemetry follows the W3C specification for Trace Context, which does
+not require a uniform random distribution of `TraceId` bits. Therefore, the
+built-in OpenTelemetry Samplers are defined to propagate additional bits of
+randomness to facilitate consistent sampling.
+
+To convey the head sampling probability, so that child `ParentBased`
+Samplers can accurately record their adjusted count, the `TraceIdRatioBased`
+Sampler is required to propagate its adjusted count via W3C `tracestate`
+when that form of trace context propagation is used. OpenTelemetry
+specifies the following syntax for propagating two values known as `p` for
+probability and `r` for randomness, for example:
+
+```
+tracestate: otel=p:PP;r:RR
+```
+
+where `PP` are two base16 encoded bytes conveying 6 bits of probability and
+`RR` are two base16 encoded bytes conveying 6 bits of randomness. In both
+cases, values for `p` and `r` are restricted to the range 0 through 63
+inclusive. [TODO: elsewhere and in a different PR, write the syntax and
+rules for interpreting OpenTelemetry `tracestate`.](TODO)
+
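Since the exact `tracestate` rules are still a TODO, the following Go sketch only illustrates how the example syntax above might be read. It assumes that `PP` and `RR` are each two hexadecimal digits and that fields are separated by `;` with `:` between name and value; the type and function names are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// OtelTraceState holds the two values carried in the "otel" tracestate
// entry. A field is -1 when the value was absent.
type OtelTraceState struct {
	P int // encoded head probability, 0 through 63
	R int // randomness, 0 through 63
}

// ParseOtelTraceState extracts p and r from a tracestate header value such
// as "otel=p:0d;r:11". Assumption: each value is exactly two hex digits.
func ParseOtelTraceState(tracestate string) (OtelTraceState, error) {
	out := OtelTraceState{P: -1, R: -1}
	for _, member := range strings.Split(tracestate, ",") {
		key, value, found := strings.Cut(strings.TrimSpace(member), "=")
		if !found || key != "otel" {
			continue // entries of other vendors are ignored
		}
		for _, field := range strings.Split(value, ";") {
			name, hex, ok := strings.Cut(field, ":")
			if !ok || len(hex) != 2 {
				return out, fmt.Errorf("malformed field %q", field)
			}
			n, err := strconv.ParseUint(hex, 16, 8)
			if err != nil || n > 63 {
				return out, fmt.Errorf("value in field %q is not in the range 0 through 63", field)
			}
			switch name {
			case "p":
				out.P = int(n)
			case "r":
				out.R = int(n)
			}
		}
	}
	return out, nil
}
```

With these assumptions, `ParseOtelTraceState("vendor=abc,otel=p:0d;r:11")` yields `P == 13` and `R == 17`, while a value such as `p:40` (64 in decimal) is rejected as out of range.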
+The value for `p` corresponds exactly with the encoded value for
+`log_head_adjusted_count` of the parent span, with values 1 through 62
+representing known non-zero adjusted counts.
+
+The value for `r` MUST be generated at the root of the trace, which MUST use
+a `TraceIdRatioBased` sampler so that its child `TraceIdRatioBased` Samplers
+know their adjusted counts.
+
+The `ShouldSample()` implementation considers two cases, root and non-root
+decisions.
+
+###### TraceIdRatioBased: Root case
+
+The value for `r` MUST be drawn from the following discrete probability
+distribution:
+
+| `r` Value | Likelihood |
+| --------- | ---------- |
+| 0         | 1/2        |
+| 1         | 1/4        |
+| 2         | 1/8        |
+| 3         | 1/16       |
+| ...       | ...        |
+| N         | 2**(-N-1)  |
+| ...       | ...        |
+| 62        | 2**(-63)   |
+| 63        | Reject     |
+
+This value can be computed by counting the zeros that precede the first one
+in a string of random bits and rejecting any count greater than 62. Here is
+example code to compute a value for `r`:
+
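+```golang
+import "math/rand"
+
+// nextRandomBit returns one uniformly distributed random bit.
+func nextRandomBit() bool {
+  return rand.Intn(2) == 1
+}
+
+// nextRandomness counts the zero bits drawn before the first one bit,
+// so that the result equals N with probability 2**(-N-1).
+func nextRandomness() int {
+  // Repeat until a valid result is produced.
+  for {
+    R := 0
+    for !nextRandomBit() {
+      R++
+    }
+    if R < 63 {
+      return R
+    }
+    // Reject, try again.
+  }
+}
+```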
+
+Given a randomness value `r` and a power-of-two `TraceIdRatioBased` sampling
+ratio expressed as `2**(-s)`, the `TraceIdRatioBased` Sampler MUST make the
+sampling decision `RECORD_AND_SAMPLE` if and only if `s <= r`.
+
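The decision rule above, and the reason it samples with probability `2**(-s)`, can be checked with a short Go sketch. The type and function names are hypothetical, and `SampleProbability` ignores the negligible renormalization caused by rejecting the value 63.

```go
package main

import "math"

// Decision mirrors two of the sampling decisions named in this document.
type Decision int

const (
	Drop Decision = iota
	RecordAndSample
)

// ShouldSample applies the rule above: with sampling ratio 2**(-s) and
// randomness r, the span is sampled if and only if s <= r.
func ShouldSample(s, r int) Decision {
	if s <= r {
		return RecordAndSample
	}
	return Drop
}

// SampleProbability adds up the root distribution P(r = N) = 2**(-N-1)
// over every N in 0..62 for which ShouldSample(s, N) samples. The total is
// 2**(-s) - 2**(-63), which is 2**(-s) for practical purposes.
func SampleProbability(s int) float64 {
	total := 0.0
	// Sum from the smallest term upward to limit floating-point error.
	for n := 62; n >= 0; n-- {
		if ShouldSample(s, n) == RecordAndSample {
			total += math.Ldexp(1, -n-1)
		}
	}
	return total
}
```

Because every Sampler in a trace compares its own `s` against the same `r`, a Sampler with a larger ratio (smaller `s`) samples whenever one with a smaller ratio does, which is the consistency property required of `TraceIdRatioBased`.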
+###### TraceIdRatioBased: Non-root case
+
+The `r` value should have been generated by the root `Sampler`. If no `r`
+value was provided in the context, the Sampler SHOULD return an unknown
+adjusted count as the `log_head_adjusted_count`. No `p` value should be
+propagated to child contexts in this case.
+
+When `r` was properly generated by a root Sampler, the decision is the same
+as for the root case. Given a randomness value `r` and a power-of-two
+`TraceIdRatioBased` sampling ratio expressed as `2**(-s)`, the
+`TraceIdRatioBased` Sampler MUST make the sampling decision
+`RECORD_AND_SAMPLE` if and only if `s <= r`.
 
 #### ParentBased
 
@@ -317,6 +473,12 @@ Optional parameters:
 |present|false|true|`localParentSampled()`|
 |present|false|false|`localParentNotSampled()`|
 
+The `ParentBased` Sampler computes the `log_head_adjusted_count` field as
+follows:
+
+* If a `p` value was generated by the parent Sampler or one of its ancestors,
+  use it as the `log_head_adjusted_count`.
+* If no `p` value was generated by the parent Sampler or one of its ancestors,
+  return an unknown `log_head_adjusted_count`.
+
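The two rules above amount to a pass-through with a fallback. A minimal Go sketch follows, assuming the propagated `p` has already been extracted from the parent context; the function name and signature are hypothetical.

```go
package main

// unknownAdjustedCount is the encoded value for an unknown adjusted count.
const unknownAdjustedCount uint8 = 0

// ParentBasedAdjustedCount returns the log_head_adjusted_count for a span
// created under a ParentBased Sampler. p is the value propagated from the
// parent context and hasP reports whether one was present.
func ParentBasedAdjustedCount(p uint8, hasP bool) uint8 {
	if !hasP {
		return unknownAdjustedCount
	}
	return p
}
```

For instance, a child whose parent propagated `p` equal to 14 records 14, while a child with no propagated `p` records 0 (unknown).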
 ## Span Limits
 
 Erroneous code can add unintended attributes, events, and links to a span. If