From df6b1d04d9dec166507207b1b4d055bf50782974 Mon Sep 17 00:00:00 2001
From: Joshua MacDonald <josh.macdonald@gmail.com>
Date: Wed, 10 May 2023 17:05:22 -0700
Subject: [PATCH] New draft

---
 text/trace/0226-sampling-random-traceids.md | 462 +++++++-------------
 1 file changed, 157 insertions(+), 305 deletions(-)
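
Illustration for the "Detailed design" section of this draft: the
threshold test described there could be sketched in Go roughly as
follows.  This is a minimal sketch, not a reference implementation.
The names `randomness56`, `thresholdFromProbability` and `shouldSample`
are hypothetical, and the sketch assumes the 56 random bits are the
least-significant 7 bytes of the 16-byte TraceID, and that the
threshold is the probability times 2^56, rounded to an integer.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// randomnessBits is the number of TraceID bits assumed to be random.
const randomnessBits = 56

// randomness56 returns the least-significant 7 bytes of the TraceID
// as a big-endian unsigned value (hypothetical helper).
func randomness56(traceID [16]byte) uint64 {
	// Read the last 8 bytes, then mask off the top byte.
	return binary.BigEndian.Uint64(traceID[8:]) & (1<<randomnessBits - 1)
}

// thresholdFromProbability scales a probability by 2^56 and rounds to
// the nearest integer.  Scaling by a power of two is exact in float64.
func thresholdFromProbability(prob float64) uint64 {
	return uint64(math.Round(math.Ldexp(prob, randomnessBits)))
}

// shouldSample keeps the span when its randomness is strictly less
// than the threshold, so exactly `threshold` values are kept.
func shouldSample(traceID [16]byte, threshold uint64) bool {
	return randomness56(traceID) < threshold
}

func main() {
	threshold := thresholdFromProbability(0.1)
	fmt.Println(threshold) // 7205759403792794, the value given in the draft

	var low, high [16]byte
	high[9] = 0xff // most-significant randomness byte set
	fmt.Println(shouldSample(low, threshold), shouldSample(high, threshold))
}
```

Using a strict less-than comparison matches the draft's statement that
probability "0.1" keeps exactly the 7205759403792794 smallest values.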

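Illustration for the "T-value encoding for adjusted counts" section and
the `ot=p:2;t:10` example of this draft: one possible way to interpret
a t-value string and combine it with a p-value.  This is a sketch
under assumptions: the function names are hypothetical, Go's
`strconv.ParseFloat` stands in for the parser (it accepts decimal and
hexadecimal floating point), and p-value is taken to mean an adjusted
count of 2^p, as in the existing specification.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

// adjustedCountFromTValue interprets a t-value string (hypothetical
// helper).  Values in [0x1p-56, 1] are sampling probabilities; values
// in (1, 0x1p+56] are adjusted counts and must be integers.
func adjustedCountFromTValue(t string) (float64, error) {
	v, err := strconv.ParseFloat(t, 64)
	if err != nil {
		return 0, err
	}
	switch {
	case v >= 0x1p-56 && v <= 1:
		// Stated as a probability: adjusted count is its reciprocal.
		return 1 / v, nil
	case v > 1 && v <= 0x1p+56:
		if v != math.Trunc(v) {
			return 0, fmt.Errorf("adjusted count %q is not an integer", t)
		}
		// Stated as an adjusted count: use the integer directly.
		return v, nil
	default:
		return 0, fmt.Errorf("t-value %q is out of range", t)
	}
}

// combinedAdjustedCount multiplies the adjusted count implied by
// p-value (2^p) with the one implied by t-value.
func combinedAdjustedCount(pValue int, tValue string) (float64, error) {
	tCount, err := adjustedCountFromTValue(tValue)
	if err != nil {
		return 0, err
	}
	return math.Ldexp(tCount, pValue), nil
}

func main() {
	count, err := combinedAdjustedCount(2, "10")
	fmt.Println(count, err) // 40 <nil>, matching the draft's example
}
```

The sketch uses 1 divided by the stated probability for the adjusted
count; deriving it from the exact threshold instead would differ only
by rounding.
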
diff --git a/text/trace/0226-sampling-random-traceids.md b/text/trace/0226-sampling-random-traceids.md
index ba47d3833..96c538752 100644
--- a/text/trace/0226-sampling-random-traceids.md
+++ b/text/trace/0226-sampling-random-traceids.md
@@ -2,352 +2,204 @@
 
 ## Motivation
 
-**Status*: CURRENT
-
 The existing, experimental [specification for probability sampling using TraceState](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/tracestate-probability-sampling.md)
 supporting Span-to-Metrics pipelines is limited to powers-of-two
-probabilities and is designed to work without making assumptions about
+probabilities and is designed to work without making assumptions about 
 TraceID randomness.
 
-This proposes to extend that specification with support for 56-bit
-precision sampling probability.  This is seen as particularly
-important for implementation of probabilistic tail samplers (e.g., in
-the OpenTelemetry Collector) as explained below.
+Head sampling requires the use of TraceState to propagate context from
+the parent for recording in child spans, in support of Span-to-Metrics
+pipelines.  Tail sampling does not require context propagation
+support, but it has many similar requirements:
+
+1. Sampling should be "consistent", so that independent collection
+   paths make identical sampling decisions.
+2. Spans should be countable in a Span-to-Metrics pipeline, which
+   requires knowing the "adjusted count" for each span directly from
+   the data.
 
 This OTEP makes use of the [draft-standard W3C tracecontext `random`
 flag](https://w3c.github.io/trace-context/#random-trace-id-flag),
 which is an indicator that 7 bytes of true randomness are available
 for probability sampler decisions.
 
-## Explanation
+This OTEP proposes a specification with support for 56-bit precision
+tail sampling.  This is seen as particularly important for
+implementation of probabilistic tail samplers (e.g., in the
+OpenTelemetry Collector) as explained below.
 
-**Status*: CURRENT
+## Explanation
 
 The existing, experimental TraceState probability sampling
 specification relies on two variables known as **r-value** and
 **p-value**.  The r-value carries the source of randomness and the
-p-value carries the effective sampling probability.
-
-Given this specification, a ConsistentProbabilitySampler can be
-applied as a head sampler for non-power-of-two sampling probabilities
-using interpolation.  For example, an effective sampling probability
-of 1-in-3 can be achieved by alternating between 25% and 50% sampling.
-However, interpolation only works for trace roots, otherwise
-"consistent" sampling can only be achieved at the next smaller power
-of two.  In the example, sampling at 1-in-3 using interpolation means
-traces are only guaranteed **consistent** at 25% and smaller sampling
-probabilities.
-
-The major downside of the r-value, p-value approach is that r-value
-must be encoded even for unsampled contexts.  Ideally, building
-Span-to-Metrics pipelines should be low overhead which means not
-adding additional data to unsampled contexts.
-
-This proposal avoids r-value by using 7 bytes of intrinsic randomness
-in the TraceID, the ones (draft-) specified [in the W3C tracecontext
-`random` flag](https://w3c.github.io/trace-context/#random-trace-id-flag).
-Since this Sampler is expected to behave consistently with or without
-the `random` flag, we assume the bits are random and do not actually
-check the W3C random flag.
-
-This document propose extending the existing p-value, r-value
-mechanism with support for a new indicator for non-power-of-two
-probability sampling known as "t-value", where "t" is chosen because
-it signifies a threshold.  If widely adopted, the tracestate r-value
-can be deprecated, as it is not needed when randomness is provided in
-the TraceID.
-
-As proposed, t-value and p-value are mutually exclusive; p-value
-remains the preferred encoding for probability sampling when a
-power-of-two sampling probability is used.  P-value also remains the
-specified way to encode zero adjusted count (i.e., p=63).  T-value MAY
-be used to encode power-of-two probabilities, although typically the
-equivalent p-value uses fewer bytes.
-
-### T-Value encoding: Requirements
-
-**Status*: NEW-DRAFT
-
-#### Exactness
-
-This proposal is required to be support precision Span-to-Metrics
-pipelines.  This means that effective sampling probabilities are
-limited to discrete values that can be exactly represented. The number
-of discrete steps between powers of two is limited by the number of
-remaining bits of randomness in the TraceID.
-
-To achieve exactly 1-in-2^56 sampling, a sampler can select all traces
-with 56 `0`s of TraceID randomness.  It is not possible to achieve a 
-smaller sampling probability than 1-in-2^56.
-
-The next larger, exactly representable sampling probability is
-1-in-2^55.  At this probability, a sampler can select all traces with
-55 leading `0`s of TraceID randomness (i.e., 55 `0`s followed by a `1`
-and 55 `0`s followed by a `0`).  There are no exact probabilities
-representable between 1-in-2^55 and 1-in-2^56.
-
-The next larger, exactly representable power-of-two sampling
-probability is 1-in-2^54.  At this probability, a sampler can select
-all traces with 54 leading `0`s of TraceID randomness.  At 4 out of
-2^56, this sampling probability includes the two TraceID-randomess
-values selected at smaller powers-of-two (i.e., 1-in-2^55 and
-1-in-2^56) plus two new TraceID-randomness values.  One of the two new
-TraceID-randomness values corresponds with exactly 1-in-2^54 sampling,
-the other of these is the smallest exactly-representable
-non-power-of-two sampling probability according to this scheme.  It
-lies halfway between 1-in-2^54 and 1-in-2^55; in binary floating point
-representation, this value is displayed as `0x1.8p-55`.
-
-Continuing this pattern, the next larger power-of-two sampling
-probability is 1-in-2^53, which is 8 out of 2^56, 4 of which were
-covered above and 4 of which are new.  Of the four new, 1 is the exact
-power-of-two and there are three available non-power-of-two
-probabilities in this range.  These probabilities are (exactly)
-`0x1.Cp-54`, `0x1.8p-54`, and `0x1.4p-54`.
-
-In the pattern developed here, the number of sampling probabilities in
-the open interval `(2^-N, 2^-(N+1))` equals `(2^(56-N))-1`.
-
-Note we are disregarding the fact that a TraceID with all zeros (i.e.,
-128 `0` bits) is specified invalid by OpenTelemetry, which makes the
-all-zeros TraceID-randomness value slightly less probable than other
-values.
-
-#### Correspondence with R-value
-
-**Status*: NEW-DRAFT
-
-There are reasons to maintain compatibility with r-values in the range
-[0, 56] as developed in the earlier specification, particularly
-because it enables intentionally-consistent sampling across multiple
-traces.  We require that when r-value is used, r-value takes
-precendece over builtin TraceID-randomness.
-
-In this specification, the use of r-values greater than 56 is deprecated.
-
-We require the correspondence with non-power-of-two sampling
-probabilities exact to be exact.  This can be achieved as follows by
-calculating an *effective TraceID-randomness value* from the r-value
-combined with the original randomness.
-
-When r-value is set to the value `x` (where `x < 56`), the effective
-TraceID-randomness value used is calculated as `x` leading `0`s,
-followed by a `1`, followed by the original `56-x-1` trailing bits of
-TraceID-randomness.
-
-R-value propgation rules are unmodified.  R-value consistency-checking
-rules will be updated to detect inconsistent t-values, similar to the
-current specification's rules for detecting inconsistent p-values..
-
-#### Sampling decision logic
-
-**Status*: NEW-DRAFT
-
-An implementation of a head or tail sampler is expected to perform a
-simple comparison between the 56 bits of TraceID-randomness value and
-a threshold value.  The encoded t-value will correspond with one of
-the exactly representable values of TraceID-randomness, such that a
-simple less-than-or-equal comparison achieves exactly the correct
+p-value carries the effective sampling probability.  The preceding
+specification recommends the use of interpolation to achieve
+non-power-of-two sampling probabilities.
+
+This proposal aims to offer an alternative to that r-value, p-value
+specification, one that is simpler to implement, can be used in both
+head- and tail-samplers, and naturally supports non-power-of-two
+sampling probabilities.
+
+This proposal uses the 7 bytes of intrinsic randomness in the TraceID,
+the ones (draft-) specified [in the W3C tracecontext `random`
+flag](https://w3c.github.io/trace-context/#random-trace-id-flag). With
+these bits, a simple threshold test is defined to allow sampling based
+on TraceID randomness.
+
+This document proposes extending the p-value, r-value mechanism with
+support for a new indicator for non-power-of-two probability sampling
+known as "t-value", where "t" is chosen because it signifies a
+threshold.  Tail-based sampling encoded by t-value can be combined
+with p-value, in which case the adjusted count implied by t-value is
+**multiplied** with the adjusted count implied by p-value because they
+are independent mechanisms.
+
+### Detailed design
+
+Support for Span-to-Metrics pipelines requires knowing the "adjusted
+count" of every collected span.  This proposal defines the sampling
+"threshold" as a 7-byte string used to make consistent sampling
+decisions, as follows.
+
+1. Bytes 10-16 of the TraceID (its least-significant 7 bytes) are
+   interpreted as a 7-byte unsigned value in big-endian byte order.
+2. If the unsigned value determined by the trace is less than the
+   sampling threshold, the span is sampled, otherwise it is
+   discarded.
+   
+To calculate the sampling threshold, we begin with an IEEE-754
+standard double-precision floating point number.  With 52 bits of
+significand and a floating exponent, the probability value used to
+calculate a threshold may carry more or less precision than the
+56-bit threshold comparison can represent.
+
+We have many ways of encoding a floating point number as a string,
+some of which result in loss of precision.  This specification dictates
+exactly how to calculate a sampling threshold from a floating point
+number, and it is the sampling threshold that determines exactly the
+effective sampling probability.  The conversion between sampling
+probability and threshold is not exactly reversible, so to determine
+the sampling probability exactly from an encoded t-value, first
+compute the exact sampling threshold, then use the threshold to derive
+the exact sampling probability.
+
+From the exact sampling probability, we are able to compute (subject
+to machine precision) the adjusted count of each span.  For example,
+given a sampling probability encoded as "0.1", we first compute the
+nearest base-2 floating point, which is exactly 0x1.999999999999ap-04,
+which is approximately 0.10000000000000000555.  The exact quantity in
+this example, 0x1.999999999999ap-04, is multiplied by `2^56` and
+rounded to an unsigned integer (7205759403792794).  This specification
+says that to carry out sampling probability "0.1", we should keep
+exactly the 7205759403792794 smallest unsigned values of the 56-bit
+random TraceID bits.
+
+## T-value encoding for adjusted counts
+
+The example used sampling probability "0.1", which is a concisely
+rounded value but not exactly a power of two.  The use of decimal
+floating point in this case conceals the fact that there is an integer
+reciprocal, and when there is an integer reciprocal there are good
+reasons to preserve it.  Rather than encoding "0.1", it is appealing
+to encode the adjusted count (i.e., "10") because it conveys exactly
+the user's intention.
+
+This suggests that the t-value encoding be designed to accept either
+the sampling probability or the adjusted count, depending on how the
+sampling probability was derived.  Thus, the proposed t-value shall be
+parsed as a floating point or integer number using any POSIX-supported
+printf format specifier.  Values in the range [0x1p-56, 0x1p+56] are
+valid.  Values in the range [0x1p-56, 1] are interpreted as a sampling
+probability, while values in the range [1, 0x1p+56] are interpreted as
+an adjusted count.  Adjusted count values must be integers, while
+sampling probability values can be arbitrary floating point values.
+
+Whether to encode sampling probability or adjusted count is a choice.
+In both cases, the interpreted value translates into an exact
+threshold, which determines the exact inclusion probability.  From the
+exact inclusion probability, we can determine the adjusted count to
+use in a span-to-metrics pipeline.  When the t-value is _stated_ as an
+adjusted count (as opposed to a sampling probability), implementations
+can use the integer value in a span-to-metrics pipeline.  Otherwise,
+implementations should use an adjusted count of 1 divided by the
 sampling probability.
 
-#### Consistency between head and tail sampling
-
-**Status*: NEW-DRAFT
-
-The correspondence with r-value is meant to ensure that head samplers
-and tail samplers will make a consistent decision at non-power-of-two
-sampling probabilities.  Whereas the existing specification states
-that head samplers should use random interpolation between
-powers-of-two, the updated consistent sampling specification will use
-the deterministic algorithm for head and tail developed above.
-
-#### Deterministic mapping to integer adjusted counts
-
-**Status*: NEW-DRAFT
-
-One requirement remains to be developed.  A nice-to-have feature
-developed in the earlier specification is that when interpolating
-between power-of-two sampling probabilities, the final p-value would
-nevertheless be output with one of the nearby power-of-two adjusted
-counts.
-
-Using the smallest representable non-power-of-two sampling probability
-`0x1.8p-55` as an example--this value lies exactly half-way between
-two powers-of-two so we require a deterministic, unbiased way to
-select `0x1p-54` 1-out-of-3 times and `0x1p-55` 2-out-of-3 times.
-
-Can we use the SpanID bits to make this selection consistently at the
-consumer for each Span?  This would allow an exactly-encoded
-non-power-of-two `t-value` to nevertheless be mapped into integer
-(power-of-two) adjusted counts.
-
-TODO: This is an ongoing investigation.
-
-#### Summary of sampling algorithm
-
-**Status*: NEW-DRAFT
-
-The steps to perform a sampling decision are the same for both head
-and tail samplers.
-
-First, select an exactly representable sampling probability.  If the
-input is an arbitrary floating point value, it will have to be rounded
-to a nearby exact probablity.  Then, the probability is converted in
-two ways: 
-
-1. The t-value is calculated that encodes the exact effective samping
-   probability.
-2. The 56-bit threshold for comparing against TraceID-randomness is
-   calculated as described above.
-
-For each span, the sampler extracts 56 bits of presumed randomness
-from the TraceID, the so-called TraceID-randomness value.
-
-When r-value is set to `x` in the span's context, the sampler modifies
-the leading `x+1` bits of TraceID-randomness value with `x` `0`s and
-followed by a `1`.
-
-A simple comparison is made between the threshold and the effective
-TraceID-randomness value.  If the effective TraceID-randomess value is
-less than or equal to the threshold, the span is selected with the
-calculated t-value.  Otherwise, the span is not selected.
-
-### T-Value encoding: Original draft
-
-**Status*: OUT-OF-DATE
-
-Since we have 7 bytes, or 56 bits of randomness available, there are
-2^56 non-zero sampling probabilities that can be encoded.  These
-probabilities can be expressed as a 56-bit number in the range [0,
-0xffffffffffffff], where 0 corresponds with sampling 1 span out of
-2^56 and 0xffffffffffffff corresponds with sampling 100% of spans.
-
-The proposal is summarized as follows.  T-value is encoded as a
-hexadecimal string containing between 1 and 14 hex digits.  When the
-T-value is less than 14 hex digits, it is extended to 14 bytes using
-by padding with 0s.  For example, the t-value string "003f"
-corresponds with a the 14-hex-digit string "003f0000000000".
-
-Head samplers and tail samplers alike can be implemented simply by
-tersting whether the least-significant 7 bytes of the TraceID are
-lexicgraphically less-than-or-equal to the sampling threshold.  Note
-that this comparison may be carried out directly on hex digits or on
-binary data using simple string or bytes comparison.
+## Where to store t-value in a Span and/or Log Record
+
+Although prepared as a solution for tail sampling, the t-value
+encoding scheme could also be used to convey Logs sampling.  While
+tail sampling does not require the use of trace state, which is
+associated with context propagation, it makes a natural place to store
+t-value because it should be interpreted along with p-value, which
+resides in the trace state.  However, if spans store t-value in trace
+state, it is not clear how to convey logs sampling.
+
+Here are ways to address this:
+
+1. Store t-value in a new dedicated field in the Span or Log Record
+   (as a string).  (Author's preference.)
+2. Store t-value as a Span or Log Record attribute (as a string).
+   This may cause confusion because the attribute, which was not
+   applied by a user, can change along the collection path even though
+   the data has not changed.
+3. Store t-value as an optional floating point field in the Span or
+   Log Record.  An optional field is required because we need a
+   meaningful way to represent zero probability, for cases where spans
+   are exported due to a non-probabilistic decision.
+4. Create a new field in both Spans and Log Records as a dedicated
+   field for storing t-values.
+   
+The benefit of using TraceState is that it is an extensible field,
+made for multiple vendors to place arbitrary contents.  It is not
+clear whether use of tracestate to record collection-time decisions is
+appropriate, or whether it is only meant for in-band context
+propagation.  If this use-case is acceptable, the name Trace State
+would become a legacy; in this case, a more signal-neutral name for
+the field could be developed (e.g., "Collection State").
 
-Modifying an in-SDK Sampler to perform this calculation is a simple
-change relative to setting p-value for sampled spans.  For tail
-samplers, a span processor can simply pass through all spans where the
-least-significant 7-bytes of TraceID are less-than-or-equal to the
-configured threshold.  When the span passes, it has its TraceState
-t-value set to the configured threshold for use in Span-to-Metrics
-pipelines.
-
-### Converting between Thresholds and Probabilities
-
-**Status*: OUT-OF-DATE
-
-Sampling probabilities in the range (0, 1] can be mapped onto 56-bit
-encoded t-values in the range [0, 0xffffffffffffff].  For a given
-sampling threshold, the corresponding probability is expressed as a
-fraction `(T+1)/2^56` (i.e., sampling threshold plus one divided by
-2^56).
-
-Note that IEEE double-width floating point numbers use 52 bits of
-significand, so not all sampling thresholds have corresponding
-floating point values that the user might be able to express.
-
-For SDKs and Span processors to implement consistent probability
-sampling, OpenTelemetry should define how to compute a sampling
-threshold from a floating point number and in the reverse direction,
-how to compute a floating point number from a threshold.  Combined,
-these rules allow simple sampling logic to be easily translated into
-probabilities or adjusted counts for use in a Span-to-Metrics
-pipeline.
-
-#### Probability to Hex Threshold
-
-**Status*: OUT-OF-DATE
-
-Note that the procedure here only works for probabilities greater than
-or equal to 2^-52.
-
-To convert from a floating point number to the nearest threshold as a
-14-byte hex string:
-
-```
-func ProbabilityToThreshold(prob float64) string {
-	return fmt.Sprintf("%.14x", math.Nextafter(prob+1, 0))[4:18]
-}
-```
-
-Note that this can be truncated after one or more non-zero digits,
-leaving a more-compact encoding of a sampling probability that is
-nearby.
-
-Note that the threshold is rounded down, it will be slightly smaller
-than the configured probabilty in cases where the probability cannot
-be exactly represented in 56 bits.
-
-#### Hex Threshold to Probability
-
-**Status*: OUT-OF-DATE
+### 90% sampling 
 
-To convert a hex threshold string to the corresponding probability, we
-perform that opposite of the above.
+The following header
 
 ```
-func ThresholdToProbability(thresh string) float64 {
-    parsed, _ := strconv.ParseFloat("0x1."+thresh[:13]+"p+00", 64)
-	return math.Nextafter(parsed, 2) - 1
-}
+tracestate: ot=t:0.9
 ```
 
-Note that these transformations are not always reversible, since
-floating point numbers have less precision.  Note that only 13 bytes
-of the hex string are used to form the floating point value, since
-that is all the precision a double-wide floating point number has.
-
-## Examples
-
-**Status*: OUT-OF-DATE
-
-### 90% sampling 
+### 1-in-3 sampling
 
 The following header
 
 ```
-tracestate: ot=t:e66
+tracestate: ot=t:3
 ```
 
-contains a sampling threshold "e66", which is extended to
-"e6600000000000".  The corresponding TraceID's least-significant 7
-bytes are expected to be less than or equal to "e6600000000000".
-
-The corresponding sampling probability, calculated using the equation
-above, is 0.9.  The adjusted count of this span in a Span-to-Metrics
-pipeline is 1.11.
+corresponds with 1-in-3 sampling.
 
-### 0.33333% sampling
+### 25% head sampling, 1-in-10 tail sampling
 
 The following header
 
 ```
-tracestate: ot=t:00da7
+tracestate: ot=p:2;t:10
 ```
 
-corresponds with 0.33333% sampling.
+corresponds with 1-in-4 sampling at the head and 1-in-10 tail
+sampling.  The resulting span has adjusted count 40.
 
 ## Trade-offs and mitigations
 
-**Status*: OUT-OF-DATE
+Support for encoding t-value as either a probability or an adjusted
+count is meant to give the user control over loss of precision.  At
+the same time, the encoding remains human-readable.
 
-Note that the t-value encoding is not efficient for encoding
-power-of-two probabilities (e.g., "ffffffffffffff" corresponds with
-100% sampling).  That is why the use of p-value is recommended when
-the configured sampling probability is an exact power-of-two.
+Floating point numbers can be encoded exactly to avoid ambiguity, for
+example, using hexadecimal floating point representation.  Likewise,
+adjusted counts can be encoded exactly as integers to convey the
+user's intended sampling probability without floating point conversion
+loss.
 
 ## Prior art and alternatives
 
-**Status*: OUT-OF-DATE
-
 An earlier draft of proposal was explored [here](https://github.com/jmacd/opentelemetry-collector-contrib/pull/2925).