Add probability sampler details #331

Closed
wants to merge 5 commits into from
89 changes: 87 additions & 2 deletions specification/sdk-tracing.md
@@ -122,8 +122,93 @@ These are the default samplers implemented in the OpenTelemetry SDK:

#### Probability Sampler algorithm

TODO: Add details about how the probability sampler is implemented as a function
of the `TraceID`.
The `ProbabilitySampler` makes a deterministic sampling decision for each span
Member:

Basing the algorithm on trace id should not be a requirement. It's an optimization that is possible in some languages, but could be pretty expensive in others, considering that trace id is just a blob of bytes that needs to be interpreted as a number.

Member:

I think this is not a performance optimization (at least not primarily). Using the trace ID as the only input (and the same algorithm) ensures that all spans of the trace have the same sampling decision, which is required to avoid broken traces.

Contributor:

I would assume that the root span of a trace might use a probability sampler, and all the descendants would use the sampled bit from context, so there's only one decision, and I agree with @yurishkuro.

Member Author:

I'd also expect the sampler to respect the sampled bit by default, but we say here that it should be possible to configure the sampler to apply to all spans, or any span with a remote parent.

L119 actually says the default behavior should be to make a new sampling decision for spans with remote parents, but I don't know the motivation for this, and think there's a good argument for restricting this to root spans by default.

One benefit of this sampling algorithm is that lower-probability samplers always sample a subset of traces of higher-probability samplers. Because of this, each service could set its own sampling rate and we'd still get complete traces for a percentage of requests equal to the lowest sampling rate. For example, if we have three services with sampling rates .3, .2, and .1 respectively, we'd get complete traces for .1 of requests. With independent random samplers we'd get traces for .006 of requests.

Even if we didn't want to make it possible to change the sampling rate along the call chain, there are some clear benefits to making the sampling decision deterministic. E.g. checking that telemetry is working.
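To make the subset argument concrete, here's a minimal sketch, assuming the threshold scheme described in the diff below and a hypothetical 16-bit precision (the names here are mine, not from this PR):

```python
# Why threshold-based sampling nests: every trace sampled at a low rate is
# also sampled at any higher rate, so the fraction of complete traces equals
# the lowest rate along the call chain.
PRECISION = 16  # assumed precision, in bits

def bound(rate):
    return round(rate * 2 ** PRECISION)

def sampled(truncated_trace_id, rate):
    return truncated_trace_id < bound(rate)

# Every truncated trace ID below bound(.1) is also below bound(.2) and
# bound(.3), so services sampling at .3, .2, and .1 yield complete traces
# for .1 of requests, not .3 * .2 * .1 = .006.
assert all(
    sampled(tid, .2) and sampled(tid, .3)
    for tid in range(2 ** PRECISION)
    if sampled(tid, .1)
)
```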

Contributor:

These are good arguments. 👍

Member Author:

> an integer comparison to check if sampling makes sense

As I understand it, @yurishkuro's complaint is that we have to convert the byte array into an int to do the comparison. FWIW an implementation could compute the bound as a byte array instead and do a left-to-right bytewise comparison (see this gist for an example in python), but it's not clear that this would be faster than converting to an int, especially if the language boxes the bytes to do the comparison.
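For illustration, a rough sketch of that bytewise approach, assuming a precision that's a multiple of 8; the names are mine, not the gist's:

```python
# Compute the bound as a big-endian byte array once, then compare the trace
# ID prefix against it left to right, without converting the ID to an int.
PRECISION = 64  # assumed; a multiple of 8 keeps the slicing simple

def bound_bytes(rate):
    # rate == 1.0 would need special-casing (2**PRECISION overflows this width).
    return round(rate * 2 ** PRECISION).to_bytes(PRECISION // 8, byteorder="big")

def sampled(trace_id: bytes, bound: bytes) -> bool:
    # Python compares bytes lexicographically, which for equal-length
    # big-endian sequences matches unsigned integer comparison.
    return trace_id[: len(bound)] < bound
```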

Member Author:

> Perhaps ProbabilitySampler should be renamed DeterministicProbabilitySampler to indicate that it's based on the TraceID?

DeterministicProbabilitySampler is a mouthful. I'd prefer PercentageSampler or RateSampler, but I think ProbabilitySampler is a fine description, and has the benefit of having the same name in OpenCensus.

> I would not be willing to draw statistical interpretations from a body of spans using deterministic sampling from clients written with different PRNGs.

I can think of two obvious risks for people doing statistical inference on sampled traces: one is that trace IDs aren't even nominally randomly distributed. For example, if the trace ID includes a timestamp fragment, the sampled request rate might look cyclical or bursty even if the actual request rate is constant. The other is that the service itself handles requests differently depending on the trace ID.

But these are bugs in the service or instrumentation, not problems of insufficient randomness.

AFAICT we don't actually need strong statistical randomness here. We just need the "random" part of trace ID to be roughly uniformly distributed over all possible values, and not to be correlated with the stats we're monitoring (latency, error rate, etc.).

The fact that PRNGs are more P than R doesn't seem unusually bad in this context. And in any case a RandomProbabilitySampler would suffer from the same problems.

Contributor:

How about if the sampler doesn't depend on the trace ID at all? There's no need to consistently select the same trace for sampling, since the sampling decision is mostly made when the root span is created.
The sampler can simply increment an atomic counter by 1 for every root span created. If the sampling rate is 0.01, the trace is sampled when the count is a multiple of 100. The counter can be initialized to a random value once, so that different instances/processes aren't synchronized. This sounds more like a RateSampler, though.
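A sketch of what that counter-based alternative might look like (hypothetical names, a lock standing in for the atomic counter, nonzero rate assumed):

```python
import random
import threading

class CountingSampler:
    """Samples every Nth root span, starting from a random offset."""

    def __init__(self, rate):
        self.period = round(1 / rate)  # rate .01 -> sample every 100th span
        self.count = random.randrange(self.period)  # random initial value,
        self.lock = threading.Lock()   # so processes aren't synchronized

    def should_sample(self) -> bool:
        with self.lock:
            self.count += 1
            return self.count % self.period == 0
```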

Member Author:

I addressed this above in #331 (comment). We want to be able to make new sampling decisions for remote parent spans, which is why it's helpful to sample consistently.

Now that I read your description, RateSampler is clearly a better name for that than the behavior I'm describing.

Member:

RateSampler would imply a time dependency, like rate limiting; this is more of a RatioSampler.

based on the span's trace ID. It samples a fixed percentage of all spans
according to a user-supplied sampling rate.

According to the [W3C Trace Context
spec](https://www.w3.org/TR/trace-context/#trace-id), vendor-supplied trace IDs
may include both random and non-random components. To avoid sampling based on
the non-random component, the sampler should consider only the leftmost portion
of the trace ID. Implementations MAY allow the user to configure the number of
bits of the trace ID that the sampler considers. We define this number as
the sampler's *precision*.

A trace ID is a 16-byte array. We define the *truncated trace ID* to be the
leftmost `precision` bits of the trace ID:

```
truncated_trace_id = traceID[0:ceil(precision / 8)]
```

Where:

- `precision` is the number of bits of the trace ID to consider, in `[1, 128]`
- `ceil(float f)` returns the smallest integer greater than or equal to `f`
- `a[l:h]` is the slice operator: it returns the subsequence of `a` from index
  `l` (inclusive) to index `h` (exclusive)

When `precision` is a multiple of 8, this is equivalent to converting the
trace ID to an unsigned integer assuming big-endian byte order, and shifting
right to remove the unused bits:

```
truncated_trace_id = (int) traceID >> (128 - precision)
```
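As a non-normative illustration, here are both computations in Python, assuming a precision that's a multiple of 8 so the byte-slice and integer forms agree exactly:

```python
import os

PRECISION = 16
trace_id = os.urandom(16)  # a random 16-byte trace ID

# Byte-slice form: keep the leftmost ceil(precision / 8) bytes.
truncated_bytes = trace_id[: -(-PRECISION // 8)]  # -(-a // b) is ceil division

# Integer form: interpret as big-endian and shift off the unused low bits.
truncated_int = int.from_bytes(trace_id, byteorder="big") >> (128 - PRECISION)

assert truncated_int == int.from_bytes(truncated_bytes, byteorder="big")
```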

The `ProbabilitySampler` should only sample traces with truncated trace IDs
less than a certain value. We define this value as the sampler's *bound*:

```
bound = round(rate * pow(2, precision))
```

Where:

- `round(float f)` rounds `f` to the nearest integer
- `rate` is the sampling rate, in `[0.0, 1.0]`
- `pow(a, b)` is the exponentiation operator: `a` to the power of `b`

Note that this value doesn't depend on the trace to be sampled. Implementations
should generally compute this once, not on every sampling decision.

The sampling decision for a trace is given by:

```
to_sample = truncated_trace_id < bound
```

Note that the effective sampling rate is the number closest to `rate` that can
be expressed as a multiple of `2^-precision`. As a consequence, it's not
possible to set arbitrarily low sampling rates, even on platforms that support
arbitrary-precision arithmetic: with 16-bit precision, for example, the lowest
nonzero effective rate is `2^-16`, about `0.000015`.

A `ProbabilitySampler` with rate `0.0` MUST NOT choose to sample any traces,
even if every bit of the truncated trace ID is `0`. Similarly, a
`ProbabilitySampler` with rate `1.0` MUST choose to sample all traces, even if
every bit of the truncated trace ID is `1`.
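Putting the pieces above together, a minimal non-normative sketch; note that computing the bound as `round(rate * 2^precision)` makes both MUST requirements hold without special cases:

```python
class ProbabilitySampler:
    def __init__(self, rate: float, precision: int = 64):
        assert 0.0 <= rate <= 1.0 and 1 <= precision <= 128
        self.precision = precision
        self.bound = round(rate * 2 ** precision)  # computed once, not per span

    def should_sample(self, trace_id: bytes) -> bool:
        truncated = int.from_bytes(trace_id, byteorder="big") >> (128 - self.precision)
        # rate 0.0 gives bound 0: no truncated ID satisfies truncated < 0.
        # rate 1.0 gives bound 2**precision, above every possible truncated ID.
        return truncated < self.bound
```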

**Example:**

Consider a `ProbabilitySampler` with rate `.25` and 16-bit precision.

First, find the lowest truncated trace ID that will not be sampled. This number
represents the 25th percentile of the range of possible values:

```
.25 * 2^16 = 16384 = 0x4000
```

Make a sampling decision for a given trace by comparing this number to the
leftmost 16 bits of its 128 bit trace ID:

```
trunc_tid = (int) traceID >> (128 - 16)
to_sample = trunc_tid < 0x4000
```

This sampler should sample traces with truncated trace ID in the range
`[0x0000, 0x4000)`. Assuming the truncated trace ID is uniformly distributed
over `[0x0000, 0xffff]`, this should account for 25% of all traces.
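Checking the example with a concrete (made-up) trace ID:

```python
trace_id = bytes.fromhex("3f00" + "00" * 14)  # leftmost 16 bits are 0x3f00

trunc_tid = int.from_bytes(trace_id, byteorder="big") >> (128 - 16)
assert trunc_tid == 0x3F00
assert trunc_tid < 0x4000  # 0x3f00 < 0x4000, so this trace is sampled
```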

## Tracer Creation
