From 79c9dfeee28b12d68dd5e2d2ec58a68f9523675e Mon Sep 17 00:00:00 2001 From: Joshua MacDonald Date: Fri, 5 Mar 2021 23:35:32 -0800 Subject: [PATCH 01/68] Sampling basics --- text/0148-sampling-probability.md | 181 ++++++++++++++++++++++++++++++ 1 file changed, 181 insertions(+) create mode 100644 text/0148-sampling-probability.md diff --git a/text/0148-sampling-probability.md b/text/0148-sampling-probability.md new file mode 100644 index 000000000..ef64d5889 --- /dev/null +++ b/text/0148-sampling-probability.md @@ -0,0 +1,181 @@ +# Probability sampling of telemetry events + +Specify a foundation for sampling techniques in OpenTelemetry. + +## Motivation + +In both tracing and metrics, there are widely known techniques for +sampling events that, when performed correctly, enable ways to lower +collection costs. While sampling techniques vary, it is possible +specify high-level interoperability requirements that producers and +consumers of sampled data may follow to enable a wide range of +sampling designs. + +## Explanation + +Consider a hypothetical telemetry signal in which an API event +produces a unit of data that has one or more associated numbers. +Using the OpenTelemetry Metrics data model terminolgy, we have two +scenarios in which sampling is common. + +1. Counter events: Each event represents a count, signifying the change in a sum +2. Histogram events: Each event represents an individual variable, signifying new membership in a distribution + +A Tracing Span event qualifies as both of these cases simultaneously. It is +a Counter event (of 1 span) and a Histogram event (of 1 latency measurement). + +In Metrics, [Statsd Counter and Histogram events meet this definition](https://github.com/statsd/statsd/blob/master/docs/metric_types.md#sampling). + +In both cases, the goal in sampling is to estimate something about the +population of all events, using only the events that were chosen in +the sample. Sampling theory defines various kinds of "estimator", +algorithms for calculating statistics about the population using just +the sample data. For the broad class of telemetry sampling +application considered here, we need an estimator for the population +total represented by each individual event. + +### Sample count adjustment + +The estimated population total divided by the individual event value +equals the event's _adjusted count_. The adjusted count is named +`sample_count` where it appears as a field in a telemetry data type. +The field indicates that the individual sample event is estimated to +represent `sample_count` number of identical events in the population. + +A standard sampling adjustment will be defined, and for it to work +there is one essential requirement. The selection procedure must be +_unbiased_, a statistical term meaning that the process is expected to +give equal consideration to all possible outcomes. + +The specified sampling adjustment sets the `sample_count` of each +sampled event to the inverse of the event's inclusion probability. +Conveying the inverse of the inclusion probability is convenient for +several reasons: + +- A `sample_count` of 1 indicates no sampling was performed +- A `sample_count` of 0 indicates an unrepresentative event +- Larger `sample_count` indicates greater representivity +- Smaller `sample_count` indicates smaller representivity +- The sum of `sample_count` in a sample equals the expected + value of the population size. 
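The following Python sketch is a rough, non-normative illustration of the
adjustment described in the list above; the field and function names are
hypothetical and not part of this proposal. A producer using simple random
sampling at a fixed probability records the inverse probability as
`sample_count`, and a consumer sums `sample_count` to estimate the
population total:

```
import random

def sample_event(event, inclusion_probability):
    # Simple random sampling sketch: keep each event with the given
    # probability and record the inverse probability as its sample_count.
    if random.random() < inclusion_probability:
        event["sample_count"] = 1.0 / inclusion_probability
        return event   # selected for the sample
    return None        # not selected; contributes nothing to the sample

def estimate_population_total(sampled_events):
    # The expected value of this sum equals the true number of events.
    return sum(e["sample_count"] for e in sampled_events)
```

With `inclusion_probability` equal to 0.25, for example, each selected
event carries a `sample_count` of 4.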
+ +The zero `sample_count` value supports collecting values that were +rejected from a sample to use as exemplars, which supports encoding of +values that should not be counted in an estimate of the population +total. This has applications to reservoir sampling designs, where +events may be selected for a sample only to be rejected before the end +of the frame. + +### Sampling with attributes + +Sampling is a powerful approach when used with event data that has +been annotated with key-value attributes. It is possible to select +arbitrary subsets of the sampled data and use it to estimate the count +of arbitrary subsets of the population. + +To summarize, there is a widely applicable procedure for sampling +telemetry data from a population: + +- use an unbiased sampling algorithm to select telemetry events +- label each event in the sample with its `sample_count` (i.e., inverse inclusion probability) +- apply a predicate to events in the sample to select a subset +- apply an estimator to the subset to estimate the sub-population total. + +Applied correctly, this approach provides accurate estimates for +population counts and distributions with support for ad-hoc queries +over the data. This technique has been applied to control the cost of +collecting high-cardinality telemetry data in both Tracing and +Metrics. + +### Changes proposed + +This proposal leads to three change requests that will be carried out in +separate places in the OpenTelemetry specification. These are: + +1. For tracing, the SpanData message type should be extended with + the `sample_count` field defined above. +2. For metrics aggregate data: Count information aggregated from + sample metric events will have floating point values in general. + Histogram and Counter data must be able to support floating point + values. +3. For metrics raw events: Exemplars should be extended with the + `sample_count` field defined above. + +### Example: Dapper tracing + +Google's [Dapper](https://research.google/pubs/pub36356/) tracing +system describes the use of sampling to control the cost of trace +collection at scale. + +### Example: Statsd metrics + +A Statsd counter event appears as a line of text, for example a metric +named `name` is incremented by `increment` using a counter event (`c`) +with the given `sample_rate`. + +``` +name:increment|c|@sample_rate +``` + +For example, a count of 100 that was selected for a 1-in-10 simple +random sampling scheme will arrive as: + +``` +counter:100|c|@0.1 +``` + +Probability 0.1 leads to a `sample_count` of 10. Assuming the sample +was selected using an unbiased algorithm (as by a "fair coin"), we can +interpret this event as probabilistically equal to a count of `100/0.1 += 1000`. + +## Internal details + +The statistical foundation of this technique is known as the +Horvitz-Thompson estimator ("A Generalization of Sampling Without +Replacement From a Finite Universe", JSTOR 1952). The +Horvitz-Thompson technique works with _unequal probability sampling_ +designs, enabling a variety of techniques for controlling properties +of the sample. + +For example, you can sample 100% of error events while sampling 1% of +non-error events, and the interpretation of `sample_count` will be +correct. + +### Bias, variance, and sampling errors + +There is a fundamental tradeoff between bias and variance in + statistics. The use of unbiased sampling leads unavoidably to +increased variance. + +Estimating sampling errors and variance is out of scope for this +proposal. 
We are satisfied the unbiased property, which guarantees +that the expected value of totals derived from the sample equals the +true population totals. This means that statistics derived from +sample data in this way are always accurate, and that more data will +always improve precision. + +### Non-probabilistic rate-limiters + +Rate-limiting stages in a telemetry collection pipeline interfere with +sampling schemes when they operate in non-probabilistic ways. When +implementing a non-probabilistic form of rate-limiting, processors +MUST set the `sample_count` to a NaN value. + +### No Sampler configured + +When no Sampler is in place and all telemetry events pass to the +output, the `sample_count` field SHOULD be set to 1 to indicate +perfect representivity, indicating that no sampling was performed. + +## Prior art and alternatives + +The name `sample_count` is proposed because the resulting value is +effectively a count and may be used in place of the exact count. + +Statsd conveys inclusion probability instead of `sample_count`, where +it is often called "sample rate". + +Another name for the proposed `sample_count` field is +`inverse_probability`, which is considered less suggestive of the +field's purpose. From 22abb007c1c37d0a715b7cbac342f825838cf0ef Mon Sep 17 00:00:00 2001 From: Joshua MacDonald Date: Fri, 5 Mar 2021 23:40:51 -0800 Subject: [PATCH 02/68] More prior art --- text/0148-sampling-probability.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/text/0148-sampling-probability.md b/text/0148-sampling-probability.md index ef64d5889..5f45587af 100644 --- a/text/0148-sampling-probability.md +++ b/text/0148-sampling-probability.md @@ -179,3 +179,6 @@ it is often called "sample rate". Another name for the proposed `sample_count` field is `inverse_probability`, which is considered less suggestive of the field's purpose. + +"Subset sum estimation" is the name given to this topic within the +study of computer science and engineering. From 02e97fcbb2fbe80ad17bc3ff9467934989fff83a Mon Sep 17 00:00:00 2001 From: Joshua MacDonald Date: Fri, 5 Mar 2021 23:57:43 -0800 Subject: [PATCH 03/68] Applicability --- text/0148-sampling-probability.md | 27 ++++++++++++++++++++++++--- 1 file changed, 24 insertions(+), 3 deletions(-) diff --git a/text/0148-sampling-probability.md b/text/0148-sampling-probability.md index 5f45587af..db4fbb538 100644 --- a/text/0148-sampling-probability.md +++ b/text/0148-sampling-probability.md @@ -83,9 +83,7 @@ telemetry data from a population: Applied correctly, this approach provides accurate estimates for population counts and distributions with support for ad-hoc queries -over the data. This technique has been applied to control the cost of -collecting high-cardinality telemetry data in both Tracing and -Metrics. +over the data. ### Changes proposed @@ -168,6 +166,29 @@ When no Sampler is in place and all telemetry events pass to the output, the `sample_count` field SHOULD be set to 1 to indicate perfect representivity, indicating that no sampling was performed. +### Applicability for tracing + +When using sampling to limit span collection, there are usually +approaches under consideration. The sampling approach covered here +dictates how to select root spans in a probabilistic way. When +recording root spans, the `sample_count` field should be set as +described above. The adjusted `sample_count` of the root span applies +the trace, meaning the trace should be considered as representative of +`sample_count` traces in the population. 
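A minimal Python sketch of such a root-span decision, assuming a fixed
sampling probability and a hypothetical span record (this is illustrative
only and does not describe the SDK `Sampler` API):

```
import random

ROOT_SAMPLING_PROBABILITY = 0.01  # hypothetical fixed probability

def start_root_span(name):
    # Decide once at the root; the resulting sample_count describes the
    # whole trace, which stands in for 1/p traces in the population.
    if random.random() < ROOT_SAMPLING_PROBABILITY:
        return {"name": name, "sample_count": 1.0 / ROOT_SAMPLING_PROBABILITY}
    return None  # the trace is not selected and nothing is recorded
```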
+ +When non-root spans are recorded because they are part of an ongoing +trace, they are considered non-probabilistic exemplars. Non-root +spans should have `sample_count` set to zero. + +### Applicability for metrics + +The use of sampling in metrics makes it possible to record +high-cardinality metric events efficiently, as demonstrated by Statsd. + +By pushing sampled metric events from client to server, instead of +timeseries, it is possible to defer decisions about cardinality +reduction to the server, without unreasonable cost to the client. + ## Prior art and alternatives The name `sample_count` is proposed because the resulting value is From 7301d5fed27d3e7555de6e0d57d9318bf85e769e Mon Sep 17 00:00:00 2001 From: Joshua MacDonald Date: Fri, 5 Mar 2021 23:58:49 -0800 Subject: [PATCH 04/68] Edits --- text/0148-sampling-probability.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/0148-sampling-probability.md b/text/0148-sampling-probability.md index db4fbb538..a5c61269b 100644 --- a/text/0148-sampling-probability.md +++ b/text/0148-sampling-probability.md @@ -57,7 +57,7 @@ several reasons: - Larger `sample_count` indicates greater representivity - Smaller `sample_count` indicates smaller representivity - The sum of `sample_count` in a sample equals the expected - value of the population size. + population size. The zero `sample_count` value supports collecting values that were rejected from a sample to use as exemplars, which supports encoding of From 3600e774c0be606e6233c3f38d15bee22dc9522a Mon Sep 17 00:00:00 2001 From: Joshua MacDonald Date: Sat, 6 Mar 2021 00:45:29 -0800 Subject: [PATCH 05/68] Typos --- text/0148-sampling-probability.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/text/0148-sampling-probability.md b/text/0148-sampling-probability.md index a5c61269b..5bbf0e874 100644 --- a/text/0148-sampling-probability.md +++ b/text/0148-sampling-probability.md @@ -143,11 +143,11 @@ correct. ### Bias, variance, and sampling errors There is a fundamental tradeoff between bias and variance in - statistics. The use of unbiased sampling leads unavoidably to +statistics. The use of unbiased sampling leads unavoidably to increased variance. Estimating sampling errors and variance is out of scope for this -proposal. We are satisfied the unbiased property, which guarantees +proposal. We are satisfied with the unbiased property, which guarantees that the expected value of totals derived from the sample equals the true population totals. This means that statistics derived from sample data in this way are always accurate, and that more data will From 7c66df0a233db656c0f329d44b5c9d5ec7b9ef60 Mon Sep 17 00:00:00 2001 From: Joshua MacDonald Date: Sat, 6 Mar 2021 00:47:39 -0800 Subject: [PATCH 06/68] Recommended reading --- text/0148-sampling-probability.md | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/text/0148-sampling-probability.md b/text/0148-sampling-probability.md index 5bbf0e874..8eb7841f4 100644 --- a/text/0148-sampling-probability.md +++ b/text/0148-sampling-probability.md @@ -203,3 +203,8 @@ field's purpose. "Subset sum estimation" is the name given to this topic within the study of computer science and engineering. + +## Recommended reading + +Readers interested in more background on sampling consider +[Sampling, 3rd Edition, by Steven K. Thompson](https://www.wiley.com/en-us/Sampling%2C+3rd+Edition-p-9780470402313). 
From e97eba723e2bd537ac2d2cb3f5a75a5d790b800d Mon Sep 17 00:00:00 2001 From: Joshua MacDonald Date: Sat, 6 Mar 2021 01:03:49 -0800 Subject: [PATCH 07/68] 0 --- text/0148-sampling-probability.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/0148-sampling-probability.md b/text/0148-sampling-probability.md index 8eb7841f4..043ad4cb7 100644 --- a/text/0148-sampling-probability.md +++ b/text/0148-sampling-probability.md @@ -158,7 +158,7 @@ always improve precision. Rate-limiting stages in a telemetry collection pipeline interfere with sampling schemes when they operate in non-probabilistic ways. When implementing a non-probabilistic form of rate-limiting, processors -MUST set the `sample_count` to a NaN value. +MUST set the `sample_count` to 0. ### No Sampler configured From 9592846087f02bf86d5a23c087f804ba2107a0a8 Mon Sep 17 00:00:00 2001 From: Joshua MacDonald Date: Sat, 6 Mar 2021 01:04:32 -0800 Subject: [PATCH 08/68] Edits --- text/0148-sampling-probability.md | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/text/0148-sampling-probability.md b/text/0148-sampling-probability.md index 043ad4cb7..c8d110bb6 100644 --- a/text/0148-sampling-probability.md +++ b/text/0148-sampling-probability.md @@ -168,13 +168,12 @@ perfect representivity, indicating that no sampling was performed. ### Applicability for tracing -When using sampling to limit span collection, there are usually -approaches under consideration. The sampling approach covered here -dictates how to select root spans in a probabilistic way. When -recording root spans, the `sample_count` field should be set as -described above. The adjusted `sample_count` of the root span applies -the trace, meaning the trace should be considered as representative of -`sample_count` traces in the population. +The sampling approach covered here dictates how to select root spans +in a probabilistic way. When recording root spans, the `sample_count` +field should be set as described above. The adjusted `sample_count` +of the root span applies the trace, meaning the trace should be +considered as representative of `sample_count` traces in the +population. When non-root spans are recorded because they are part of an ongoing trace, they are considered non-probabilistic exemplars. Non-root From 2691e7257dfdba99712cd3221e53d20b999a5bc5 Mon Sep 17 00:00:00 2001 From: Joshua MacDonald Date: Sat, 6 Mar 2021 01:04:52 -0800 Subject: [PATCH 09/68] Zero --- text/0148-sampling-probability.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/0148-sampling-probability.md b/text/0148-sampling-probability.md index c8d110bb6..e6fd8dc7c 100644 --- a/text/0148-sampling-probability.md +++ b/text/0148-sampling-probability.md @@ -158,7 +158,7 @@ always improve precision. Rate-limiting stages in a telemetry collection pipeline interfere with sampling schemes when they operate in non-probabilistic ways. When implementing a non-probabilistic form of rate-limiting, processors -MUST set the `sample_count` to 0. +MUST set the `sample_count` to zero. ### No Sampler configured From b542d8974d9d53dcfae75bec4c022b2063b0c34d Mon Sep 17 00:00:00 2001 From: Joshua MacDonald Date: Fri, 19 Mar 2021 23:41:22 -0700 Subject: [PATCH 10/68] Apply suggestions from code review Co-authored-by: Paul Osman Co-authored-by: Steven E. 
Harris --- text/0148-sampling-probability.md | 38 ++++++++++++++++--------------- 1 file changed, 20 insertions(+), 18 deletions(-) diff --git a/text/0148-sampling-probability.md b/text/0148-sampling-probability.md index e6fd8dc7c..f4272a490 100644 --- a/text/0148-sampling-probability.md +++ b/text/0148-sampling-probability.md @@ -15,11 +15,11 @@ sampling designs. Consider a hypothetical telemetry signal in which an API event produces a unit of data that has one or more associated numbers. -Using the OpenTelemetry Metrics data model terminolgy, we have two +Using the OpenTelemetry Metrics data model terminology, we have two scenarios in which sampling is common. -1. Counter events: Each event represents a count, signifying the change in a sum -2. Histogram events: Each event represents an individual variable, signifying new membership in a distribution +1. _Counter events:_ Each event represents a count, signifying the change in a sum. +2. _Histogram events:_ Each event represents an individual variable, signifying new membership in a distribution. A Tracing Span event qualifies as both of these cases simultaneously. It is a Counter event (of 1 span) and a Histogram event (of 1 latency measurement). @@ -28,7 +28,7 @@ In Metrics, [Statsd Counter and Histogram events meet this definition](https://g In both cases, the goal in sampling is to estimate something about the population of all events, using only the events that were chosen in -the sample. Sampling theory defines various kinds of "estimator", +the sample. Sampling theory defines various kinds of "estimators"— algorithms for calculating statistics about the population using just the sample data. For the broad class of telemetry sampling application considered here, we need an estimator for the population @@ -52,10 +52,10 @@ sampled event to the inverse of the event's inclusion probability. Conveying the inverse of the inclusion probability is convenient for several reasons: -- A `sample_count` of 1 indicates no sampling was performed -- A `sample_count` of 0 indicates an unrepresentative event -- Larger `sample_count` indicates greater representivity -- Smaller `sample_count` indicates smaller representivity +- A `sample_count` of 1 indicates no sampling was performed. +- A `sample_count` of 0 indicates an unrepresentative event. +- A larger `sample_count` indicates greater representativity. +- A smaller `sample_count` indicates smaller representativity. - The sum of `sample_count` in a sample equals the expected population size. @@ -70,7 +70,7 @@ of the frame. Sampling is a powerful approach when used with event data that has been annotated with key-value attributes. It is possible to select -arbitrary subsets of the sampled data and use it to estimate the count +arbitrary subsets of the sampled data and use each to estimate the count of arbitrary subsets of the population. To summarize, there is a widely applicable procedure for sampling @@ -90,13 +90,13 @@ over the data. This proposal leads to three change requests that will be carried out in separate places in the OpenTelemetry specification. These are: -1. For tracing, the SpanData message type should be extended with +1. _For tracing:_ The SpanData message type should be extended with the `sample_count` field defined above. -2. For metrics aggregate data: Count information aggregated from +2. _For metrics aggregate data:_ Count information aggregated from sample metric events will have floating point values in general. 
Histogram and Counter data must be able to support floating point values. -3. For metrics raw events: Exemplars should be extended with the +3. _For metrics raw events:_ Exemplars should be extended with the `sample_count` field defined above. ### Example: Dapper tracing @@ -107,7 +107,9 @@ collection at scale. ### Example: Statsd metrics -A Statsd counter event appears as a line of text, for example a metric +A Statsd counter event appears as a line of text. + +For example, a metric named `name` is incremented by `increment` using a counter event (`c`) with the given `sample_rate`. @@ -131,7 +133,7 @@ interpret this event as probabilistically equal to a count of `100/0.1 The statistical foundation of this technique is known as the Horvitz-Thompson estimator ("A Generalization of Sampling Without -Replacement From a Finite Universe", JSTOR 1952). The +Replacement From a Finite Universe," JSTOR 1952). The Horvitz-Thompson technique works with _unequal probability sampling_ designs, enabling a variety of techniques for controlling properties of the sample. @@ -164,14 +166,14 @@ MUST set the `sample_count` to zero. When no Sampler is in place and all telemetry events pass to the output, the `sample_count` field SHOULD be set to 1 to indicate -perfect representivity, indicating that no sampling was performed. +perfect representativity, indicating that no sampling was performed and that all events were preserved. ### Applicability for tracing -The sampling approach covered here dictates how to select root spans +The approach covered here dictates how to annotate root spans sampled in a probabilistic way. When recording root spans, the `sample_count` field should be set as described above. The adjusted `sample_count` -of the root span applies the trace, meaning the trace should be +of the root span applies to the entire trace, meaning the trace should be considered as representative of `sample_count` traces in the population. @@ -194,7 +196,7 @@ The name `sample_count` is proposed because the resulting value is effectively a count and may be used in place of the exact count. Statsd conveys inclusion probability instead of `sample_count`, where -it is often called "sample rate". +it is often called "sample rate." Another name for the proposed `sample_count` field is `inverse_probability`, which is considered less suggestive of the From c1ce969b12010381c54f7546a67bb345e561d7b2 Mon Sep 17 00:00:00 2001 From: Joshua MacDonald Date: Tue, 4 May 2021 12:18:44 -0700 Subject: [PATCH 11/68] Revisions based on feedback --- text/0148-sampling-probability.md | 256 +++++++++++++++++++----------- 1 file changed, 166 insertions(+), 90 deletions(-) diff --git a/text/0148-sampling-probability.md b/text/0148-sampling-probability.md index e6fd8dc7c..5dbb477ea 100644 --- a/text/0148-sampling-probability.md +++ b/text/0148-sampling-probability.md @@ -4,12 +4,14 @@ Specify a foundation for sampling techniques in OpenTelemetry. ## Motivation -In both tracing and metrics, there are widely known techniques for -sampling events that, when performed correctly, enable ways to lower -collection costs. While sampling techniques vary, it is possible -specify high-level interoperability requirements that producers and -consumers of sampled data may follow to enable a wide range of -sampling designs. 
+In tracing, metrics, and logs, there are widely known techniques for +sampling a stream of events that, when performed correctly, enable +collecting a tiny fraction of data while maintaining substantial +visibility into the whole population of events described by the data. + +While sampling techniques vary, it is possible specify high-level +interoperability requirements that producers and consumers of sampled +data may follow to enable a wide range of sampling designs. ## Explanation @@ -21,65 +23,156 @@ scenarios in which sampling is common. 1. Counter events: Each event represents a count, signifying the change in a sum 2. Histogram events: Each event represents an individual variable, signifying new membership in a distribution -A Tracing Span event qualifies as both of these cases simultaneously. It is -a Counter event (of 1 span) and a Histogram event (of 1 latency measurement). +A Tracing Span event qualifies as both of these cases simultaneously. +It is a Counter event (of 1 span) and at lesat one Histogram event +(e.g., one of latency, one of request size). In Metrics, [Statsd Counter and Histogram events meet this definition](https://github.com/statsd/statsd/blob/master/docs/metric_types.md#sampling). In both cases, the goal in sampling is to estimate something about the population of all events, using only the events that were chosen in -the sample. Sampling theory defines various kinds of "estimator", +the sample. Sampling theory defines various _sampling estimators_, algorithms for calculating statistics about the population using just the sample data. For the broad class of telemetry sampling application considered here, we need an estimator for the population total represented by each individual event. -### Sample count adjustment - -The estimated population total divided by the individual event value -equals the event's _adjusted count_. The adjusted count is named -`sample_count` where it appears as a field in a telemetry data type. -The field indicates that the individual sample event is estimated to -represent `sample_count` number of identical events in the population. - -A standard sampling adjustment will be defined, and for it to work -there is one essential requirement. The selection procedure must be -_unbiased_, a statistical term meaning that the process is expected to -give equal consideration to all possible outcomes. - -The specified sampling adjustment sets the `sample_count` of each -sampled event to the inverse of the event's inclusion probability. -Conveying the inverse of the inclusion probability is convenient for -several reasons: - -- A `sample_count` of 1 indicates no sampling was performed -- A `sample_count` of 0 indicates an unrepresentative event -- Larger `sample_count` indicates greater representivity -- Smaller `sample_count` indicates smaller representivity -- The sum of `sample_count` in a sample equals the expected - population size. - -The zero `sample_count` value supports collecting values that were -rejected from a sample to use as exemplars, which supports encoding of -values that should not be counted in an estimate of the population -total. This has applications to reservoir sampling designs, where -events may be selected for a sample only to be rejected before the end -of the frame. +### Model and terminology + +This model is meant to apply in telemetry collection situations where +individual events at an API boundary are sampled for collection. 
+ +In sampling, the term _sampling design_ refers to how sampling +probability is decided for a collection process and the term _sample +frame_ refers to how events are organized into discrete populations. + +After executing a sampling design over a frame, each item selected in +the sample will have known _inclusion probability_, that determines +how likely it was to be selected. Implicitly, all the items that were +not selected for the sample have zero inclusion probability. + +Descriptive words that are often used to describe sampling designs: + +- *Fixed*: the sampling design is the same from one frame to the next +- *Adaptive*: the sampling design changes from one frame to the next +- *Equal-Probability*: the sampling design uses a single inclusion probability per frame +- *Unequal-Probability*: the sampling design uses mulitple inclusion probabilities per frame +- *Reservoir*: the sampling design uses fixed space, has fixed-size output. + +Our goal is to support flexibility in choosing sampling designs for +producers of telemetry data, while allowing consumers of sampled +telemetry data to be agnostic to the sampling design used. + +We are interested in the common case for telemetry collection, where +sampling is performed while processing a stream of events and each +event is considered just once. Sampling designs of this form are +referred to as _sampling without replacement_. Unless stated +otherwise, "sampling" in telemetry collection always refers to +sampling without replacement. + +After executing a given sampling design over complete frame of data, +the result is a set of selected sample events, each having known and +non-zero inclusion probability. There are several other quantities of +interest, after calculating a sample from a sample frame. + +- *Sample size*: the number of events with non-zero inclusion probability +- *True population total*: the exact number of events in the frame +- *Estimated population total*: the estimated number of events in the frame + +The sample size is known after it is calculated, but the size may or +may not be known ahead of time, depending on the design. The true +population total cannot be inferred directly from the sample, but can +(sometimes) be counted separately. The estimated population total is +the expected value of the true population total. + +### Adjusted sample count + +Following the modle above, every event defines the notion of an +_adjusted count_. + +- _Adjusted count_ is zero if the event was not selected for the sample +- _Adjusted count_ is the reciprocal of its inclusion probability, otherwise. + +The adjusted count of an event represents the expected contribution to +the estimated population total of a sample framer represented by the +individual event. As stated, the sample event's adjusted count is +easily derived from the Horvitz-Thompson estimator of the population +total, a general-purpose statistical estimator that applies to all +_without replacement_ sampling designs. + +Assuming sample data is correctly computed, the consumer of sample +data can treat every sample event as though an identical copy of +itself has occurred _adjusted count_ times. Every sample event is +representative for adjusted count many copies of itself. + +There is one essential requirement for this to work. The selection +procedure must be _statistically unbiased_, a term meaning that the +process is required to give equal consideration to all possible +outcomes. 
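To make the estimator concrete, the following Python sketch (with
hypothetical field names) computes Horvitz-Thompson totals from a sample
taken with unequal inclusion probabilities, for example 100% of error
events and 1% of non-error events:

```
def adjusted_count(event):
    # Reciprocal of the inclusion probability, or zero for events that
    # were recorded with zero inclusion probability (see below).
    p = event["inclusion_probability"]
    return 1.0 / p if p > 0 else 0.0

def estimate_total(sample, predicate=lambda e: True):
    # Horvitz-Thompson estimate of how many population events satisfy
    # the predicate, computed from the sampled events alone.
    return sum(adjusted_count(e) for e in sample if predicate(e))

# Errors sampled at 100%, non-errors at 1%:
sample = [
    {"error": True, "inclusion_probability": 1.0},
    {"error": False, "inclusion_probability": 0.01},
]
estimate_total(sample)                        # 101.0 estimated events
estimate_total(sample, lambda e: e["error"])  # 1.0 estimated error events
```

Summing adjusted counts over a predicate-selected subset is the basis
for the attribute-based estimation described below.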
+ +### Encoding inclusion probability + +Some possibilities for encoding the inclusion probability, depending +on the circumstances and the protocol, briefly discussed: + +1. Encode the adjusted count directly as a floating point or integer number in the range [0, +Inf). This is a conceptually easy way to understand sampling because larger numbers mean greater representivity. +2. Encode the inclusion probability directly as a floating point number in the range [0, 1). This is typical of the Statsd format, where each line includes an optional probability. In this context, the probability is commonly referred to as a "sampling rate". In this case, smaller numbers mean greater representivity. +3. Encode the negative of the base-2 logarithm of inclusion probability. This restricts inclusion probabilities to powers of two and allows the use of small non-negative integers to encode power-of-two adjusted counts. +4. Fold the adjusted count into the data. This is appropriate when the data itself carries counts, such as for OTLP Metrics Sum and Histogram points encoded using delta aggregation temporality. This may lead to rounding errors, when adjusted counts are not integer valued. + +This is not an exhaustive list of approaches. All of these techniques +are considered appropriate. + +A telemetry system should be able to accurately estimate the number of +events that took place whether the events were sampled or not, which +requires being able to recognize a zero value for the adjusted count +as being distinct from a raw event where no sampling took place. + +#### Recognizing zero adjusted count + +An adjusted count of zero indicates an event that was recorded, where +according to the sampling design its inclusion probability is zero. +These events are may be included in a stream of sampled events as +auxiliary information, and consu + +Recording events with zero adjusted count are a useful way to record +auxiliary information while sampling, events that are considered +interesting but which are accounted for by the adjusted count of other +events in the same stream. + +Consider a span sampling design that applies a sampling decision only +to the roots of a trace. Non-root spans must be recorded when their +parent span is selected for a sample. For a class of spans that +sometimes are and sometimes are not the root of a trace, we have three +outcomes: + +1. Span is part of another trace: adjusted count is zero +2. Span is a root, was selected for the sample: adjusted count is non-zero +3. Span is a root, was not selected for the sample: not recorded. ### Sampling with attributes Sampling is a powerful approach when used with event data that has -been annotated with key-value attributes. It is possible to select -arbitrary subsets of the sampled data and use it to estimate the count -of arbitrary subsets of the population. +been annotated with key-value attributes and sampled with an unbiased +design. It is possible to select arbitrary subsets use those subsets +to estimate the count of arbitrary subsets of the population. + +This application for sample data is prescribed by the statement above, +"Every sample event is representative for adjusted count many copies +of itself." It relies on the use of an unbiased sampling design. +Readers are referred to [recommended reading](#recommended-reading) +for more resources on sampling with attributes. 
+ +### Summary: a general technique To summarize, there is a widely applicable procedure for sampling telemetry data from a population: -- use an unbiased sampling algorithm to select telemetry events -- label each event in the sample with its `sample_count` (i.e., inverse inclusion probability) -- apply a predicate to events in the sample to select a subset -- apply an estimator to the subset to estimate the sub-population total. +- describe how to map telemetry events into discrete frames +- use an unbiased sampling design to select events +- encode the adjusted count or inclusion probability in the recorded events +- apply a predicate to events in the sample to select a subset of events +- sum the adjusted counts of the subset to estimate the sub-population total. Applied correctly, this approach provides accurate estimates for population counts and distributions with support for ad-hoc queries @@ -87,17 +180,12 @@ over the data. ### Changes proposed -This proposal leads to three change requests that will be carried out in -separate places in the OpenTelemetry specification. These are: - -1. For tracing, the SpanData message type should be extended with - the `sample_count` field defined above. -2. For metrics aggregate data: Count information aggregated from - sample metric events will have floating point values in general. - Histogram and Counter data must be able to support floating point - values. -3. For metrics raw events: Exemplars should be extended with the - `sample_count` field defined above. +This OTEP proposes no formal changes in the OpenTelemetry +specitication. It is meant to lay a foundation for importing sampled +telemetry events from other systems as well as to begin specifying how +OpenTelemetry SDKs that use probabilistic `Sampler` implementations +should convey inclusion probability and how consumers of this +information can use information about sampling. ### Example: Dapper tracing @@ -105,6 +193,15 @@ Google's [Dapper](https://research.google/pubs/pub36356/) tracing system describes the use of sampling to control the cost of trace collection at scale. +The paper spends little time talking about Dapper's specific approach +to sampling, which evolved over time. Dapper made use of tracing +context, similar to OpenTelemetry Baggage, to convey the probability +that the current trace was selected for sampling. This allowed each +node in the trace to make an independent decision to begin sampling +with themselves as a new root. This technique can ensure a minimum +rate of traces being started by every node in the system, however this +is not described by the Dapper paper. + ### Example: Statsd metrics A Statsd counter event appears as a line of text, for example a metric @@ -122,7 +219,7 @@ random sampling scheme will arrive as: counter:100|c|@0.1 ``` -Probability 0.1 leads to a `sample_count` of 10. Assuming the sample +Probability 0.1 leads to an adjusted count of 10. Assuming the sample was selected using an unbiased algorithm (as by a "fair coin"), we can interpret this event as probabilistically equal to a count of `100/0.1 = 1000`. @@ -134,10 +231,10 @@ Horvitz-Thompson estimator ("A Generalization of Sampling Without Replacement From a Finite Universe", JSTOR 1952). The Horvitz-Thompson technique works with _unequal probability sampling_ designs, enabling a variety of techniques for controlling properties -of the sample. +of the sample. 
For example, you can sample 100% of error events while sampling 1% of -non-error events, and the interpretation of `sample_count` will be +non-error events, and the interpretation of adjusted count will be correct. ### Bias, variance, and sampling errors @@ -158,35 +255,11 @@ always improve precision. Rate-limiting stages in a telemetry collection pipeline interfere with sampling schemes when they operate in non-probabilistic ways. When implementing a non-probabilistic form of rate-limiting, processors -MUST set the `sample_count` to zero. +MUST set the adjusted count to zero. -### No Sampler configured - -When no Sampler is in place and all telemetry events pass to the -output, the `sample_count` field SHOULD be set to 1 to indicate -perfect representivity, indicating that no sampling was performed. - -### Applicability for tracing - -The sampling approach covered here dictates how to select root spans -in a probabilistic way. When recording root spans, the `sample_count` -field should be set as described above. The adjusted `sample_count` -of the root span applies the trace, meaning the trace should be -considered as representative of `sample_count` traces in the -population. - -When non-root spans are recorded because they are part of an ongoing -trace, they are considered non-probabilistic exemplars. Non-root -spans should have `sample_count` set to zero. - -### Applicability for metrics - -The use of sampling in metrics makes it possible to record -high-cardinality metric events efficiently, as demonstrated by Statsd. - -By pushing sampled metric events from client to server, instead of -timeseries, it is possible to defer decisions about cardinality -reduction to the server, without unreasonable cost to the client. +The use of zero adjusted count explicitly conveys that the events +output by non-probabilistic sampling should not be counted in a +statistical manner. ## Prior art and alternatives @@ -205,5 +278,8 @@ study of computer science and engineering. ## Recommended reading -Readers interested in more background on sampling consider [Sampling, 3rd Edition, by Steven K. Thompson](https://www.wiley.com/en-us/Sampling%2C+3rd+Edition-p-9780470402313). + +[Performance Is A Shape. Cost Is A Number: Sampling](https://docs.lightstep.com/otel/performance-is-a-shape-cost-is-a-number-sampling), 2020 blog post, Joshua MacDonald + +[Priority sampling for estimation of arbitrary subset sums](https://dl.acm.org/doi/abs/10.1145/1314690.1314696) From b9f55dfa2cc32f6a0606d7b8b684a172787ad2e3 Mon Sep 17 00:00:00 2001 From: Joshua MacDonald Date: Tue, 4 May 2021 12:31:12 -0700 Subject: [PATCH 12/68] Edits --- text/0148-sampling-probability.md | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/text/0148-sampling-probability.md b/text/0148-sampling-probability.md index d2ddfd104..b1322680a 100644 --- a/text/0148-sampling-probability.md +++ b/text/0148-sampling-probability.md @@ -269,15 +269,14 @@ statistical manner. ## Prior art and alternatives -The name `sample_count` is proposed because the resulting value is +The term "adjusted count" is proposed because the resulting value is effectively a count and may be used in place of the exact count. -Statsd conveys inclusion probability instead of `sample_count`, where -it is often called "sample rate." +The term "adjusted weight" is NOT proposed to describe the adjustment +made by sampling, because the adjustment being made is that of a count. 
-Another name for the proposed `sample_count` field is -`inverse_probability`, which is considered less suggestive of the -field's purpose. +Another term for the proposed "adjusted count" concept is +`inverse_probability`. "Subset sum estimation" is the name given to this topic within the study of computer science and engineering. From 14d86536624b1e212328e59d35f3b4729e4f235b Mon Sep 17 00:00:00 2001 From: Joshua MacDonald Date: Tue, 4 May 2021 12:35:06 -0700 Subject: [PATCH 13/68] Lint --- text/0148-sampling-probability.md | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/text/0148-sampling-probability.md b/text/0148-sampling-probability.md index b1322680a..69524f902 100644 --- a/text/0148-sampling-probability.md +++ b/text/0148-sampling-probability.md @@ -56,7 +56,7 @@ Descriptive words that are often used to describe sampling designs: - *Fixed*: the sampling design is the same from one frame to the next - *Adaptive*: the sampling design changes from one frame to the next - *Equal-Probability*: the sampling design uses a single inclusion probability per frame -- *Unequal-Probability*: the sampling design uses mulitple inclusion probabilities per frame +- *Unequal-Probability*: the sampling design uses multiple inclusion probabilities per frame - *Reservoir*: the sampling design uses fixed space, has fixed-size output. Our goal is to support flexibility in choosing sampling designs for @@ -87,7 +87,7 @@ the expected value of the true population total. ### Adjusted sample count -Following the modle above, every event defines the notion of an +Following the model above, every event defines the notion of an _adjusted count_. - _Adjusted count_ is zero if the event was not selected for the sample @@ -164,7 +164,8 @@ Readers are referred to [recommended reading](#recommended-reading) for more resources on sampling with attributes. ### Summary: a general technique -======= + +Sampling is a powerful approach when used with event data that has been annotated with key-value attributes. It is possible to select arbitrary subsets of the sampled data and use each to estimate the count of arbitrary subsets of the population. From 95c7285ec225be43ec13934a71ffa9f6232cee2c Mon Sep 17 00:00:00 2001 From: Joshua MacDonald Date: Sat, 22 May 2021 00:16:48 -0700 Subject: [PATCH 14/68] Update --- text/0148-sampling-probability.md | 82 ++++++++++++++----------------- 1 file changed, 37 insertions(+), 45 deletions(-) diff --git a/text/0148-sampling-probability.md b/text/0148-sampling-probability.md index 69524f902..08d5094b0 100644 --- a/text/0148-sampling-probability.md +++ b/text/0148-sampling-probability.md @@ -24,7 +24,7 @@ scenarios in which sampling is common. 2. _Histogram events:_ Each event represents an individual variable, signifying new membership in a distribution. A Tracing Span event qualifies as both of these cases simultaneously. -It is a Counter event (of 1 span) and at lesat one Histogram event +It is a Counter event (of 1 span) and at least one Histogram event (e.g., one of latency, one of request size). In Metrics, [Statsd Counter and Histogram events meet this definition](https://github.com/statsd/statsd/blob/master/docs/metric_types.md#sampling). @@ -94,7 +94,7 @@ _adjusted count_. - _Adjusted count_ is the reciprocal of its inclusion probability, otherwise. 
The adjusted count of an event represents the expected contribution to -the estimated population total of a sample framer represented by the +the estimated population total of a sample frame represented by the individual event. As stated, the sample event's adjusted count is easily derived from the Horvitz-Thompson estimator of the population total, a general-purpose statistical estimator that applies to all @@ -118,7 +118,7 @@ on the circumstances and the protocol, briefly discussed: 1. Encode the adjusted count directly as a floating point or integer number in the range [0, +Inf). This is a conceptually easy way to understand sampling because larger numbers mean greater representivity. 2. Encode the inclusion probability directly as a floating point number in the range [0, 1). This is typical of the Statsd format, where each line includes an optional probability. In this context, the probability is commonly referred to as a "sampling rate". In this case, smaller numbers mean greater representivity. 3. Encode the negative of the base-2 logarithm of inclusion probability. This restricts inclusion probabilities to powers of two and allows the use of small non-negative integers to encode power-of-two adjusted counts. -4. Fold the adjusted count into the data. This is appropriate when the data itself carries counts, such as for OTLP Metrics Sum and Histogram points encoded using delta aggregation temporality. This may lead to rounding errors, when adjusted counts are not integer valued. +4. Multiply the adjusted count into the data. This is appropriate when the data itself carries counts, such as for OTLP Metrics Sum and Histogram points encoded using delta aggregation temporality. This technique is less desireable because while it preserves the expected value of the count or sum, the data loses information about variance. This may also lead to rounding errors, when adjusted counts are not integer valued. This is not an exhaustive list of approaches. All of these techniques are considered appropriate. @@ -132,52 +132,32 @@ as being distinct from a raw event where no sampling took place. An adjusted count of zero indicates an event that was recorded, where according to the sampling design its inclusion probability is zero. -These events are may be included in a stream of sampled events as -auxiliary information, and consu Recording events with zero adjusted count are a useful way to record auxiliary information while sampling, events that are considered interesting but which are accounted for by the adjusted count of other events in the same stream. -Consider a span sampling design that applies a sampling decision only -to the roots of a trace. Non-root spans must be recorded when their -parent span is selected for a sample. For a class of spans that -sometimes are and sometimes are not the root of a trace, we have three -outcomes: - -1. Span is part of another trace: adjusted count is zero -2. Span is a root, was selected for the sample: adjusted count is non-zero -3. Span is a root, was not selected for the sample: not recorded. - ### Sampling with attributes Sampling is a powerful approach when used with event data that has been annotated with key-value attributes and sampled with an unbiased -design. It is possible to select arbitrary subsets use those subsets -to estimate the count of arbitrary subsets of the population. - -This application for sample data is prescribed by the statement above, -"Every sample event is representative for adjusted count many copies -of itself." 
It relies on the use of an unbiased sampling design. -Readers are referred to [recommended reading](#recommended-reading) -for more resources on sampling with attributes. - -### Summary: a general technique +design. -Sampling is a powerful approach when used with event data that has -been annotated with key-value attributes. It is possible to select -arbitrary subsets of the sampled data and use each to estimate the count -of arbitrary subsets of the population. +Because an individual item is considered representative for _adjusted +count_ many copies of itself, it is possible to select arbitrary +subsets from a sample to estimate the count of arbitrary subsets of +the population. -To summarize, there is a widely applicable procedure for sampling -telemetry data from a population: +Readers are referred to [recommended reading](#recommended-reading) +for more resources on sampling with attributes. To summarize, there +is a widely applicable procedure for sampling telemetry data from a +population: -- describe how to map telemetry events into discrete frames -- use an unbiased sampling design to select events -- encode the adjusted count or inclusion probability in the recorded events -- apply a predicate to events in the sample to select a subset of events -- sum the adjusted counts of the subset to estimate the sub-population total. +- use an unbiased sampling algorithm +- encode the adjusted count or inclusion probability in or alongside sampled events +- apply a predicate to events in the sample to select a subset +- sum the adjusted counts in the subset to estimate the sub-population total. Applied correctly, this approach provides accurate estimates for population counts and distributions with support for ad-hoc queries @@ -186,7 +166,7 @@ over the data. ### Changes proposed This OTEP proposes no formal changes in the OpenTelemetry -specitication. It is meant to lay a foundation for importing sampled +specification. It is meant to lay a foundation for importing sampled telemetry events from other systems as well as to begin specifying how OpenTelemetry SDKs that use probabilistic `Sampler` implementations should convey inclusion probability and how consumers of this @@ -198,14 +178,26 @@ Google's [Dapper](https://research.google/pubs/pub36356/) tracing system describes the use of sampling to control the cost of trace collection at scale. -The paper spends little time talking about Dapper's specific approach -to sampling, which evolved over time. Dapper made use of tracing -context, similar to OpenTelemetry Baggage, to convey the probability -that the current trace was selected for sampling. This allowed each -node in the trace to make an independent decision to begin sampling -with themselves as a new root. This technique can ensure a minimum -rate of traces being started by every node in the system, however this -is not described by the Dapper paper. +Dapper's used a sampling approach where: + +- Root nodes in a trace use simple random sampling to decide to trace at the root +- Propagate the tracing decision the inclusion probability into child contexts +- Allow child contexts to boost the sampling probability of their sub-rooted trace. + +Allowing contexts to boost sampling probability addresses a scenario +where a high-throughput service that sampled with low probabiliy +rarely calls another, low-throughput service. For the low-throughput +service to record a sufficient number of traces, it has to increase +its own odds of sampling. 
+ +This requires propagating the inclusion probability used when a +negative sampling decision is made, as the child needs it to calculate +a conditional probability for its own sampling decision. + +The specific details of this approach are considered out-of-scope for +this text, however the result is an adjusted count on every span +making it easy for consumers to compute metrics from a stream of +sampled spans without having to assemble compelte traces first. ### Example: Statsd metrics From 2e683a5d56e5b740893aa7c2da814d720b7c931b Mon Sep 17 00:00:00 2001 From: Joshua MacDonald Date: Sat, 22 May 2021 00:27:24 -0700 Subject: [PATCH 15/68] Typo --- text/0148-sampling-probability.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/text/0148-sampling-probability.md b/text/0148-sampling-probability.md index 08d5094b0..9ad1818aa 100644 --- a/text/0148-sampling-probability.md +++ b/text/0148-sampling-probability.md @@ -195,9 +195,9 @@ negative sampling decision is made, as the child needs it to calculate a conditional probability for its own sampling decision. The specific details of this approach are considered out-of-scope for -this text, however the result is an adjusted count on every span +this text, however the result is an adjusted count on every span, making it easy for consumers to compute metrics from a stream of -sampled spans without having to assemble compelte traces first. +sampled spans without having to assemble complete traces first. ### Example: Statsd metrics From 72a70081c170b267e41a6ab9a2ac9f04a91eb520 Mon Sep 17 00:00:00 2001 From: Joshua MacDonald Date: Sat, 22 May 2021 00:30:09 -0700 Subject: [PATCH 16/68] lint --- text/0148-sampling-probability.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/text/0148-sampling-probability.md b/text/0148-sampling-probability.md index 9ad1818aa..5aca21e22 100644 --- a/text/0148-sampling-probability.md +++ b/text/0148-sampling-probability.md @@ -118,7 +118,7 @@ on the circumstances and the protocol, briefly discussed: 1. Encode the adjusted count directly as a floating point or integer number in the range [0, +Inf). This is a conceptually easy way to understand sampling because larger numbers mean greater representivity. 2. Encode the inclusion probability directly as a floating point number in the range [0, 1). This is typical of the Statsd format, where each line includes an optional probability. In this context, the probability is commonly referred to as a "sampling rate". In this case, smaller numbers mean greater representivity. 3. Encode the negative of the base-2 logarithm of inclusion probability. This restricts inclusion probabilities to powers of two and allows the use of small non-negative integers to encode power-of-two adjusted counts. -4. Multiply the adjusted count into the data. This is appropriate when the data itself carries counts, such as for OTLP Metrics Sum and Histogram points encoded using delta aggregation temporality. This technique is less desireable because while it preserves the expected value of the count or sum, the data loses information about variance. This may also lead to rounding errors, when adjusted counts are not integer valued. +4. Multiply the adjusted count into the data. This is appropriate when the data itself carries counts, such as for OTLP Metrics Sum and Histogram points encoded using delta aggregation temporality. 
This technique is less desirable because while it preserves the expected value of the count or sum, the data loses information about variance. This may also lead to rounding errors, when adjusted counts are not integer valued. This is not an exhaustive list of approaches. All of these techniques are considered appropriate. @@ -185,7 +185,7 @@ Dapper's used a sampling approach where: - Allow child contexts to boost the sampling probability of their sub-rooted trace. Allowing contexts to boost sampling probability addresses a scenario -where a high-throughput service that sampled with low probabiliy +where a high-throughput service that sampled with low probability rarely calls another, low-throughput service. For the low-throughput service to record a sufficient number of traces, it has to increase its own odds of sampling. From 58b78647072499ae811e7f2d2e192940f30941cc Mon Sep 17 00:00:00 2001 From: Joshua MacDonald Date: Tue, 25 May 2021 14:06:34 -0700 Subject: [PATCH 17/68] Add motivation: it's about approximate counting --- text/0148-sampling-probability.md | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/text/0148-sampling-probability.md b/text/0148-sampling-probability.md index 5aca21e22..78bfca952 100644 --- a/text/0148-sampling-probability.md +++ b/text/0148-sampling-probability.md @@ -6,12 +6,18 @@ Specify a foundation for sampling techniques in OpenTelemetry. In tracing, metrics, and logs, there are widely known techniques for sampling a stream of events that, when performed correctly, enable -collecting a tiny fraction of data while maintaining substantial -visibility into the whole population of events described by the data. +collecting a tiny fraction of the complete data while maintaining +substantial visibility into the whole population of events. -While sampling techniques vary, it is possible specify high-level +These techniques are all forms of approximate counting. Estimates +calculated by the forms of sampling outlined here are considered +accurate, they are random variables with an expected value equal to +the true value. With sampling we expected to introduce variance, +which can be compensated for with a sufficient quantity of data. + +While sampling techniques vary, it is possible to specify high-level interoperability requirements that producers and consumers of sampled -data may follow to enable a wide range of sampling designs. +data can follow to enable a wide range of sampling designs. ## Explanation From c40b426091877cfbdc89a72d63ab2d2eebba5615 Mon Sep 17 00:00:00 2001 From: Joshua MacDonald Date: Wed, 26 May 2021 00:36:59 -0700 Subject: [PATCH 18/68] edits --- text/0148-sampling-probability.md | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/text/0148-sampling-probability.md b/text/0148-sampling-probability.md index 78bfca952..c08b5b2be 100644 --- a/text/0148-sampling-probability.md +++ b/text/0148-sampling-probability.md @@ -50,7 +50,8 @@ individual events at an API boundary are sampled for collection. In sampling, the term _sampling design_ refers to how sampling probability is decided for a collection process and the term _sample -frame_ refers to how events are organized into discrete populations. +frame_ refers to how events are organized into discrete populations +(e.g., a window in time, a particular span or metric name). 
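For illustration only, one hypothetical framing (not prescribed by this
text) might group events into one population per span name per one-minute
window, as in this Python sketch:

```
from collections import defaultdict

def frame_key(event):
    # Hypothetical framing: one population per span name per minute.
    return (event["span_name"], int(event["timestamp_seconds"]) // 60)

def group_into_frames(events):
    frames = defaultdict(list)
    for event in events:
        frames[frame_key(event)].append(event)
    return frames
```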
After executing a sampling design over a frame, each item selected in the sample will have known _inclusion probability_, that determines @@ -82,10 +83,10 @@ non-zero inclusion probability. There are several other quantities of interest, after calculating a sample from a sample frame. - *Sample size*: the number of events with non-zero inclusion probability -- *True population total*: the exact number of events in the frame -- *Estimated population total*: the estimated number of events in the frame +- *True population total*: the exact number of events in the frame, which may be unknown +- *Estimated population total*: the estimated number of events in the frame, which is computed from the same. -The sample size is known after it is calculated, but the size may or +The sample size is always known after it is calculated, but the size may or may not be known ahead of time, depending on the design. The true population total cannot be inferred directly from the sample, but can (sometimes) be counted separately. The estimated population total is @@ -139,10 +140,10 @@ as being distinct from a raw event where no sampling took place. An adjusted count of zero indicates an event that was recorded, where according to the sampling design its inclusion probability is zero. -Recording events with zero adjusted count are a useful way to record -auxiliary information while sampling, events that are considered +Recording events with zero adjusted count is considered a useful way to record +auxiliary information while sampling, for example, to encode events that are considered interesting but which are accounted for by the adjusted count of other -events in the same stream. +events in the same stream. ### Sampling with attributes From c3718f880e684de5af7cfb7556a2c8b8b1b50959 Mon Sep 17 00:00:00 2001 From: Joshua MacDonald Date: Wed, 26 May 2021 23:14:13 -0700 Subject: [PATCH 19/68] four sections --- text/0148-sampling-probability.md | 78 ++++++++++++++++++++----------- 1 file changed, 52 insertions(+), 26 deletions(-) diff --git a/text/0148-sampling-probability.md b/text/0148-sampling-probability.md index c08b5b2be..48fc6e751 100644 --- a/text/0148-sampling-probability.md +++ b/text/0148-sampling-probability.md @@ -6,14 +6,13 @@ Specify a foundation for sampling techniques in OpenTelemetry. In tracing, metrics, and logs, there are widely known techniques for sampling a stream of events that, when performed correctly, enable -collecting a tiny fraction of the complete data while maintaining +collecting a fraction of the complete data while maintaining substantial visibility into the whole population of events. These techniques are all forms of approximate counting. Estimates calculated by the forms of sampling outlined here are considered -accurate, they are random variables with an expected value equal to -the true value. With sampling we expected to introduce variance, -which can be compensated for with a sufficient quantity of data. +accurate, in that they are random variables with an expected value +equal to the true count. With enough observations, While sampling techniques vary, it is possible to specify high-level interoperability requirements that producers and consumers of sampled @@ -36,12 +35,8 @@ It is a Counter event (of 1 span) and at least one Histogram event In Metrics, [Statsd Counter and Histogram events meet this definition](https://github.com/statsd/statsd/blob/master/docs/metric_types.md#sampling). 
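
As an illustration (hypothetical field names, not an OpenTelemetry data structure), a single sampled span can be read both ways at once, and the same sampling weight applies to each reading:

```
# One sampled span, selected with inclusion probability 0.1 (values are made up).
span = {"name": "http.request", "latency_ms": 12.5, "inclusion_probability": 0.1}

weight = 1 / span["inclusion_probability"]        # the span stands for ~10 events

estimated_request_count = weight                  # Counter view: ~10 requests
latency_observations = [(span["latency_ms"], weight)]  # Histogram view: ~10 observations of 12.5 ms
```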
In both cases, the goal in sampling is to estimate something about the -population of all events, using only the events that were chosen in -the sample. Sampling theory defines various _sampling estimators_, -algorithms for calculating statistics about the population using just -the sample data. For the broad class of telemetry sampling -application considered here, we need an estimator for the population -total represented by each individual event. +population, meaning all the events, using only the events that were +selected in the sample. ### Model and terminology @@ -49,14 +44,16 @@ This model is meant to apply in telemetry collection situations where individual events at an API boundary are sampled for collection. In sampling, the term _sampling design_ refers to how sampling -probability is decided for a collection process and the term _sample -frame_ refers to how events are organized into discrete populations -(e.g., a window in time, a particular span or metric name). +probability is decided and the term _sample frame_ refers to how +events are organized into discrete populations. + +For example, a simple design uses uniform probability, and a simple +framing technique is to collect one sample per distinct span name. After executing a sampling design over a frame, each item selected in the sample will have known _inclusion probability_, that determines -how likely it was to be selected. Implicitly, all the items that were -not selected for the sample have zero inclusion probability. +how likely the item was to be selected. Implicitly, all the items +that were not selected for the sample have zero inclusion probability. Descriptive words that are often used to describe sampling designs: @@ -70,14 +67,16 @@ Our goal is to support flexibility in choosing sampling designs for producers of telemetry data, while allowing consumers of sampled telemetry data to be agnostic to the sampling design used. -We are interested in the common case for telemetry collection, where +### Sampling without replacement + +We are interested in the common case in telemetry collection, where sampling is performed while processing a stream of events and each event is considered just once. Sampling designs of this form are referred to as _sampling without replacement_. Unless stated otherwise, "sampling" in telemetry collection always refers to sampling without replacement. -After executing a given sampling design over complete frame of data, +After executing a given sampling design over a complete frame of data, the result is a set of selected sample events, each having known and non-zero inclusion probability. There are several other quantities of interest, after calculating a sample from a sample frame. @@ -86,11 +85,10 @@ interest, after calculating a sample from a sample frame. - *True population total*: the exact number of events in the frame, which may be unknown - *Estimated population total*: the estimated number of events in the frame, which is computed from the same. -The sample size is always known after it is calculated, but the size may or -may not be known ahead of time, depending on the design. The true -population total cannot be inferred directly from the sample, but can -(sometimes) be counted separately. The estimated population total is -the expected value of the true population total. +The sample size is always known after it is calculated, but the size +may or may not be known ahead of time, depending on the design. 
+Probabilistic sampling schemes require that the estimated population +total equals the expected value of the true population total. ### Adjusted sample count @@ -102,10 +100,16 @@ _adjusted count_. The adjusted count of an event represents the expected contribution to the estimated population total of a sample frame represented by the -individual event. As stated, the sample event's adjusted count is -easily derived from the Horvitz-Thompson estimator of the population -total, a general-purpose statistical estimator that applies to all -_without replacement_ sampling designs. +individual event. + +The use of a reciprocal inclusion probability matches our intuition +for probabilities. Items selected with "one-out-of-N" probability of +inclusion count for N each, approximately speaking. + +This intuition is backed up with statistics. This equation is known +as the Horvitz-Thompson estimator of the population total, a +general-purpose statistical "estimator" that applies to all _without +replacement_ sampling designs. Assuming sample data is correctly computed, the consumer of sample data can treat every sample event as though an identical copy of @@ -117,6 +121,28 @@ procedure must be _statistically unbiased_, a term meaning that the process is required to give equal consideration to all possible outcomes. +### Introducing variance + +The use of unbiased sampling outlined above makes it possible to +estimate the population total for arbitrary subsets of the sample, as +every individual sample has been independently assigned an adjusted +count. + +There is a natural relationship between statistical bias and variance. +Approximate counting comes with variance, a matter of fact which can +be controlled for by the sample size. Variance is unavoidable in an +unbiased sample, but it vanishes when you have enough data. + +Although this makes it sounds like small sample sizes are a problem, +due to expected high variance, this is just a limitation of the +technique. When variance is high, use a larger sample size. + +An easy approach for lowering variance is to aggregate sample frames +together across time. For example, although the estimates drawn from +a one-minute sample may have high variance, combining an hour of +one-minute sample frames into an aggregate data set is guaranteed to +lower variance. It must, because it remains unbiased. + ### Encoding inclusion probability Some possibilities for encoding the inclusion probability, depending From ca6ba5f04a001373eac15a27d8492df11d77818f Mon Sep 17 00:00:00 2001 From: Joshua MacDonald Date: Fri, 28 May 2021 00:56:32 -0700 Subject: [PATCH 20/68] add detail on dapper sampler --- text/0148-sampling-probability.md | 272 ++++++++++++++++++------------ 1 file changed, 166 insertions(+), 106 deletions(-) diff --git a/text/0148-sampling-probability.md b/text/0148-sampling-probability.md index 48fc6e751..e8194ab30 100644 --- a/text/0148-sampling-probability.md +++ b/text/0148-sampling-probability.md @@ -141,104 +141,186 @@ An easy approach for lowering variance is to aggregate sample frames together across time. For example, although the estimates drawn from a one-minute sample may have high variance, combining an hour of one-minute sample frames into an aggregate data set is guaranteed to -lower variance. It must, because it remains unbiased. +lower variance. It must, because the data remains unbiased. 
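
A brief sketch of the estimation step this enables (illustrative only; the event fields and numbers are hypothetical): any sub-population total is estimated by summing the adjusted counts of the sampled events that match a predicate, and combining frames is the same computation over the union of their samples.

```
sampled_events = [
    {"name": "http.request", "error": True,  "adjusted_count": 1},    # sampled at 100%
    {"name": "http.request", "error": False, "adjusted_count": 100},  # sampled at 1%
    {"name": "http.request", "error": False, "adjusted_count": 100},
]

def estimate_total(events, predicate):
    # Horvitz-Thompson style estimate: a filtered sum of adjusted counts.
    return sum(e["adjusted_count"] for e in events if predicate(e))

estimate_total(sampled_events, lambda e: True)        # ~201 events in the population
estimate_total(sampled_events, lambda e: e["error"])  # ~1 error event
```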
-### Encoding inclusion probability +### Conveying the sampling probability -Some possibilities for encoding the inclusion probability, depending -on the circumstances and the protocol, briefly discussed: +Some possibilities for encoding the adjusted count or inclusion +probability are discussed below, depending on the circumstances and +the protocol. -1. Encode the adjusted count directly as a floating point or integer number in the range [0, +Inf). This is a conceptually easy way to understand sampling because larger numbers mean greater representivity. -2. Encode the inclusion probability directly as a floating point number in the range [0, 1). This is typical of the Statsd format, where each line includes an optional probability. In this context, the probability is commonly referred to as a "sampling rate". In this case, smaller numbers mean greater representivity. -3. Encode the negative of the base-2 logarithm of inclusion probability. This restricts inclusion probabilities to powers of two and allows the use of small non-negative integers to encode power-of-two adjusted counts. -4. Multiply the adjusted count into the data. This is appropriate when the data itself carries counts, such as for OTLP Metrics Sum and Histogram points encoded using delta aggregation temporality. This technique is less desirable because while it preserves the expected value of the count or sum, the data loses information about variance. This may also lead to rounding errors, when adjusted counts are not integer valued. +There are several ways of encoding this information: -This is not an exhaustive list of approaches. All of these techniques -are considered appropriate. +- as a dedicated field in an OTLP protobuf message +- as a non-descriptive Attribute in an OTLP Span, Metric, or Log +- without any dedicated field. -A telemetry system should be able to accurately estimate the number of -events that took place whether the events were sampled or not, which -requires being able to recognize a zero value for the adjusted count -as being distinct from a raw event where no sampling took place. +#### Encoding adjusted count -#### Recognizing zero adjusted count +We can encode the adjusted count directly as a floating point or +integer number in the range [0, +Inf). This is a conceptually easy +way to understand sampling because larger numbers mean greater +representivity. -An adjusted count of zero indicates an event that was recorded, where -according to the sampling design its inclusion probability is zero. +#### Encoding inclusion probability -Recording events with zero adjusted count is considered a useful way to record -auxiliary information while sampling, for example, to encode events that are considered -interesting but which are accounted for by the adjusted count of other -events in the same stream. +We can encode the inclusion probability directly as a floating point +number in the range [0, 1). This is typical of the Statsd format, +where each line includes an optional probability. In this context, +the probability is also commonly referred to as a "sampling rate". In +this case, smaller numbers mean greater representivity. -### Sampling with attributes +#### Encoding negative base-2 logarithm of inclusion probability -Sampling is a powerful approach when used with event data that has -been annotated with key-value attributes and sampled with an unbiased -design. +We can encode the negative base-2 logarithm of inclusion probability. 
+This restricts inclusion probabilities to powers of two and allows the +use of small non-negative integers to encode power-of-two adjusted +counts. In this case, larger numbers mean exponentially greater +representivity. -Because an individual item is considered representative for _adjusted -count_ many copies of itself, it is possible to select arbitrary -subsets from a sample to estimate the count of arbitrary subsets of -the population. +#### Multiply the adjusted count into the data -Readers are referred to [recommended reading](#recommended-reading) -for more resources on sampling with attributes. To summarize, there -is a widely applicable procedure for sampling telemetry data from a -population: +When the data itself carries counts, such as for the Metrics Sum and +Histogram points (encoded using delta aggregation temporality), we can +fold the adjusted count into the data itself. -- use an unbiased sampling algorithm -- encode the adjusted count or inclusion probability in or alongside sampled events -- apply a predicate to events in the sample to select a subset -- sum the adjusted counts in the subset to estimate the sub-population total. +This technique is less desirable because while it preserves the +expected value of the count or sum, the data loses information about +variance. This may also lead to rounding errors, when adjusted counts +are not integer valued. -Applied correctly, this approach provides accurate estimates for -population counts and distributions with support for ad-hoc queries -over the data. +### Tracing considerations -### Changes proposed +In tracing specifically, a number of additional concerns arise which +deserve special treatment. Often, the primary concern when sampling +traces is to ensure that traces are complete, meaning that all spans +in the graph of parent-child relationships have been collected. -This OTEP proposes no formal changes in the OpenTelemetry -specification. It is meant to lay a foundation for importing sampled -telemetry events from other systems as well as to begin specifying how -OpenTelemetry SDKs that use probabilistic `Sampler` implementations -should convey inclusion probability and how consumers of this -information can use information about sampling. +#### `Parent` Sampler -### Example: Dapper tracing +It is possible for a decision at the root of a trace to propagate +throughout the trace using the [W3C Trace Context is-sampled +flag](https://www.w3.org/TR/trace-context/#sampled-flag). The +inclusion probability of all spans in a trace is determined by the +root tracing decision. + +The `Parent` Sampler ensures complete traces, provided all spans are +successfully recorded. A downside of `Parent` sampling is that it +takes away control of Tracer overhead from non-roots in the trace. + +#### `TraceIDRatio` Sampler + +The OpenTelemetry tracing specification includes a built-in Sampler +designed for probability sampling using a deterministic sampling +decision based on the TraceID. This Sampler was not finished before +the OpenTelemetry version 1.0 specification was released; it was left +in place, with [a TODO and the recommendation to use it only for trace +roots](https://github.com/open-telemetry/opentelemetry-specification/issues/1413). +[OTEP 135 proposed a solution]() + +The goal of the `TraceIDRatio` Sampler is to coordinate the tracing +decision, but give each service control over Tracer overhead. 
Each +service sets its sampling probability independently, and the +coordinated decision ensures that some traces will be complete. +Traces are complete when the TraceID ratio falls below the minimum +Sampler probability across the whole trace. + +The `TraceIDRatio` Sampler has another difficulty with testing for +completeness. It is impossible to know whether there are missing leaf +spans in a trace without using external information. One approach, +[lost in the transition from OpenCensus to OpenTelemetry is to count +the number of children of each +span](https://github.com/open-telemetry/opentelemetry-specification/issues/355). + +Lacking the number of expected children, we require a way to know the +minimum Sampler probability across traces to ensure they are complete. + +#### Dapper's "Inflationary" Sampler Google's [Dapper](https://research.google/pubs/pub36356/) tracing system describes the use of sampling to control the cost of trace -collection at scale. +collection at scale. Dapper's early Sampler algorithms, referred to +as "inflationary" approach (although not published in the paper), is +reproduced here. + +This kind of Sampler allows non-root spans in a trace to raise the +probability of tracing, using a conditional probability formula shown +below. Traces produced in this way are complete sub-trees, not +necessarily complete. This technique is succesful especially in +systems where a high-throughput service on occasion calls a +low-throughput service. Low-throughput services are meant to inflate +their sampling probability. + +The use of this technique requires propagating the inclusion +probability of the incoming Context and whether it was sampled, in +order to calculate the probability of starting to sample a new +"sub-root" in the trace. + +Using standard notation for conditional probability, `P(x)` indicates +the probability of `x` being true, and `P(x|y)` indicates the +probability of `x` being true given that `y` is true. The axioms of +probability establish that: + +``` +P(x)=P(x|y)*P(y)+P(x|not y)*P(not y) +``` + +The variables are: + +- **`C`**: The sampling probability of the parent context that is in + effect, independent of whether the parent context was sampled. +- **`I`**: The inflationary sampling probability for the span being + started. +- **`D`**: The decision probability for whether to start a new sub-root. + +This Sampler cannot lower sampling probability, so if the new span is +started with `C >= I` or when the context is already sampled, no new +sampling decisions are made. If the incoming context is already +sampled, the adjusted count of the new span is `1/C`. + +Assuming `C < I` and the incoming context was not sampled, we have the +following probability equations: + +``` +P(span sampled) = I +P(parent sampled) = C +P(span sampled | parent sampled) = 1 +P(span sampled | parent not sampled) = D +``` + +Using the formula above, + +``` +I = 1*C + D*(1-C) +``` + +solve for D: -Dapper's used a sampling approach where: +``` +D = (I - C) / (1 - C) +``` -- Root nodes in a trace use simple random sampling to decide to trace at the root -- Propagate the tracing decision the inclusion probability into child contexts -- Allow child contexts to boost the sampling probability of their sub-rooted trace. +Now the Sampler makes a decision with probability `D`. Whether the +decision is true or false, propagate the inflationary probability `I` +as the new parent context sampling probability. 
If the decision is +true, begin sampling a sub-rooted trace with adjusted count `1/I`. -Allowing contexts to boost sampling probability addresses a scenario -where a high-throughput service that sampled with low probability -rarely calls another, low-throughput service. For the low-throughput -service to record a sufficient number of traces, it has to increase -its own odds of sampling. +According to current accounts, this Sampler is no longer used at +Google. -This requires propagating the inclusion probability used when a -negative sampling decision is made, as the child needs it to calculate -a conditional probability for its own sampling decision. +### Non-Tracing applications -The specific details of this approach are considered out-of-scope for -this text, however the result is an adjusted count on every span, -making it easy for consumers to compute metrics from a stream of -sampled spans without having to assemble complete traces first. +TODO: sampling for just spans or metrics, logs +TODO: weighted sampling +TODO: tail sampling -### Example: Statsd metrics +#### StatsD metric events -A Statsd counter event appears as a line of text. +A Statsd counter event appears as a line of text, describing a +number-valued event with optional attributes and sample rate. -For example, a metric -named `name` is incremented by `increment` using a counter event (`c`) -with the given `sample_rate`. +For example, a metric named `name` is incremented by `increment` using +a counter event (`c`) with the given `sample_rate`. ``` name:increment|c|@sample_rate @@ -252,46 +334,16 @@ counter:100|c|@0.1 ``` Probability 0.1 leads to an adjusted count of 10. Assuming the sample -was selected using an unbiased algorithm (as by a "fair coin"), we can -interpret this event as probabilistically equal to a count of `100/0.1 -= 1000`. +was selected using an unbiased algorithm, we can interpret this event +as having an expected value of `100/0.1 = 1000`. -## Internal details +#### Exemplars with adjusted counts -The statistical foundation of this technique is known as the -Horvitz-Thompson estimator ("A Generalization of Sampling Without -Replacement From a Finite Universe," JSTOR 1952). The -Horvitz-Thompson technique works with _unequal probability sampling_ -designs, enabling a variety of techniques for controlling properties -of the sample. +TODO -For example, you can sample 100% of error events while sampling 1% of -non-error events, and the interpretation of adjusted count will be -correct. +#### Cardinality reduction for metric events -### Bias, variance, and sampling errors - -There is a fundamental tradeoff between bias and variance in -statistics. The use of unbiased sampling leads unavoidably to -increased variance. - -Estimating sampling errors and variance is out of scope for this -proposal. We are satisfied with the unbiased property, which guarantees -that the expected value of totals derived from the sample equals the -true population totals. This means that statistics derived from -sample data in this way are always accurate, and that more data will -always improve precision. - -### Non-probabilistic rate-limiters - -Rate-limiting stages in a telemetry collection pipeline interfere with -sampling schemes when they operate in non-probabilistic ways. When -implementing a non-probabilistic form of rate-limiting, processors -MUST set the adjusted count to zero. 
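
A minimal sketch of that requirement (an illustrative processor function, not a real OpenTelemetry component):

```
def rate_limit(events, max_per_interval):
    # Non-probabilistic: keep the first N events, so the output is biased.
    passed = events[:max_per_interval]
    for event in passed:
        event["adjusted_count"] = 0  # marks the event as unrepresentative
    return passed
```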
- -The use of zero adjusted count explicitly conveys that the events -output by non-probabilistic sampling should not be counted in a -statistical manner. +TODO ## Prior art and alternatives @@ -314,3 +366,11 @@ study of computer science and engineering. [Performance Is A Shape. Cost Is A Number: Sampling](https://docs.lightstep.com/otel/performance-is-a-shape-cost-is-a-number-sampling), 2020 blog post, Joshua MacDonald [Priority sampling for estimation of arbitrary subset sums](https://dl.acm.org/doi/abs/10.1145/1314690.1314696) + +[A Generalization of Sampling Without Replacement From a Finite Universe](https://www.jstor.org/stable/2280784), JSTOR (1952) + +## Acknowledgements + +Thanks to [Neena Dugar](https://github.com/neena) and [Alex +Kehlenbeck](https://github.com/akehlenbeck) for reconstructing the +Dapper Sampler algorithm. From 956de22be0850339e28b3b9d8e218dc60eb86e8a Mon Sep 17 00:00:00 2001 From: Joshua MacDonald Date: Fri, 28 May 2021 09:20:54 -0700 Subject: [PATCH 21/68] shuffle sections --- text/0148-sampling-probability.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/text/0148-sampling-probability.md b/text/0148-sampling-probability.md index e8194ab30..4c98f8790 100644 --- a/text/0148-sampling-probability.md +++ b/text/0148-sampling-probability.md @@ -189,7 +189,7 @@ expected value of the count or sum, the data loses information about variance. This may also lead to rounding errors, when adjusted counts are not integer valued. -### Tracing considerations +### Tracing Samplers In tracing specifically, a number of additional concerns arise which deserve special treatment. Often, the primary concern when sampling @@ -305,14 +305,14 @@ decision is true or false, propagate the inflationary probability `I` as the new parent context sampling probability. If the decision is true, begin sampling a sub-rooted trace with adjusted count `1/I`. -According to current accounts, this Sampler is no longer used at +According to current statements, this Sampler is no longer used at Google. ### Non-Tracing applications -TODO: sampling for just spans or metrics, logs -TODO: weighted sampling -TODO: tail sampling +TODO: Counting spans is critical. Want this done before trace assembly. + +TODO: Sampling for just spans or metrics, logs. #### StatsD metric events From d07a834fa1dc164365b1adb11a984a0952ead8e8 Mon Sep 17 00:00:00 2001 From: Joshua MacDonald Date: Fri, 28 May 2021 09:24:46 -0700 Subject: [PATCH 22/68] typo --- text/0148-sampling-probability.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/text/0148-sampling-probability.md b/text/0148-sampling-probability.md index 4c98f8790..8535754e5 100644 --- a/text/0148-sampling-probability.md +++ b/text/0148-sampling-probability.md @@ -239,8 +239,8 @@ minimum Sampler probability across traces to ensure they are complete. Google's [Dapper](https://research.google/pubs/pub36356/) tracing system describes the use of sampling to control the cost of trace -collection at scale. Dapper's early Sampler algorithms, referred to -as "inflationary" approach (although not published in the paper), is +collection at scale. Dapper's early Sampler algorithm, referred to as +an "inflationary" approach (although not published in the paper), is reproduced here. 
This kind of Sampler allows non-root spans in a trace to raise the From f934086890b6e3d86c78b38cdd062b8b0121d0b9 Mon Sep 17 00:00:00 2001 From: Joshua MacDonald Date: Tue, 1 Jun 2021 01:20:08 -0700 Subject: [PATCH 23/68] reorg head samplers --- text/0148-sampling-probability.md | 36 +++++++++++++++++++++++-------- 1 file changed, 27 insertions(+), 9 deletions(-) diff --git a/text/0148-sampling-probability.md b/text/0148-sampling-probability.md index 8535754e5..95ee20125 100644 --- a/text/0148-sampling-probability.md +++ b/text/0148-sampling-probability.md @@ -67,7 +67,7 @@ Our goal is to support flexibility in choosing sampling designs for producers of telemetry data, while allowing consumers of sampled telemetry data to be agnostic to the sampling design used. -### Sampling without replacement +#### Sampling without replacement We are interested in the common case in telemetry collection, where sampling is performed while processing a stream of events and each @@ -90,7 +90,7 @@ may or may not be known ahead of time, depending on the design. Probabilistic sampling schemes require that the estimated population total equals the expected value of the true population total. -### Adjusted sample count +#### Adjusted sample count Following the model above, every event defines the notion of an _adjusted count_. @@ -121,7 +121,7 @@ procedure must be _statistically unbiased_, a term meaning that the process is required to give equal consideration to all possible outcomes. -### Introducing variance +#### Introducing variance The use of unbiased sampling outlined above makes it possible to estimate the population total for arbitrary subsets of the sample, as @@ -189,12 +189,30 @@ expected value of the count or sum, the data loses information about variance. This may also lead to rounding errors, when adjusted counts are not integer valued. -### Tracing Samplers +### Trace Sampling -In tracing specifically, a number of additional concerns arise which -deserve special treatment. Often, the primary concern when sampling -traces is to ensure that traces are complete, meaning that all spans -in the graph of parent-child relationships have been collected. +Sampling techniques are always about lowering the cost of data +collection and analsyis, but in tracing applications specifically the +approaches can be categorized by whether they are able to reduce +tracer (i.e., SDK) overhead, by not recording spans for unsampled +traces. Lowering tracer overhead requires making the sampling +decision for a trace before all of its attributes are known. + +For a trace to be useful, it is often expected to be complete, meaning +that a tree of spans branching from a certain root are all expected to +be collected. When sampling is applied to lower tracer overhead +sampling, the expectation of collecting complete traces what can be +done. Sampling techniques that meet these constraints are +collectively known as _Head trace sampling_. + +The decision to produce a span or a trace has to be made when the root +span starts to avoid incomplete traces. We can approximately count +finished spans and traces, however, without knowing how the head trace +sampling decision was made. Sampled spans represent their adjusted +count number of identical spans and traces, independent of whether the +traces were complete, as long as they are selected in an unbiased way. + +Several head sampling techniques are outlined next. #### `Parent` Sampler @@ -308,7 +326,7 @@ true, begin sampling a sub-rooted trace with adjusted count `1/I`. 
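
As a concrete sketch of the calculation above (illustrative code, not an SDK interface; the function name, the propagation rule for the no-decision cases, and the example values are assumptions):

```
import random

def inflationary_decision(C, parent_sampled, I):
    """Return (sampled, adjusted_count, probability_to_propagate)."""
    if parent_sampled:
        return True, 1 / C, C          # already sampled: adjusted count 1/C
    if C >= I:
        return False, 0, C             # cannot lower probability: no new decision
    D = (I - C) / (1 - C)              # conditional probability derived above
    sampled = random.random() < D
    return sampled, (1 / I if sampled else 0), I

# Example: C = 0.01 (parent not sampled), I = 0.25  =>  D = 0.24 / 0.99 ≈ 0.2424
```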
According to current statements, this Sampler is no longer used at Google. -### Non-Tracing applications +### Tail sampling TODO: Counting spans is critical. Want this done before trace assembly. From 4730b20007a9221f385153fc8f85f7cdd82784ca Mon Sep 17 00:00:00 2001 From: Joshua MacDonald Date: Wed, 2 Jun 2021 00:17:17 -0700 Subject: [PATCH 24/68] add requirements for counting spans --- text/0148-sampling-probability.md | 92 +++++++++++++++++++++---------- 1 file changed, 64 insertions(+), 28 deletions(-) diff --git a/text/0148-sampling-probability.md b/text/0148-sampling-probability.md index 95ee20125..d5e759a0f 100644 --- a/text/0148-sampling-probability.md +++ b/text/0148-sampling-probability.md @@ -192,27 +192,51 @@ are not integer valued. ### Trace Sampling Sampling techniques are always about lowering the cost of data -collection and analsyis, but in tracing applications specifically the -approaches can be categorized by whether they are able to reduce -tracer (i.e., SDK) overhead, by not recording spans for unsampled -traces. Lowering tracer overhead requires making the sampling +collection and analsyis, but in trace collection and analsysis +specifically the approaches can be categorized by whether they are +able to reduce Tracer overhead by not recording spans for unsampled +traces. Lowering Tracer overhead requires making the sampling decision for a trace before all of its attributes are known. -For a trace to be useful, it is often expected to be complete, meaning -that a tree of spans branching from a certain root are all expected to -be collected. When sampling is applied to lower tracer overhead -sampling, the expectation of collecting complete traces what can be -done. Sampling techniques that meet these constraints are -collectively known as _Head trace sampling_. - -The decision to produce a span or a trace has to be made when the root -span starts to avoid incomplete traces. We can approximately count -finished spans and traces, however, without knowing how the head trace -sampling decision was made. Sampled spans represent their adjusted -count number of identical spans and traces, independent of whether the -traces were complete, as long as they are selected in an unbiased way. - -Several head sampling techniques are outlined next. +Traces are expected to be complete, meaning that a tree or sub-tree of +spans branching from a certain root are expected to be fully +collected. When sampling is applied to lower Tracer overhead, there +is generally an expectation that complete traces will be produced. +Sampling techniques that lower Tracer overhead and produce complete +traces are known as _Head trace sampling_ techniques. + +The decision to produce and collect a sample trace has to be made when +the root span starts, to avoid incomplete traces. Using the sampling +techniques outlined above, we can approximately count finished spans +and traces, even without knowing how the head trace sampling decision +was made. + +#### Counting spans and traces + +Trace collection systems can estimate the total count of spans in the +population using a sample of spans, whether or not traces are +assembled, simply by encoding the adjusted count (or inclusion +probability) in every sampled span. + +When counting sample spans, every span stands for a trace rooted at +itself, and so when we approximately count spans we are also +approximately counting traces rooted in those spans. 
Sampled spans +represent an adjusted count of identical spans in the population, +regardless of whether complete traces are being collected +for every span. + +Stated as a requirement: When sampling, tracing systems must be able +to count spans without assembling traces first. Several head sampling +techniques are discussed in the following sections that meet all the +criteria: + +- Reduces Tracer overhead +- Produces complete traces +- Supports counting spans. + +When using a Head sampling technique that meets these criteria, +tracing collection systems are able to then sample from the set of +complete traces in order to further reduce collection costs. #### `Parent` Sampler @@ -224,7 +248,14 @@ root tracing decision. The `Parent` Sampler ensures complete traces, provided all spans are successfully recorded. A downside of `Parent` sampling is that it -takes away control of Tracer overhead from non-roots in the trace. +takes away control of Tracer overhead from non-roots in the trace and, +to support counting, requires propagating the inclusion probability in +the W3C `tracestate` field (require a semantic convention). + +To count Parent-sampled spans, each span must directly encode its +adjusted count (or inclusion probability) in the corresponding +`SpanData`. This may use a non-descriptive Resource or Span +attribute named `sampling.parent.adjusted_count`, for example. #### `TraceIDRatio` Sampler @@ -234,7 +265,7 @@ decision based on the TraceID. This Sampler was not finished before the OpenTelemetry version 1.0 specification was released; it was left in place, with [a TODO and the recommendation to use it only for trace roots](https://github.com/open-telemetry/opentelemetry-specification/issues/1413). -[OTEP 135 proposed a solution]() +[OTEP 135 proposed a solution](https://github.com/open-telemetry/oteps/pull/135). The goal of the `TraceIDRatio` Sampler is to coordinate the tracing decision, but give each service control over Tracer overhead. Each @@ -253,6 +284,11 @@ span](https://github.com/open-telemetry/opentelemetry-specification/issues/355). Lacking the number of expected children, we require a way to know the minimum Sampler probability across traces to ensure they are complete. +To count TraceIDRatio-sampled spans, each span must encode its +adjusted count (or inclusion probability) in the corresponding +`SpanData`. This may use a non-descriptive Resource or Span attribute +named `sampling.traceidratio.adjusted_count`, for example. + #### Dapper's "Inflationary" Sampler Google's [Dapper](https://research.google/pubs/pub36356/) tracing @@ -270,9 +306,10 @@ low-throughput service. Low-throughput services are meant to inflate their sampling probability. The use of this technique requires propagating the inclusion -probability of the incoming Context and whether it was sampled, in -order to calculate the probability of starting to sample a new -"sub-root" in the trace. +probability of the incoming Context and whether it was sampled (using +the W3C `tracestate`, as for counting spans sampled by a Parent +sampler), in order to calculate the probability of starting to sample +a new "sub-root" in the trace. Using standard notation for conditional probability, `P(x)` indicates the probability of `x` being true, and `P(x|y)` indicates the @@ -322,11 +359,10 @@ Now the Sampler makes a decision with probability `D`. Whether the decision is true or false, propagate the inflationary probability `I` as the new parent context sampling probability. 
If the decision is true, begin sampling a sub-rooted trace with adjusted count `1/I`. +This may use a non-descriptive Resource or Span attribute named +`sampling.inflationary.adjusted_count`, for example. -According to current statements, this Sampler is no longer used at -Google. - -### Tail sampling +### Event sampling TODO: Counting spans is critical. Want this done before trace assembly. From 760eed1845502b84c66b3021b929a1712da844d0 Mon Sep 17 00:00:00 2001 From: Joshua MacDonald Date: Wed, 2 Jun 2021 00:20:55 -0700 Subject: [PATCH 25/68] remove TODOs --- text/0148-sampling-probability.md | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/text/0148-sampling-probability.md b/text/0148-sampling-probability.md index d5e759a0f..5e9cc75d7 100644 --- a/text/0148-sampling-probability.md +++ b/text/0148-sampling-probability.md @@ -362,11 +362,13 @@ true, begin sampling a sub-rooted trace with adjusted count `1/I`. This may use a non-descriptive Resource or Span attribute named `sampling.inflationary.adjusted_count`, for example. -### Event sampling +### Telemetry event sampling -TODO: Counting spans is critical. Want this done before trace assembly. +Head sampling for traces has been discussed above, now we turn our +attention to sampling arbitrary telemetry events, including metric and +log events, finished spans, and complete traces. -TODO: Sampling for just spans or metrics, logs. +TODO: #### StatsD metric events From eb510693090b045aa3a6e474908569c371d716ff Mon Sep 17 00:00:00 2001 From: Joshua MacDonald Date: Wed, 2 Jun 2021 00:24:17 -0700 Subject: [PATCH 26/68] from feedback --- text/0148-sampling-probability.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/0148-sampling-probability.md b/text/0148-sampling-probability.md index 5e9cc75d7..14ba64f42 100644 --- a/text/0148-sampling-probability.md +++ b/text/0148-sampling-probability.md @@ -12,7 +12,7 @@ substantial visibility into the whole population of events. These techniques are all forms of approximate counting. Estimates calculated by the forms of sampling outlined here are considered accurate, in that they are random variables with an expected value -equal to the true count. With enough observations, +equal to the true count. While sampling techniques vary, it is possible to specify high-level interoperability requirements that producers and consumers of sampled From 290dac9c3ac02620f663b5f04b79328c1144b0a4 Mon Sep 17 00:00:00 2001 From: Joshua MacDonald Date: Thu, 3 Jun 2021 00:06:31 -0700 Subject: [PATCH 27/68] introduce next section --- text/0148-sampling-probability.md | 66 ++++++++++++++++++++----------- 1 file changed, 42 insertions(+), 24 deletions(-) diff --git a/text/0148-sampling-probability.md b/text/0148-sampling-probability.md index 14ba64f42..934c969ec 100644 --- a/text/0148-sampling-probability.md +++ b/text/0148-sampling-probability.md @@ -193,17 +193,17 @@ are not integer valued. Sampling techniques are always about lowering the cost of data collection and analsyis, but in trace collection and analsysis -specifically the approaches can be categorized by whether they are -able to reduce Tracer overhead by not recording spans for unsampled -traces. Lowering Tracer overhead requires making the sampling -decision for a trace before all of its attributes are known. +specifically, approaches can be categorized by whether they reduce +Tracer overhead. 
Tracer overhead is reduced by not recording spans +for unsampled traces and requires making the sampling decision for a +trace before all of its attributes are known. Traces are expected to be complete, meaning that a tree or sub-tree of spans branching from a certain root are expected to be fully -collected. When sampling is applied to lower Tracer overhead, there -is generally an expectation that complete traces will be produced. -Sampling techniques that lower Tracer overhead and produce complete -traces are known as _Head trace sampling_ techniques. +collected. When sampling is applied to reduce Tracer overhead, there +is generally an expectation that complete traces will still be +produced. Sampling techniques that lower Tracer overhead and produce +complete traces are known as _Head trace sampling_ techniques. The decision to produce and collect a sample trace has to be made when the root span starts, to avoid incomplete traces. Using the sampling @@ -364,13 +364,33 @@ This may use a non-descriptive Resource or Span attribute named ### Telemetry event sampling -Head sampling for traces has been discussed above, now we turn our -attention to sampling arbitrary telemetry events, including metric and -log events, finished spans, and complete traces. +Head sampling for traces has been discussed, covering strategies to +lower Tracer overhead and ensure trace completeness. Following head +sampling, spans become "finished" telemetry events having adjusted +counts greater than 1. -TODO: +In general, synchronous telemetry API events (e.g., generated by +Metrics Counter, UpDownCounter, and Histogram instruments) can be +sampled using any of the techniques outlined above, unconstrained by +Head sampling requirements. Sampling outputs a set of telemetry +events having adjusted counts greater than 1, as with Head sampling. -#### StatsD metric events +In the context of tracing, sampling of finished spans may be referred +to as Tail sampling, because it is generally applied after initial +Head sampling, but the techniques can be applied to any stream of +telemetry events. In every case, all that is required for consumers +of these events to approximately count them is to record the adjusted +count (or inclusion probability) with each event. + +#### Simple sampling + +TODO + +#### Weighted sampling + +TODO + +#### Example: Statsd A Statsd counter event appears as a line of text, describing a number-valued event with optional attributes and sample rate. @@ -393,27 +413,25 @@ Probability 0.1 leads to an adjusted count of 10. Assuming the sample was selected using an unbiased algorithm, we can interpret this event as having an expected value of `100/0.1 = 1000`. -#### Exemplars with adjusted counts +#### Example: Two-pass sampling TODO -#### Cardinality reduction for metric events +#### Example: Combining samples TODO -## Prior art and alternatives +#### Example: Downsampling -The term "adjusted count" is proposed because the resulting value is -effectively a count and may be used in place of the exact count. +TODO -The term "adjusted weight" is NOT proposed to describe the adjustment -made by sampling, because the adjustment being made is that of a count. +#### Example: Multiple samples -Another term for the proposed "adjusted count" concept is -`inverse_probability`. +TODO -"Subset sum estimation" is the name given to this topic within the -study of computer science and engineering. 
+## Propoesed specification changes + +TODO ## Recommended reading From 871aab16ffbce1fe9f7300b0dfccf9b21b63b6af Mon Sep 17 00:00:00 2001 From: Joshua MacDonald Date: Thu, 3 Jun 2021 00:33:11 -0700 Subject: [PATCH 28/68] simple --- text/0148-sampling-probability.md | 19 ++++++++++++++++++- 1 file changed, 18 insertions(+), 1 deletion(-) diff --git a/text/0148-sampling-probability.md b/text/0148-sampling-probability.md index 934c969ec..c914f5bf4 100644 --- a/text/0148-sampling-probability.md +++ b/text/0148-sampling-probability.md @@ -384,7 +384,24 @@ count (or inclusion probability) with each event. #### Simple sampling -TODO +Simple sampling means including an event in the sample with inclusion +probability known in advance. The probability can be fixed or vary +based on properties of the event such as a span and metric name or +attribute values. The adjusted count of each telemetry event is the +reciprocal of the inclusion probability used. + +Users may wish to configure different sampling probabilities for +events of varying importance. For example, an events with a +boolean-valued `error` attribute can be sampled at 100% when +`error=true` and at 1% when `error=false`. In this case, `error=true` +events have adjusted count equal to 1 and `error=false` events have +adjusted count equal to 100. + +Simple sampling can be applied repeatedly, simply by multiplying the +new adjusted count or inclusion probability into the existinhg +adjusted count or inclusion probability. An event sampled with +probability 0.5 that is sampled again with probability 0.5 has an +inclusion probability of 0.25 and an adjusted count of 4. #### Weighted sampling From ff485744956f1bd37fba18cbfcd17afd0b72603f Mon Sep 17 00:00:00 2001 From: Joshua MacDonald Date: Sun, 6 Jun 2021 01:30:01 -0700 Subject: [PATCH 29/68] New section outline (take 2) --- text/0148-sampling-probability.md | 92 +++++++++++++++++-------------- 1 file changed, 52 insertions(+), 40 deletions(-) diff --git a/text/0148-sampling-probability.md b/text/0148-sampling-probability.md index c914f5bf4..8bb3846e9 100644 --- a/text/0148-sampling-probability.md +++ b/text/0148-sampling-probability.md @@ -1,6 +1,37 @@ # Probability sampling of telemetry events -Specify a foundation for sampling techniques in OpenTelemetry. 
+ + +- [Motivation](#motivation) +- [Explanation](#explanation) + * [Model and terminology](#model-and-terminology) + + [Sampling without replacement](#sampling-without-replacement) + + [Adjusted sample count](#adjusted-sample-count) + + [Introducing variance](#introducing-variance) + * [Conveying the sampling probability](#conveying-the-sampling-probability) + + [Encoding adjusted count](#encoding-adjusted-count) + + [Encoding inclusion probability](#encoding-inclusion-probability) + + [Encoding negative base-2 logarithm of inclusion probability](#encoding-negative-base-2-logarithm-of-inclusion-probability) + + [Multiply the adjusted count into the data](#multiply-the-adjusted-count-into-the-data) + * [Trace Sampling](#trace-sampling) + + [Counting spans and traces](#counting-spans-and-traces) + + [`Parent` Sampler](#parent-sampler) + + [`TraceIDRatio` Sampler](#traceidratio-sampler) + + [Dapper's "Inflationary" Sampler](#dappers-inflationary-sampler) + * [Event sampling](#event-sampling) + + [Weighted sampling](#weighted-sampling) + + [Example: Statsd](#example-statsd) + + [Example: Two-pass sampling](#example-two-pass-sampling) + + [Example: Combining samples](#example-combining-samples) + + [Example: Downsampling](#example-downsampling) + + [Example: Multiple samples](#example-multiple-samples) +- [Propoesed specification changes](#propoesed-specification-changes) +- [Recommended reading](#recommended-reading) +- [Acknowledgements](#acknowledgements) + + + +Objective: Specify a foundation for sampling techniques in OpenTelemetry. ## Motivation @@ -248,7 +279,7 @@ root tracing decision. The `Parent` Sampler ensures complete traces, provided all spans are successfully recorded. A downside of `Parent` sampling is that it -takes away control of Tracer overhead from non-roots in the trace and, +takes away control over Tracer overhead from non-roots in the trace and, to support counting, requires propagating the inclusion probability in the W3C `tracestate` field (require a semantic convention). @@ -362,46 +393,27 @@ true, begin sampling a sub-rooted trace with adjusted count `1/I`. This may use a non-descriptive Resource or Span attribute named `sampling.inflationary.adjusted_count`, for example. -### Telemetry event sampling +### Event sampling Head sampling for traces has been discussed, covering strategies to -lower Tracer overhead and ensure trace completeness. Following head -sampling, spans become "finished" telemetry events having adjusted -counts greater than 1. - -In general, synchronous telemetry API events (e.g., generated by -Metrics Counter, UpDownCounter, and Histogram instruments) can be -sampled using any of the techniques outlined above, unconstrained by -Head sampling requirements. Sampling outputs a set of telemetry -events having adjusted counts greater than 1, as with Head sampling. - -In the context of tracing, sampling of finished spans may be referred -to as Tail sampling, because it is generally applied after initial -Head sampling, but the techniques can be applied to any stream of -telemetry events. In every case, all that is required for consumers -of these events to approximately count them is to record the adjusted -count (or inclusion probability) with each event. - -#### Simple sampling - -Simple sampling means including an event in the sample with inclusion -probability known in advance. The probability can be fixed or vary -based on properties of the event such as a span and metric name or -attribute values. 
The adjusted count of each telemetry event is the -reciprocal of the inclusion probability used. - -Users may wish to configure different sampling probabilities for -events of varying importance. For example, an events with a -boolean-valued `error` attribute can be sampled at 100% when -`error=true` and at 1% when `error=false`. In this case, `error=true` -events have adjusted count equal to 1 and `error=false` events have -adjusted count equal to 100. - -Simple sampling can be applied repeatedly, simply by multiplying the -new adjusted count or inclusion probability into the existinhg -adjusted count or inclusion probability. An event sampled with -probability 0.5 that is sampled again with probability 0.5 has an -inclusion probability of 0.25 and an adjusted count of 4. +lower Tracer overhead and ensure trace completeness. Sampled spans +retain their existing form, with an added attribute to carry the +adjusted count. After spans are finished and their attributes known, +spans can be sampled again ("re-sampled") using a broad range of +unequal probability sampling schemes. + +Known as Tail sampling when applied to traces, using sampled data as +the basis for further sampling is generally known as multi-stage +sampling. We have learned how to Head sample individual span events, +and the output are finished spans representing more than one event in +the population (i.e., `adjusted_count > 0`). To maintain our ability +to approximately count spans, or telemetry events in general, requires +maintaining a property known _probability proportional to size_. + +Second-stage sampling algorithms that maintain this property and do +not introduce bias are generally known as _weighted sampling +algorithms_. These algorithms give us a way to combine samples so +that adjusted count is preserved. #### Weighted sampling From ffbdd81282c358a0aa1004b4c4379741722eedfb Mon Sep 17 00:00:00 2001 From: Joshua MacDonald Date: Mon, 14 Jun 2021 12:32:58 -0700 Subject: [PATCH 30/68] Minor edits in top-matter --- text/0148-sampling-probability.md | 34 +++++++++++++++++-------------- 1 file changed, 19 insertions(+), 15 deletions(-) diff --git a/text/0148-sampling-probability.md b/text/0148-sampling-probability.md index 8bb3846e9..1a2dbdf4a 100644 --- a/text/0148-sampling-probability.md +++ b/text/0148-sampling-probability.md @@ -42,8 +42,8 @@ substantial visibility into the whole population of events. These techniques are all forms of approximate counting. Estimates calculated by the forms of sampling outlined here are considered -accurate, in that they are random variables with an expected value -equal to the true count. +accurate, in the sense that they are random variables with an expected +value equal to their true value. While sampling techniques vary, it is possible to specify high-level interoperability requirements that producers and consumers of sampled @@ -57,33 +57,36 @@ Using the OpenTelemetry Metrics data model terminology, we have two scenarios in which sampling is common. 1. _Counter events:_ Each event represents a count, signifying the change in a sum. -2. _Histogram events:_ Each event represents an individual variable, signifying new membership in a distribution. +2. _Histogram events:_ Each event represents an individual variable, signifying membership in a distribution. A Tracing Span event qualifies as both of these cases simultaneously. -It is a Counter event (of 1 span) and at least one Histogram event -(e.g., one of latency, one of request size). 
+It is at least one Counter event (e.g., one request, the number of +bytes read) and at least one Histogram event (e.g., request latency, +request size). In Metrics, [Statsd Counter and Histogram events meet this definition](https://github.com/statsd/statsd/blob/master/docs/metric_types.md#sampling). -In both cases, the goal in sampling is to estimate something about the -population, meaning all the events, using only the events that were -selected in the sample. +In both cases, the goal in sampling is to estimate the count of events +in the whole population, meaning all the events, using only the events +that were selected in the sample. ### Model and terminology This model is meant to apply in telemetry collection situations where -individual events at an API boundary are sampled for collection. +individual events at an API boundary are sampled for collection. Once +the process of sampling individual API-level events is understood, we +will learn to apply these techniques for sampling aggregated data. In sampling, the term _sampling design_ refers to how sampling probability is decided and the term _sample frame_ refers to how -events are organized into discrete populations. +events are organized into discrete populations. For example, a simple design uses uniform probability, and a simple framing technique is to collect one sample per distinct span name. After executing a sampling design over a frame, each item selected in the sample will have known _inclusion probability_, that determines -how likely the item was to be selected. Implicitly, all the items +how likely the item was to being selected. Implicitly, all the items that were not selected for the sample have zero inclusion probability. Descriptive words that are often used to describe sampling designs: @@ -406,14 +409,15 @@ Known as Tail sampling when applied to traces, using sampled data as the basis for further sampling is generally known as multi-stage sampling. We have learned how to Head sample individual span events, and the output are finished spans representing more than one event in -the population (i.e., `adjusted_count > 0`). To maintain our ability -to approximately count spans, or telemetry events in general, requires -maintaining a property known _probability proportional to size_. +the population (i.e., `adjusted_count > 1`). To maintain our ability +to approximately count spans without bias when re-sampling, or +telemetry events in general, requires maintaining a property known +_probability proportional to size_. Second-stage sampling algorithms that maintain this property and do not introduce bias are generally known as _weighted sampling algorithms_. These algorithms give us a way to combine samples so -that adjusted count is preserved. +that the expected value of adjusted count is preserved. #### Weighted sampling From 284ff2b5a76f750b230a5bff75e545caf6ea0c80 Mon Sep 17 00:00:00 2001 From: Joshua MacDonald Date: Mon, 14 Jun 2021 12:44:24 -0700 Subject: [PATCH 31/68] Minor edits in model and terminology --- text/0148-sampling-probability.md | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/text/0148-sampling-probability.md b/text/0148-sampling-probability.md index 1a2dbdf4a..76a6eb7d3 100644 --- a/text/0148-sampling-probability.md +++ b/text/0148-sampling-probability.md @@ -117,7 +117,7 @@ interest, after calculating a sample from a sample frame. 
- *Sample size*: the number of events with non-zero inclusion probability - *True population total*: the exact number of events in the frame, which may be unknown -- *Estimated population total*: the estimated number of events in the frame, which is computed from the same. +- *Estimated population total*: the estimated number of events in the frame, which is computed from the sample. The sample size is always known after it is calculated, but the size may or may not be known ahead of time, depending on the design. @@ -215,8 +215,7 @@ representivity. #### Multiply the adjusted count into the data When the data itself carries counts, such as for the Metrics Sum and -Histogram points (encoded using delta aggregation temporality), we can -fold the adjusted count into the data itself. +Histogram points. This technique is less desirable because while it preserves the expected value of the count or sum, the data loses information about From acbee1c689f3f5ebd5f61a4d0360b8e545021ba0 Mon Sep 17 00:00:00 2001 From: Joshua MacDonald Date: Mon, 14 Jun 2021 14:44:29 -0700 Subject: [PATCH 32/68] Counting spans edits --- text/0148-sampling-probability.md | 65 +++++++++++++------------------ 1 file changed, 28 insertions(+), 37 deletions(-) diff --git a/text/0148-sampling-probability.md b/text/0148-sampling-probability.md index 76a6eb7d3..95c4b1fc5 100644 --- a/text/0148-sampling-probability.md +++ b/text/0148-sampling-probability.md @@ -246,49 +246,41 @@ was made. #### Counting spans and traces -Trace collection systems can estimate the total count of spans in the -population using a sample of spans, whether or not traces are -assembled, simply by encoding the adjusted count (or inclusion -probability) in every sampled span. - -When counting sample spans, every span stands for a trace rooted at -itself, and so when we approximately count spans we are also -approximately counting traces rooted in those spans. Sampled spans -represent an adjusted count of identical spans in the population, -regardless of whether complete traces are being collected -for every span. - -Stated as a requirement: When sampling, tracing systems must be able -to count spans without assembling traces first. Several head sampling -techniques are discussed in the following sections that meet all the -criteria: +When the [W3C Trace Context is-sampled +flag](https://www.w3.org/TR/trace-context/#sampled-flag) is used to +propagate a sampling decision, child spans have the same adjusted +count as their parent. This leads to a useful optimization. + +It is nice-to-have, though not a requirement, that all spans in a +trace directly encode their adjusted count. This enables systems to +count spans upon arrival, without the work of referring to their +parent spans. For example, knowing a span's adjusted count makes it +possible to immediately produce metric events from span events. + +Several head sampling techniques are discussed in the following +sections and evaluated in terms of their ability to meet all of the +following criteria: - Reduces Tracer overhead - Produces complete traces -- Supports counting spans. - -When using a Head sampling technique that meets these criteria, -tracing collection systems are able to then sample from the set of -complete traces in order to further reduce collection costs. +- Spans are countable. #### `Parent` Sampler -It is possible for a decision at the root of a trace to propagate -throughout the trace using the [W3C Trace Context is-sampled -flag](https://www.w3.org/TR/trace-context/#sampled-flag). 
The -inclusion probability of all spans in a trace is determined by the -root tracing decision. - The `Parent` Sampler ensures complete traces, provided all spans are successfully recorded. A downside of `Parent` sampling is that it -takes away control over Tracer overhead from non-roots in the trace and, -to support counting, requires propagating the inclusion probability in -the W3C `tracestate` field (require a semantic convention). +takes away control over Tracer overhead from non-roots in the trace. +To support counting spans, this Sampler requires propagating the +effective adjusted count of the context to use when starting child +spans. + +(To propagate the effective adjusted count in the W3C trace context, +potentially a new field could be added to the `traceparent`.) To count Parent-sampled spans, each span must directly encode its -adjusted count (or inclusion probability) in the corresponding -`SpanData`. This may use a non-descriptive Resource or Span -attribute named `sampling.parent.adjusted_count`, for example. +adjusted count in the corresponding `SpanData`. This may use a +non-descriptive Resource or Span attribute named +`sampling.parent.adjusted_count`, for example. #### `TraceIDRatio` Sampler @@ -339,10 +331,9 @@ low-throughput service. Low-throughput services are meant to inflate their sampling probability. The use of this technique requires propagating the inclusion -probability of the incoming Context and whether it was sampled (using -the W3C `tracestate`, as for counting spans sampled by a Parent -sampler), in order to calculate the probability of starting to sample -a new "sub-root" in the trace. +probability of the incoming Context and whether it was sampled (as for +the Parent sampler), in order to calculate the probability of starting +to sample a new "sub-root" in the trace. Using standard notation for conditional probability, `P(x)` indicates the probability of `x` being true, and `P(x|y)` indicates the From c7e6f9ed228a3a9aa70f626a13e2027638167e4a Mon Sep 17 00:00:00 2001 From: Joshua MacDonald Date: Mon, 14 Jun 2021 17:11:01 -0700 Subject: [PATCH 33/68] intro weighted --- text/0148-sampling-probability.md | 93 ++++++++++++++++--------------- 1 file changed, 47 insertions(+), 46 deletions(-) diff --git a/text/0148-sampling-probability.md b/text/0148-sampling-probability.md index 95c4b1fc5..4996bdb87 100644 --- a/text/0148-sampling-probability.md +++ b/text/0148-sampling-probability.md @@ -274,13 +274,19 @@ To support counting spans, this Sampler requires propagating the effective adjusted count of the context to use when starting child spans. -(To propagate the effective adjusted count in the W3C trace context, -potentially a new field could be added to the `traceparent`.) - -To count Parent-sampled spans, each span must directly encode its -adjusted count in the corresponding `SpanData`. This may use a -non-descriptive Resource or Span attribute named -`sampling.parent.adjusted_count`, for example. +In other head trace sampling schemes, we will see that it is useful to +propagate inclusion probability even for negative sampling decisions +(where the adjusted count is zero), therefore we prefer to use the +inclusion probability and not the adjusted count when propagating the +sampling rate via trace context. The inclusion probability of a +context is referred to as its `head inclusion probability` for this +reason. 
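+
+To make this concrete, the following Python sketch (illustrative only;
+these function names are not part of any OpenTelemetry API) shows how
+a propagated head inclusion probability maps to an adjusted count, and
+how independent sampling stages combine by multiplication:
+
+```
+def adjusted_count(head_inclusion_probability):
+    # Adjusted count is the reciprocal of the inclusion probability.
+    if not 0 < head_inclusion_probability <= 1:
+        raise ValueError("sampled spans require probability in (0, 1]")
+    return 1 / head_inclusion_probability
+
+def combined_probability(stage_probabilities):
+    # Independent stages multiply: sampling at 0.5 and again at 0.5
+    # yields inclusion probability 0.25 and adjusted count 4.
+    result = 1.0
+    for p in stage_probabilities:
+        result *= p
+    return result
+
+assert adjusted_count(0.25) == 4
+assert combined_probability([0.5, 0.5]) == 0.25
+```
+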
+ +In addition to propagating head inclusion probability, to count +Parent-sampled spans, each span must directly encode its adjusted +count in the corresponding `SpanData`. This may use a non-descriptive +Resource or Span attribute named `sampling.parent.adjusted_count`, for +example. #### `TraceIDRatio` Sampler @@ -310,9 +316,9 @@ Lacking the number of expected children, we require a way to know the minimum Sampler probability across traces to ensure they are complete. To count TraceIDRatio-sampled spans, each span must encode its -adjusted count (or inclusion probability) in the corresponding -`SpanData`. This may use a non-descriptive Resource or Span attribute -named `sampling.traceidratio.adjusted_count`, for example. +adjusted count in the corresponding `SpanData`. This may use a +non-descriptive Resource or Span attribute named +`sampling.traceidratio.adjusted_count`, for example. #### Dapper's "Inflationary" Sampler @@ -330,10 +336,10 @@ systems where a high-throughput service on occasion calls a low-throughput service. Low-throughput services are meant to inflate their sampling probability. -The use of this technique requires propagating the inclusion -probability of the incoming Context and whether it was sampled (as for -the Parent sampler), in order to calculate the probability of starting -to sample a new "sub-root" in the trace. +The use of this technique requires propagating the head inclusion +probability (as discussed for the `Parent` sampler) of the incoming +Context and whether it was sampled, in order to calculate the +probability of starting to sample a new "sub-root" in the trace. Using standard notation for conditional probability, `P(x)` indicates the probability of `x` being true, and `P(x|y)` indicates the @@ -346,23 +352,24 @@ P(x)=P(x|y)*P(y)+P(x|not y)*P(not y) The variables are: -- **`C`**: The sampling probability of the parent context that is in - effect, independent of whether the parent context was sampled. +- **`H`**: The head inclusion probability of the parent context that + is in effect, independent of whether the parent context was sampled, + the reciprocal of the parent context's effective adjusted count. - **`I`**: The inflationary sampling probability for the span being started. - **`D`**: The decision probability for whether to start a new sub-root. This Sampler cannot lower sampling probability, so if the new span is -started with `C >= I` or when the context is already sampled, no new +started with `H >= I` or when the context is already sampled, no new sampling decisions are made. If the incoming context is already -sampled, the adjusted count of the new span is `1/C`. +sampled, the adjusted count of the new span is `1/H`. -Assuming `C < I` and the incoming context was not sampled, we have the +Assuming `H < I` and the incoming context was not sampled, we have the following probability equations: ``` P(span sampled) = I -P(parent sampled) = C +P(parent sampled) = H P(span sampled | parent sampled) = 1 P(span sampled | parent not sampled) = D ``` @@ -370,44 +377,38 @@ P(span sampled | parent not sampled) = D Using the formula above, ``` -I = 1*C + D*(1-C) +I = 1*H + D*(1-H) ``` solve for D: ``` -D = (I - C) / (1 - C) +D = (I - H) / (1 - H) ``` Now the Sampler makes a decision with probability `D`. Whether the -decision is true or false, propagate the inflationary probability `I` -as the new parent context sampling probability. If the decision is -true, begin sampling a sub-rooted trace with adjusted count `1/I`. 
-This may use a non-descriptive Resource or Span attribute named +decision is true or false, propagate `I` as the new head inclusion +probability. If the decision is true, begin recording a sub-rooted +trace with adjusted count `1/I`. This may use a non-descriptive +Resource or Span attribute named `sampling.inflationary.adjusted_count`, for example. -### Event sampling +### Working with adjusted counts Head sampling for traces has been discussed, covering strategies to -lower Tracer overhead and ensure trace completeness. Sampled spans -retain their existing form, with an added attribute to carry the -adjusted count. After spans are finished and their attributes known, -spans can be sampled again ("re-sampled") using a broad range of -unequal probability sampling schemes. - -Known as Tail sampling when applied to traces, using sampled data as -the basis for further sampling is generally known as multi-stage -sampling. We have learned how to Head sample individual span events, -and the output are finished spans representing more than one event in -the population (i.e., `adjusted_count > 1`). To maintain our ability -to approximately count spans without bias when re-sampling, or -telemetry events in general, requires maintaining a property known -_probability proportional to size_. - -Second-stage sampling algorithms that maintain this property and do -not introduce bias are generally known as _weighted sampling -algorithms_. These algorithms give us a way to combine samples so -that the expected value of adjusted count is preserved. +lower Tracer overhead, ensure trace completeness, and count spans on +arrival. Sampled spans have an added attribute to directly encode the +adjusted count, and the sum of adjusted counts for a set of spans +accurately reflects the total population count. + +In systems based on collecting sample data, it is often useful to +combine samples to maintain a small data set. For example, given 24 +one-hour samples of 1000 spans each, can we combine the data into a +one-day sample of 1000 spans? To do this without introducing bias, we +must take the adjusted count of each span into account. Sampling +algorithms that can do this are known as weighted sampling algorithms. + +TODO #### Weighted sampling From 24be1dc218537c5dd13e7dab20f3752764fd6002 Mon Sep 17 00:00:00 2001 From: Joshua MacDonald Date: Tue, 15 Jun 2021 00:28:29 -0700 Subject: [PATCH 34/68] about merging --- text/0148-sampling-probability.md | 88 ++++++++++++++++++++++--------- 1 file changed, 62 insertions(+), 26 deletions(-) diff --git a/text/0148-sampling-probability.md b/text/0148-sampling-probability.md index 4996bdb87..3a508cae1 100644 --- a/text/0148-sampling-probability.md +++ b/text/0148-sampling-probability.md @@ -397,27 +397,68 @@ Resource or Span attribute named Head sampling for traces has been discussed, covering strategies to lower Tracer overhead, ensure trace completeness, and count spans on -arrival. Sampled spans have an added attribute to directly encode the -adjusted count, and the sum of adjusted counts for a set of spans -accurately reflects the total population count. +arrival. Sampled spans have an additional attribute to directly +encode the adjusted count, and the sum of adjusted counts for a set of +spans accurately reflects the total population count. In systems based on collecting sample data, it is often useful to -combine samples to maintain a small data set. 
For example, given 24 -one-hour samples of 1000 spans each, can we combine the data into a -one-day sample of 1000 spans? To do this without introducing bias, we -must take the adjusted count of each span into account. Sampling -algorithms that can do this are known as weighted sampling algorithms. +merge samples, in order to maintain a small data set. For example, +given 24 one-hour samples of 1000 spans each, can we combine the data +into a one-day sample of 1000 spans? To do this without introducing +bias, we must take the adjusted count of each span into account. +Sampling algorithms that do this are known as _weighted sampling +algorithms_. + +#### Merging samples + +To merge samples means to combine two or more frames of sample data +into a single frame of sample data that reflects the combined +populations of data in an unbiased way. Two weighted sampling +algorithms are listed below in [Recommended Reading](#recommended-reading). + +In a broader context, weighted sampling algorithms support estimating +population weight given individual item weights. In a telemetry +context, weights are counts, and weighted sampling algorithms support +estimating total population count given inputs with unequal adjusted +count. -TODO +The output of weighted sampling, in a telemetry context, are samples +containing events with new adjusted counts that maintain their power +to estimate counts in the combined population. -#### Weighted sampling +### Examples -TODO +#### Sample Spans to Counter Metric + +For every span it receives, the example processor will synthesize +metric data as though a Counter instrument named `S.count` for span +named `S` had been incremented once per span at the original +`Start()` call site. + +Logically spaking, this processor will add the adjusted count of each +span to the instrument (e.g., `Add(adjusted_count, labels...)`) for +every span it receives, at the start time of the span. + +#### Sample Spans to Histogram Metric + +For every span it receives, the example processor will synthesize +metric data as though a Histogram instrument named `S.duration` for +span named `S` had been observed once per span at the original `End()` +call site. + +The OpenTelemetry Metric data model does not support histogram buckets +with non-integer counts, which forces the use of integer adjusted +counts (i.e., integer-reciprocal sampling rates) here. -#### Example: Statsd +Logically spaking, this processor will observe the span's duration its +adjusted count number of times for every span it receives, at the end +time of the span. + +#### Statsd Counter A Statsd counter event appears as a line of text, describing a -number-valued event with optional attributes and sample rate. +number-valued event with optional attributes and inclusion probability +("sample rate"). For example, a metric named `name` is incremented by `increment` using a counter event (`c`) with the given `sample_rate`. @@ -433,19 +474,12 @@ random sampling scheme will arrive as: counter:100|c|@0.1 ``` -Probability 0.1 leads to an adjusted count of 10. Assuming the sample -was selected using an unbiased algorithm, we can interpret this event -as having an expected value of `100/0.1 = 1000`. - -#### Example: Two-pass sampling - -TODO - -#### Example: Combining samples - -TODO +Events in the example have with 0.1 inclusion probability have +adjusted count of 10. Assuming the sample was selected using an +unbiased algorithm, we can interpret this event as having an expected +count of `100/0.1 = 1000`. 
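+
+The same arithmetic can be written as a short Python sketch
+(illustrative only; it assumes the sample rate field is present and
+that the event is a simple counter):
+
+```
+def estimate_total(statsd_line):
+    # "counter:100|c|@0.1" -> value 100, inclusion probability 0.1
+    name_and_value, _metric_type, rate = statsd_line.split("|")
+    value = float(name_and_value.split(":")[1])
+    inclusion_probability = float(rate.lstrip("@"))
+    # Unbiased estimate: the observed value times the adjusted count.
+    return value * (1 / inclusion_probability)
+
+assert estimate_total("counter:100|c|@0.1") == 1000.0
+```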
-#### Example: Downsampling +#### Example: Sample span rate limiter TODO @@ -453,7 +487,7 @@ TODO TODO -## Propoesed specification changes +## Proposed specification changes TODO @@ -467,6 +501,8 @@ TODO [A Generalization of Sampling Without Replacement From a Finite Universe](https://www.jstor.org/stable/2280784), JSTOR (1952) +[Stream sampling for variance-optimal estimation of subset sums](https://arxiv.org/abs/0803.0473). + ## Acknowledgements Thanks to [Neena Dugar](https://github.com/neena) and [Alex From 0e472acc2dbc481088dcc61542ab47f864ec13b8 Mon Sep 17 00:00:00 2001 From: Joshua MacDonald Date: Tue, 15 Jun 2021 00:40:46 -0700 Subject: [PATCH 35/68] examples --- text/0148-sampling-probability.md | 17 +++++++++++++---- 1 file changed, 13 insertions(+), 4 deletions(-) diff --git a/text/0148-sampling-probability.md b/text/0148-sampling-probability.md index 3a508cae1..63e6273bc 100644 --- a/text/0148-sampling-probability.md +++ b/text/0148-sampling-probability.md @@ -481,11 +481,20 @@ count of `100/0.1 = 1000`. #### Example: Sample span rate limiter -TODO - -#### Example: Multiple samples +A collector processor will introduce a slight delay in order to ensure +it has received a complete frame of data, during which time it +maintains a fixed-size buffer of input spans. If the number of spans +received exceeds the size of the buffer before the end of the +interval, begin weighted sampling using the adjusted count of each +span as input weight. + +This processor drops spans when the configured rate threshold is +exceeeded, otherwise it passes spans through with unmodifed adjusted +count. -TODO +When the interval expires and the sample frame is considered complete, +the selected sample spans are output with possibly updated adjusted +counts. ## Proposed specification changes From 541f8764ddbf747b4946532cdddfa540cb82dd81 Mon Sep 17 00:00:00 2001 From: Joshua MacDonald Date: Tue, 15 Jun 2021 01:21:00 -0700 Subject: [PATCH 36/68] outline spec changes --- text/0148-sampling-probability.md | 51 +++++++++++++++++++++++++++++-- 1 file changed, 48 insertions(+), 3 deletions(-) diff --git a/text/0148-sampling-probability.md b/text/0148-sampling-probability.md index 63e6273bc..5a76b5359 100644 --- a/text/0148-sampling-probability.md +++ b/text/0148-sampling-probability.md @@ -496,20 +496,65 @@ When the interval expires and the sample frame is considered complete, the selected sample spans are output with possibly updated adjusted counts. -## Proposed specification changes +## Proposed Tracing specification + +For the standard OpenTelemetry Span Sampler implementations to support +a range of probability sampling schemes, this document recommends the +use of a Span attribute named `sampling.X.adjusted_count` to encode +the adjusted count computed by a Sampler named "X", whic should be +unbiased. + +The value any attribute name prefixed with "sampling." and suffixed +with ".adjusted_count" under this proposal MUST be an unbiased +estimate of the total population count represented by the individual +event. + +The specification will state that non-probabilistic rate limiters and +other processors that distort the interpretation of adjusted count +outlined above SHOULD erase the adjusted count attributes to prevent +mis-counting events. 
+ +### Suggested text + +After +[Sampler](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/sdk.md#sampler) +and before [Builtin +Samplers](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/sdk.md#built-in-samplers), +a new section named *Probability sampling* will introduce the term +"adjusted count" and relate it to inclusion probability, citing +external sources and this OTEP. For example: TODO +For the `TraceIDRatio` sampler, include the following additional text: + +> When returning a RECORD_AND_SAMPLE decision, the TraceIDRatio +> Sampler MUST include the attribute +> `sampling.traceidratio.adjusted_count=C` for `C` the reciprocal of the +> configured trace ID ratio. +> +> The returned tracestate used for the child context MUST include an +> additional key-value carrying the head inclusion probability, equal +> to the configured trace ID ratio. (TODO: spec the tracestate key for +> head inclusion probability.) + +For the `Parent` sampler, include the following additional text: + +> When returning a RECORD_AND_SAMPLE decision, the Parent Sampler MUST +> include the attribute `sampling.parent.adjusted_count=C` for `C` the +> reciprocal of the parent trace context's head inclusion probability, +> which is passed through W3C tracestate. + ## Recommended reading [Sampling, 3rd Edition, by Steven K. Thompson](https://www.wiley.com/en-us/Sampling%2C+3rd+Edition-p-9780470402313). +[A Generalization of Sampling Without Replacement From a Finite Universe](https://www.jstor.org/stable/2280784), JSTOR (1952) + [Performance Is A Shape. Cost Is A Number: Sampling](https://docs.lightstep.com/otel/performance-is-a-shape-cost-is-a-number-sampling), 2020 blog post, Joshua MacDonald [Priority sampling for estimation of arbitrary subset sums](https://dl.acm.org/doi/abs/10.1145/1314690.1314696) -[A Generalization of Sampling Without Replacement From a Finite Universe](https://www.jstor.org/stable/2280784), JSTOR (1952) - [Stream sampling for variance-optimal estimation of subset sums](https://arxiv.org/abs/0803.0473). 
## Acknowledgements From b8da9b9d212bab58b50b621a08774da20df06482 Mon Sep 17 00:00:00 2001 From: Joshua MacDonald Date: Tue, 15 Jun 2021 08:11:35 -0700 Subject: [PATCH 37/68] TOC update --- text/0148-sampling-probability.md | 17 +++++++++-------- 1 file changed, 9 insertions(+), 8 deletions(-) diff --git a/text/0148-sampling-probability.md b/text/0148-sampling-probability.md index 5a76b5359..435927f4a 100644 --- a/text/0148-sampling-probability.md +++ b/text/0148-sampling-probability.md @@ -18,14 +18,15 @@ + [`Parent` Sampler](#parent-sampler) + [`TraceIDRatio` Sampler](#traceidratio-sampler) + [Dapper's "Inflationary" Sampler](#dappers-inflationary-sampler) - * [Event sampling](#event-sampling) - + [Weighted sampling](#weighted-sampling) - + [Example: Statsd](#example-statsd) - + [Example: Two-pass sampling](#example-two-pass-sampling) - + [Example: Combining samples](#example-combining-samples) - + [Example: Downsampling](#example-downsampling) - + [Example: Multiple samples](#example-multiple-samples) -- [Propoesed specification changes](#propoesed-specification-changes) + * [Working with adjusted counts](#working-with-adjusted-counts) + + [Merging samples](#merging-samples) + * [Examples](#examples) + + [Sample Spans to Counter Metric](#sample-spans-to-counter-metric) + + [Sample Spans to Histogram Metric](#sample-spans-to-histogram-metric) + + [Statsd Counter](#statsd-counter) + + [Example: Sample span rate limiter](#example-sample-span-rate-limiter) +- [Proposed Tracing specification](#proposed-tracing-specification) + * [Suggested text](#suggested-text) - [Recommended reading](#recommended-reading) - [Acknowledgements](#acknowledgements) From 00c171491ad22f801879878fe395e6d06a366b2c Mon Sep 17 00:00:00 2001 From: Joshua MacDonald Date: Tue, 15 Jun 2021 16:13:52 -0700 Subject: [PATCH 38/68] more on metrics sampling --- text/0148-sampling-probability.md | 112 +++++++++++++++++++++++------- 1 file changed, 86 insertions(+), 26 deletions(-) diff --git a/text/0148-sampling-probability.md b/text/0148-sampling-probability.md index 435927f4a..8ef3b3e03 100644 --- a/text/0148-sampling-probability.md +++ b/text/0148-sampling-probability.md @@ -61,9 +61,9 @@ scenarios in which sampling is common. 2. _Histogram events:_ Each event represents an individual variable, signifying membership in a distribution. A Tracing Span event qualifies as both of these cases simultaneously. -It is at least one Counter event (e.g., one request, the number of -bytes read) and at least one Histogram event (e.g., request latency, -request size). +One span can be interpreted as at least one Counter event (e.g., one +request, the number of bytes read) and at least one Histogram event +(e.g., request latency, request size). In Metrics, [Statsd Counter and Histogram events meet this definition](https://github.com/statsd/statsd/blob/master/docs/metric_types.md#sampling). @@ -418,23 +418,58 @@ populations of data in an unbiased way. Two weighted sampling algorithms are listed below in [Recommended Reading](#recommended-reading). In a broader context, weighted sampling algorithms support estimating -population weight given individual item weights. In a telemetry -context, weights are counts, and weighted sampling algorithms support -estimating total population count given inputs with unequal adjusted -count. +population weights from a sample of unequal-weight items. 
In a +telemetry context, item weights are generally event counts, and +weighted sampling algorithms support estimating total population +counts from a sample with unequal-count items. The output of weighted sampling, in a telemetry context, are samples -containing events with new adjusted counts that maintain their power -to estimate counts in the combined population. - -### Examples +containing events with new, unequal adjusted counts that maintain +their power to estimate counts in the combined population. + +#### Maintaining "Probability proportional to size" + +The statistical property being maintained in the definition for +weighted sampling used above is known as _probability propertional to +size_. The "size" of an item, in this context, refers to the +magnitude of each item's contribution to the total that is being +estimated. To avoid bias, larger-magnitude items should have a +proportionally greater probability of being selected in a sample, +compared with items of smaller magnitide. + +The interpretation of "size", therefore, depends on what is being +estimated. When sampling events with a natural size, such as for +Metric Sum data points, the absolute value of the point should be +multiplied with the adjusted count to form an effective input weight +(i.e., its "size" or contribution to the population total). The +output adjusted count in this case is the output from weighted +sampling divided by the (absolute) point value. + +#### Zero adjusted count + +An adjusted count with zero value carries meaningful information, +specifically that the item participated in a probabilistic sampling +scheme and was not selected. A zero value can be be useful to record +events when they provide useful information despite their effective +count; we can use this to combine multiple sampling schemes in a +single stream. + +For example, consider collecting two samples from a stream of spans, +the first sample from those spans with attribute `A=1` and the second +sample from those spans with attribute `B=2`. Any span that has both +of these properties is eligible to be selected for both samples, in +which case it will have two non-zero adjusted counts. + +## Examples + +### Span sampling #### Sample Spans to Counter Metric For every span it receives, the example processor will synthesize metric data as though a Counter instrument named `S.count` for span -named `S` had been incremented once per span at the original -`Start()` call site. +named `S` had been incremented once per span at the original `Start()` +call site. Logically spaking, this processor will add the adjusted count of each span to the instrument (e.g., `Add(adjusted_count, labels...)`) for @@ -455,6 +490,25 @@ Logically spaking, this processor will observe the span's duration its adjusted count number of times for every span it receives, at the end time of the span. +#### Sample Spans rate limiting + +A collector processor will introduce a slight delay in order to ensure +it has received a complete frame of data, during which time it +maintains a fixed-size buffer of input spans. If the number of spans +received exceeds the size of the buffer before the end of the +interval, begin weighted sampling using the adjusted count of each +span as input weight. + +This processor drops spans when the configured rate threshold is +exceeeded, otherwise it passes spans through with unmodifed adjusted +count. 
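+
+A simplified Python sketch of one unbiased way to perform the weighted
+re-sampling step is shown below (probability proportional to size,
+drawn with replacement; the names are illustrative, and a production
+design would use a without-replacement scheme such as the
+variance-optimal algorithm cited in Recommended reading):
+
+```
+import random
+
+def resample(buffer, k):
+    # buffer: list of (span, adjusted_count) pairs; counts must be > 0.
+    # Draw k spans with probability proportional to adjusted count and
+    # re-weight each selection to total/k, so every input span's
+    # expected output adjusted count equals its input adjusted count.
+    weights = [count for _span, count in buffer]
+    total = sum(weights)
+    chosen = random.choices(buffer, weights=weights, k=k)
+    return [(span, total / k) for span, _count in chosen]
+```
+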
+ +When the interval expires and the sample frame is considered complete, +the selected sample spans are output with possibly updated adjusted +counts. + +### Metric sampling + #### Statsd Counter A Statsd counter event appears as a line of text, describing a @@ -480,22 +534,28 @@ adjusted count of 10. Assuming the sample was selected using an unbiased algorithm, we can interpret this event as having an expected count of `100/0.1 = 1000`. -#### Example: Sample span rate limiter +#### Metric exemplars with adjusted counts -A collector processor will introduce a slight delay in order to ensure -it has received a complete frame of data, during which time it -maintains a fixed-size buffer of input spans. If the number of spans -received exceeds the size of the buffer before the end of the -interval, begin weighted sampling using the adjusted count of each -span as input weight. +The OTLP protocol for metrics includes a repeated exemplars field in +every data point. This is a place where histogram implementations are +able to provide example context to corrlate metrics with traces. -This processor drops spans when the configured rate threshold is -exceeeded, otherwise it passes spans through with unmodifed adjusted -count. +OTLP exemplars support additional attributes, those that were present +on the API event and were dropped during aggregation. Exemplars that +are selected probabilistically and recorded with their adjusted counts +make it possible to approximately count events using dimensions that +were dropped during metric aggregation. When sampling metric events, +use probability proportional to size, meaning for Metric Sum data +points include the absolute point value as a product in the input +sample weight. -When the interval expires and the sample frame is considered complete, -the selected sample spans are output with possibly updated adjusted -counts. +An end-to-end pipeline of sampled metrics events can be constructed +based on exemplars with adjusted counts, one capable of supporting +approximate queries over high-cardinality metrics. + +#### Metric cardinality control + +TODO ## Proposed Tracing specification From 3d16478836dba88f630cca2d81f020ffd26ef64b Mon Sep 17 00:00:00 2001 From: Joshua MacDonald Date: Wed, 16 Jun 2021 05:06:20 -0700 Subject: [PATCH 39/68] cardinality limit example --- text/0148-sampling-probability.md | 55 +++++++++++++++++++------------ 1 file changed, 34 insertions(+), 21 deletions(-) diff --git a/text/0148-sampling-probability.md b/text/0148-sampling-probability.md index 8ef3b3e03..810a841d0 100644 --- a/text/0148-sampling-probability.md +++ b/text/0148-sampling-probability.md @@ -458,24 +458,31 @@ For example, consider collecting two samples from a stream of spans, the first sample from those spans with attribute `A=1` and the second sample from those spans with attribute `B=2`. Any span that has both of these properties is eligible to be selected for both samples, in -which case it will have two non-zero adjusted counts. +which case one event could have two non-zero adjusted counts (e.g., +`sampling.by_a.adjusted_count` and `sampling.by_b.adjusted_count`). ## Examples +In all of these examples, the use of probability sampling leads to an +attribute like `sampling.sampler_name.adjusted_count`. Consumers of +spans, metrics, and logs annotated with adjusted counts are able to +calculate accurate statistics about the population of events, at a +basic level, without knowing details about the sampling configuration. 
+ ### Span sampling -#### Sample Spans to Counter Metric +#### Sample spans to Counter Metric For every span it receives, the example processor will synthesize metric data as though a Counter instrument named `S.count` for span named `S` had been incremented once per span at the original `Start()` call site. -Logically spaking, this processor will add the adjusted count of each -span to the instrument (e.g., `Add(adjusted_count, labels...)`) for -every span it receives, at the start time of the span. +This processor will add the adjusted count of each span to the +instrument (e.g., `Add(adjusted_count, labels...)`) for every span it +receives, logically effective at the start or end time of the span. -#### Sample Spans to Histogram Metric +#### Sample spans to Histogram Metric For every span it receives, the example processor will synthesize metric data as though a Histogram instrument named `S.duration` for @@ -484,13 +491,13 @@ call site. The OpenTelemetry Metric data model does not support histogram buckets with non-integer counts, which forces the use of integer adjusted -counts (i.e., integer-reciprocal sampling rates) here. +counts here (i.e., 1-in-N sampling rates where N is an integer). Logically spaking, this processor will observe the span's duration its adjusted count number of times for every span it receives, at the end time of the span. -#### Sample Spans rate limiting +#### Sample span rate limiting A collector processor will introduce a slight delay in order to ensure it has received a complete frame of data, during which time it @@ -551,30 +558,36 @@ sample weight. An end-to-end pipeline of sampled metrics events can be constructed based on exemplars with adjusted counts, one capable of supporting -approximate queries over high-cardinality metrics. +approximate queries over sampled metric events at high cardinality. -#### Metric cardinality control +#### Metric cardinality limiter -TODO +A metrics processor can be configured to limit cardinality for a +single metric name, allowing no more than K distinct label sets per +export interval. The export interval is fixed to a short interval so +that a complete set of distinct labels can be stored temporarily. + +Caveats: as presented, this works for Sum and histograms received in +Delta temporality and where the Sum is monotonic, as discussed in +[opentelemetry-proto/issues/303](https://github.com/open-telemetry/opentelemetry-proto/issues/303). + +Considering data points received during the interval, when the number +of points exceeds K, select a probability proportional to size sample +of points, output every point with an additional (non-descriptive) +`sampling.cardinality_limit.adjusted_count` attribute. ## Proposed Tracing specification For the standard OpenTelemetry Span Sampler implementations to support a range of probability sampling schemes, this document recommends the -use of a Span attribute named `sampling.X.adjusted_count` to encode -the adjusted count computed by a Sampler named "X", whic should be -unbiased. +use of a Span attribute named `sampling.sampler_name.adjusted_count` +to encode an unbiased adjusted count computed by a Sampler. The value any attribute name prefixed with "sampling." and suffixed with ".adjusted_count" under this proposal MUST be an unbiased estimate of the total population count represented by the individual event. 
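+
+As an illustration (not a specified API; the span objects here are
+assumed to expose an `attributes` mapping), a consumer can estimate
+the population count of spans matching a predicate by summing these
+attribute values:
+
+```
+def estimate_population_count(sampled_spans, predicate, sampler_name):
+    # Summing adjusted counts of matching sampled spans yields an
+    # unbiased estimate of the number of matching spans in the
+    # population, not merely in the sample.
+    key = "sampling." + sampler_name + ".adjusted_count"
+    return sum(span.attributes.get(key, 0)
+               for span in sampled_spans
+               if predicate(span))
+```
+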
-The specification will state that non-probabilistic rate limiters and -other processors that distort the interpretation of adjusted count -outlined above SHOULD erase the adjusted count attributes to prevent -mis-counting events. - ### Suggested text After @@ -589,7 +602,7 @@ TODO For the `TraceIDRatio` sampler, include the following additional text: -> When returning a RECORD_AND_SAMPLE decision, the TraceIDRatio +> When returning a `RECORD_AND_SAMPLE` decision, the TraceIDRatio > Sampler MUST include the attribute > `sampling.traceidratio.adjusted_count=C` for `C` the reciprocal of the > configured trace ID ratio. @@ -601,7 +614,7 @@ For the `TraceIDRatio` sampler, include the following additional text: For the `Parent` sampler, include the following additional text: -> When returning a RECORD_AND_SAMPLE decision, the Parent Sampler MUST +> When returning a `RECORD_AND_SAMPLE` decision, the Parent Sampler MUST > include the attribute `sampling.parent.adjusted_count=C` for `C` the > reciprocal of the parent trace context's head inclusion probability, > which is passed through W3C tracestate. From d7a24ae2943f05d782026345cd2a6f8b57a35028 Mon Sep 17 00:00:00 2001 From: Joshua MacDonald Date: Wed, 16 Jun 2021 05:10:58 -0700 Subject: [PATCH 40/68] toc edit --- text/0148-sampling-probability.md | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-) diff --git a/text/0148-sampling-probability.md b/text/0148-sampling-probability.md index 810a841d0..ae88f0861 100644 --- a/text/0148-sampling-probability.md +++ b/text/0148-sampling-probability.md @@ -7,7 +7,7 @@ * [Model and terminology](#model-and-terminology) + [Sampling without replacement](#sampling-without-replacement) + [Adjusted sample count](#adjusted-sample-count) - + [Introducing variance](#introducing-variance) + + [Sampling and variance](#sampling-and-variance) * [Conveying the sampling probability](#conveying-the-sampling-probability) + [Encoding adjusted count](#encoding-adjusted-count) + [Encoding inclusion probability](#encoding-inclusion-probability) @@ -20,11 +20,17 @@ + [Dapper's "Inflationary" Sampler](#dappers-inflationary-sampler) * [Working with adjusted counts](#working-with-adjusted-counts) + [Merging samples](#merging-samples) - * [Examples](#examples) - + [Sample Spans to Counter Metric](#sample-spans-to-counter-metric) - + [Sample Spans to Histogram Metric](#sample-spans-to-histogram-metric) + + [Maintaining "Probability proportional to size"](#maintaining-probability-proportional-to-size) + + [Zero adjusted count](#zero-adjusted-count) +- [Examples](#examples) + * [Span sampling](#span-sampling) + + [Sample spans to Counter Metric](#sample-spans-to-counter-metric) + + [Sample spans to Histogram Metric](#sample-spans-to-histogram-metric) + + [Sample span rate limiting](#sample-span-rate-limiting) + * [Metric sampling](#metric-sampling) + [Statsd Counter](#statsd-counter) - + [Example: Sample span rate limiter](#example-sample-span-rate-limiter) + + [Metric exemplars with adjusted counts](#metric-exemplars-with-adjusted-counts) + + [Metric cardinality limiter](#metric-cardinality-limiter) - [Proposed Tracing specification](#proposed-tracing-specification) * [Suggested text](#suggested-text) - [Recommended reading](#recommended-reading) @@ -156,7 +162,7 @@ procedure must be _statistically unbiased_, a term meaning that the process is required to give equal consideration to all possible outcomes. 
-#### Introducing variance +#### Sampling and variance The use of unbiased sampling outlined above makes it possible to estimate the population total for arbitrary subsets of the sample, as From 0d3b834a96fbbfe9738e8cf17d22301e0d83a00f Mon Sep 17 00:00:00 2001 From: Joshua MacDonald Date: Wed, 16 Jun 2021 06:18:02 -0700 Subject: [PATCH 41/68] proposed spec text --- text/0148-sampling-probability.md | 87 ++++++++++++++++++++++++------- 1 file changed, 68 insertions(+), 19 deletions(-) diff --git a/text/0148-sampling-probability.md b/text/0148-sampling-probability.md index ae88f0861..fabb04950 100644 --- a/text/0148-sampling-probability.md +++ b/text/0148-sampling-probability.md @@ -600,34 +600,83 @@ After [Sampler](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/sdk.md#sampler) and before [Builtin Samplers](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/sdk.md#built-in-samplers), -a new section named *Probability sampling* will introduce the term -"adjusted count" and relate it to inclusion probability, citing -external sources and this OTEP. For example: - -TODO +a new section will introduce the term "adjusted count" and relate it +to inclusion probability. For example: + +```md +### Probability sampling + +Probability Samplers are Samplers that output statistically unbiased +inclusion probability. Inclusion probability in the context of +tracing is the *effective* probability of the Sampler returning +`RECORD_AND_SAMPLE`, which can be decided locally or derived from the +parent context using the [W3C Trace Context is-sampled +flag](https://www.w3.org/TR/trace-context/#sampled-flag). + +#### Adjusted Count attributes + +The recommended way to convey sampling probability *for events* in +OpenTelemetry is in the form of an **adjusted count**, which is the +reciprocal (i.e., mathematical inverse function) of inclusion +probability. + +The implied goal of probability sampling is to support estimating the +count of spans in the population using the spans that were sampled. +Probability Samplers and probabilistic Span processors SHOULD maintain +the expected value of the sum of Span adjusted counts, to support this +goal. + +Attributes used to express the adjusted count in an unbiased +probability sampling scheme SHOULD use a Span attribute name with +prefix "sampling." and with suffix ".adjusted_count" (e.g., +"sampling.sampler_name.adjusted_count"). Adjusted count attributes +MAY be integer or floating-point values. + +#### Inclusion probability tracestate + +The recommended way to convey sampling probability *for contexts* in +OpenTelemetry is through the W3C Trace Context tracestate using the +key `head_probability`. + +Probability Samplers SHOULD encode the effective sampling inclusion +probability using tracestate, for the context that was in effect when +the W3C is-sampled bit was set. The tracestate field SHOULD be set in +both sampled and unsampled cases, to convey the inclusion probability +even for unsampled contexts. + +The `head_probability` tracestate key is set to a floating-point +number greater than or equal to 0 and less than or equal to 1. The +floating point precision of the number SHOULD follow implementation +language standards and SHOULD be high enough to identify when Samplers +have different inclusion probabilities. 
+``` For the `TraceIDRatio` sampler, include the following additional text: -> When returning a `RECORD_AND_SAMPLE` decision, the TraceIDRatio -> Sampler MUST include the attribute -> `sampling.traceidratio.adjusted_count=C` for `C` the reciprocal of the -> configured trace ID ratio. -> -> The returned tracestate used for the child context MUST include an -> additional key-value carrying the head inclusion probability, equal -> to the configured trace ID ratio. (TODO: spec the tracestate key for -> head inclusion probability.) +```md +When returning a `RECORD_AND_SAMPLE` decision, the TraceIDRatio +Sampler MUST include the attribute +`sampling.traceidratio.adjusted_count=C`, where `C` is the reciprocal of the +configured trace ID ratio. + +The returned tracestate used for the child context SHOULD have the +tracestate `head_probability` key set to the configured trace +ID ratio. +``` For the `Parent` sampler, include the following additional text: -> When returning a `RECORD_AND_SAMPLE` decision, the Parent Sampler MUST -> include the attribute `sampling.parent.adjusted_count=C` for `C` the -> reciprocal of the parent trace context's head inclusion probability, -> which is passed through W3C tracestate. +```md +When returning a `RECORD_AND_SAMPLE` decision, the Parent Sampler MUST +include the attribute `sampling.parent.adjusted_count=C`, where `C` is the +reciprocal of the parent trace context's head inclusion probability, +which is passed through W3C tracestate using the `head_probability` key. +``` ## Recommended reading -[Sampling, 3rd Edition, by Steven K. Thompson](https://www.wiley.com/en-us/Sampling%2C+3rd+Edition-p-9780470402313). +[Sampling, 3rd Edition, by Steven +K. Thompson](https://www.wiley.com/en-us/Sampling%2C+3rd+Edition-p-9780470402313). [A Generalization of Sampling Without Replacement From a Finite Universe](https://www.jstor.org/stable/2280784), JSTOR (1952) From 0cbc555c2ee7b7f3710b0c68e193418072ac3410 Mon Sep 17 00:00:00 2001 From: Joshua MacDonald Date: Wed, 16 Jun 2021 09:12:48 -0700 Subject: [PATCH 42/68] add collapsible details --- text/0148-sampling-probability.md | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/text/0148-sampling-probability.md b/text/0148-sampling-probability.md index fabb04950..4cae4fa38 100644 --- a/text/0148-sampling-probability.md +++ b/text/0148-sampling-probability.md @@ -272,6 +272,11 @@ following criteria: - Produces complete traces - Spans are countable. +
+ +Detail about Samplers for head trace sampling. + + #### `Parent` Sampler The `Parent` Sampler ensures complete traces, provided all spans are @@ -360,8 +365,7 @@ P(x)=P(x|y)*P(y)+P(x|not y)*P(not y) The variables are: - **`H`**: The head inclusion probability of the parent context that - is in effect, independent of whether the parent context was sampled, - the reciprocal of the parent context's effective adjusted count. + is in effect, independent of whether the parent context was sampled - **`I`**: The inflationary sampling probability for the span being started. - **`D`**: The decision probability for whether to start a new sub-root. @@ -400,6 +404,8 @@ trace with adjusted count `1/I`. This may use a non-descriptive Resource or Span attribute named `sampling.inflationary.adjusted_count`, for example. +
+ ### Working with adjusted counts Head sampling for traces has been discussed, covering strategies to From 188c9c5508ba7a7c370907203b34d9fc62a90580 Mon Sep 17 00:00:00 2001 From: Joshua MacDonald Date: Wed, 16 Jun 2021 10:00:39 -0700 Subject: [PATCH 43/68] rewrite intro --- text/0148-sampling-probability.md | 39 ++++++++++++++++++++----------- 1 file changed, 26 insertions(+), 13 deletions(-) diff --git a/text/0148-sampling-probability.md b/text/0148-sampling-probability.md index 4cae4fa38..0891b92b4 100644 --- a/text/0148-sampling-probability.md +++ b/text/0148-sampling-probability.md @@ -42,19 +42,32 @@ Objective: Specify a foundation for sampling techniques in OpenTelemetry. ## Motivation -In tracing, metrics, and logs, there are widely known techniques for -sampling a stream of events that, when performed correctly, enable -collecting a fraction of the complete data while maintaining -substantial visibility into the whole population of events. - -These techniques are all forms of approximate counting. Estimates -calculated by the forms of sampling outlined here are considered -accurate, in the sense that they are random variables with an expected -value equal to their true value. - -While sampling techniques vary, it is possible to specify high-level -interoperability requirements that producers and consumers of sampled -data can follow to enable a wide range of sampling designs. +Probability sampling allows consumers of sampled telemetry data to +collect a fraction of telemetry events and use them to estimate total +quantities about the population of events, such as the total rate of +events with a particular attribute. Sampling is a general-purpose +facility for lowering cost at the expense of lower data quality. + +These techniques enable reducing the cost of telemetry collection, +both for producers (i.e., SDKs) and for processors (i.e., Collectors), +without losing the ability to (at least coarsely) monitor the whole +system. + +Sampling builds on results from probability theory, most significantly +the concept of expected value. Estimates drawn from probability +samples are *random variables* that, when correct procedures are +followed, accurately reflect their true value, making them unbiased. +Unbiased samples can be used for after-the-fact analysis. We can +answer questions such as "what fraction of events had property X?" +using the fraction of events in the sample that have property X. + +This document outlines how producers and consumers of sample telemetry +data can convey estimates about the total count of telemetry events, +without conveying information about how the sample was computed, using +a quantity known as **adjusted count**. In common language, a +"one-in-N" sampling scheme emits events with adjusted count equal to +N. Adjusted count is the expected value of the number of events in +the population represented by an individual sample event. ## Explanation From f6bf6046cf48c02d0184c86471be8d6ae7fc586c Mon Sep 17 00:00:00 2001 From: Joshua MacDonald Date: Wed, 16 Jun 2021 10:07:15 -0700 Subject: [PATCH 44/68] move examples up --- text/0148-sampling-probability.md | 237 +++++++++++++++--------------- 1 file changed, 122 insertions(+), 115 deletions(-) diff --git a/text/0148-sampling-probability.md b/text/0148-sampling-probability.md index 0891b92b4..0ac737eba 100644 --- a/text/0148-sampling-probability.md +++ b/text/0148-sampling-probability.md @@ -69,6 +69,128 @@ a quantity known as **adjusted count**. In common language, a N. 
Adjusted count is the expected value of the number of events in the population represented by an individual sample event. + + +Example applications that apply probability sampling to lower the cost +of telemetry collection. + + +## Examples + +In all of these examples, the use of probability sampling leads to an +attribute like `sampling.sampler_name.adjusted_count`. Consumers of +spans, metrics, and logs annotated with adjusted counts are able to +calculate accurate statistics about the population of events, at a +basic level, without knowing details about the sampling configuration. + +### Span sampling + +#### Sample spans to Counter Metric + +For every span it receives, the example processor will synthesize +metric data as though a Counter instrument named `S.count` for span +named `S` had been incremented once per span at the original `Start()` +call site. + +This processor will add the adjusted count of each span to the +instrument (e.g., `Add(adjusted_count, labels...)`) for every span it +receives, logically effective at the start or end time of the span. + +#### Sample spans to Histogram Metric + +For every span it receives, the example processor will synthesize +metric data as though a Histogram instrument named `S.duration` for +span named `S` had been observed once per span at the original `End()` +call site. + +The OpenTelemetry Metric data model does not support histogram buckets +with non-integer counts, which forces the use of integer adjusted +counts here (i.e., 1-in-N sampling rates where N is an integer). + +Logically spaking, this processor will observe the span's duration its +adjusted count number of times for every span it receives, at the end +time of the span. + +#### Sample span rate limiting + +A collector processor will introduce a slight delay in order to ensure +it has received a complete frame of data, during which time it +maintains a fixed-size buffer of input spans. If the number of spans +received exceeds the size of the buffer before the end of the +interval, begin weighted sampling using the adjusted count of each +span as input weight. + +This processor drops spans when the configured rate threshold is +exceeeded, otherwise it passes spans through with unmodifed adjusted +count. + +When the interval expires and the sample frame is considered complete, +the selected sample spans are output with possibly updated adjusted +counts. + +### Metric sampling + +#### Statsd Counter + +A Statsd counter event appears as a line of text, describing a +number-valued event with optional attributes and inclusion probability +("sample rate"). + +For example, a metric named `name` is incremented by `increment` using +a counter event (`c`) with the given `sample_rate`. + +``` +name:increment|c|@sample_rate +``` + +For example, a count of 100 that was selected for a 1-in-10 simple +random sampling scheme will arrive as: + +``` +counter:100|c|@0.1 +``` + +Events in the example have with 0.1 inclusion probability have +adjusted count of 10. Assuming the sample was selected using an +unbiased algorithm, we can interpret this event as having an expected +count of `100/0.1 = 1000`. + +#### Metric exemplars with adjusted counts + +The OTLP protocol for metrics includes a repeated exemplars field in +every data point. This is a place where histogram implementations are +able to provide example context to corrlate metrics with traces. + +OTLP exemplars support additional attributes, those that were present +on the API event and were dropped during aggregation. 
Exemplars that +are selected probabilistically and recorded with their adjusted counts +make it possible to approximately count events using dimensions that +were dropped during metric aggregation. When sampling metric events, +use probability proportional to size, meaning for Metric Sum data +points include the absolute point value as a product in the input +sample weight. + +An end-to-end pipeline of sampled metrics events can be constructed +based on exemplars with adjusted counts, one capable of supporting +approximate queries over sampled metric events at high cardinality. + +#### Metric cardinality limiter + +A metrics processor can be configured to limit cardinality for a +single metric name, allowing no more than K distinct label sets per +export interval. The export interval is fixed to a short interval so +that a complete set of distinct labels can be stored temporarily. + +Caveats: as presented, this works for Sum and histograms received in +Delta temporality and where the Sum is monotonic, as discussed in +[opentelemetry-proto/issues/303](https://github.com/open-telemetry/opentelemetry-proto/issues/303). + +Considering data points received during the interval, when the number +of points exceeds K, select a probability proportional to size sample +of points, output every point with an additional (non-descriptive) +`sampling.cardinality_limit.adjusted_count` attribute. + + ## Explanation Consider a hypothetical telemetry signal in which an API event @@ -486,121 +608,6 @@ of these properties is eligible to be selected for both samples, in which case one event could have two non-zero adjusted counts (e.g., `sampling.by_a.adjusted_count` and `sampling.by_b.adjusted_count`). -## Examples - -In all of these examples, the use of probability sampling leads to an -attribute like `sampling.sampler_name.adjusted_count`. Consumers of -spans, metrics, and logs annotated with adjusted counts are able to -calculate accurate statistics about the population of events, at a -basic level, without knowing details about the sampling configuration. - -### Span sampling - -#### Sample spans to Counter Metric - -For every span it receives, the example processor will synthesize -metric data as though a Counter instrument named `S.count` for span -named `S` had been incremented once per span at the original `Start()` -call site. - -This processor will add the adjusted count of each span to the -instrument (e.g., `Add(adjusted_count, labels...)`) for every span it -receives, logically effective at the start or end time of the span. - -#### Sample spans to Histogram Metric - -For every span it receives, the example processor will synthesize -metric data as though a Histogram instrument named `S.duration` for -span named `S` had been observed once per span at the original `End()` -call site. - -The OpenTelemetry Metric data model does not support histogram buckets -with non-integer counts, which forces the use of integer adjusted -counts here (i.e., 1-in-N sampling rates where N is an integer). - -Logically spaking, this processor will observe the span's duration its -adjusted count number of times for every span it receives, at the end -time of the span. - -#### Sample span rate limiting - -A collector processor will introduce a slight delay in order to ensure -it has received a complete frame of data, during which time it -maintains a fixed-size buffer of input spans. 
If the number of spans -received exceeds the size of the buffer before the end of the -interval, begin weighted sampling using the adjusted count of each -span as input weight. - -This processor drops spans when the configured rate threshold is -exceeeded, otherwise it passes spans through with unmodifed adjusted -count. - -When the interval expires and the sample frame is considered complete, -the selected sample spans are output with possibly updated adjusted -counts. - -### Metric sampling - -#### Statsd Counter - -A Statsd counter event appears as a line of text, describing a -number-valued event with optional attributes and inclusion probability -("sample rate"). - -For example, a metric named `name` is incremented by `increment` using -a counter event (`c`) with the given `sample_rate`. - -``` -name:increment|c|@sample_rate -``` - -For example, a count of 100 that was selected for a 1-in-10 simple -random sampling scheme will arrive as: - -``` -counter:100|c|@0.1 -``` - -Events in the example have with 0.1 inclusion probability have -adjusted count of 10. Assuming the sample was selected using an -unbiased algorithm, we can interpret this event as having an expected -count of `100/0.1 = 1000`. - -#### Metric exemplars with adjusted counts - -The OTLP protocol for metrics includes a repeated exemplars field in -every data point. This is a place where histogram implementations are -able to provide example context to corrlate metrics with traces. - -OTLP exemplars support additional attributes, those that were present -on the API event and were dropped during aggregation. Exemplars that -are selected probabilistically and recorded with their adjusted counts -make it possible to approximately count events using dimensions that -were dropped during metric aggregation. When sampling metric events, -use probability proportional to size, meaning for Metric Sum data -points include the absolute point value as a product in the input -sample weight. - -An end-to-end pipeline of sampled metrics events can be constructed -based on exemplars with adjusted counts, one capable of supporting -approximate queries over sampled metric events at high cardinality. - -#### Metric cardinality limiter - -A metrics processor can be configured to limit cardinality for a -single metric name, allowing no more than K distinct label sets per -export interval. The export interval is fixed to a short interval so -that a complete set of distinct labels can be stored temporarily. - -Caveats: as presented, this works for Sum and histograms received in -Delta temporality and where the Sum is monotonic, as discussed in -[opentelemetry-proto/issues/303](https://github.com/open-telemetry/opentelemetry-proto/issues/303). - -Considering data points received during the interval, when the number -of points exceeds K, select a probability proportional to size sample -of points, output every point with an additional (non-descriptive) -`sampling.cardinality_limit.adjusted_count` attribute. 
- ## Proposed Tracing specification For the standard OpenTelemetry Span Sampler implementations to support From 547a30a5c7e43e15a347e7b3ebc200de0c64a53d Mon Sep 17 00:00:00 2001 From: Joshua MacDonald Date: Wed, 16 Jun 2021 10:08:02 -0700 Subject: [PATCH 45/68] toc edit --- text/0148-sampling-probability.md | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/text/0148-sampling-probability.md b/text/0148-sampling-probability.md index 0ac737eba..681305e40 100644 --- a/text/0148-sampling-probability.md +++ b/text/0148-sampling-probability.md @@ -3,6 +3,15 @@ - [Motivation](#motivation) +- [Examples](#examples) + * [Span sampling](#span-sampling) + + [Sample spans to Counter Metric](#sample-spans-to-counter-metric) + + [Sample spans to Histogram Metric](#sample-spans-to-histogram-metric) + + [Sample span rate limiting](#sample-span-rate-limiting) + * [Metric sampling](#metric-sampling) + + [Statsd Counter](#statsd-counter) + + [Metric exemplars with adjusted counts](#metric-exemplars-with-adjusted-counts) + + [Metric cardinality limiter](#metric-cardinality-limiter) - [Explanation](#explanation) * [Model and terminology](#model-and-terminology) + [Sampling without replacement](#sampling-without-replacement) @@ -22,15 +31,6 @@ + [Merging samples](#merging-samples) + [Maintaining "Probability proportional to size"](#maintaining-probability-proportional-to-size) + [Zero adjusted count](#zero-adjusted-count) -- [Examples](#examples) - * [Span sampling](#span-sampling) - + [Sample spans to Counter Metric](#sample-spans-to-counter-metric) - + [Sample spans to Histogram Metric](#sample-spans-to-histogram-metric) - + [Sample span rate limiting](#sample-span-rate-limiting) - * [Metric sampling](#metric-sampling) - + [Statsd Counter](#statsd-counter) - + [Metric exemplars with adjusted counts](#metric-exemplars-with-adjusted-counts) - + [Metric cardinality limiter](#metric-cardinality-limiter) - [Proposed Tracing specification](#proposed-tracing-specification) * [Suggested text](#suggested-text) - [Recommended reading](#recommended-reading) From 305d3483c064cd7a1cf547d9a7158e3a4811d3e9 Mon Sep 17 00:00:00 2001 From: Joshua MacDonald Date: Wed, 16 Jun 2021 10:09:07 -0700 Subject: [PATCH 46/68] detail->details --- text/0148-sampling-probability.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/0148-sampling-probability.md b/text/0148-sampling-probability.md index 681305e40..6e9ed1f4a 100644 --- a/text/0148-sampling-probability.md +++ b/text/0148-sampling-probability.md @@ -69,7 +69,7 @@ a quantity known as **adjusted count**. In common language, a N. Adjusted count is the expected value of the number of events in the population represented by an individual sample event. - +
Example applications that apply probability sampling to lower the cost of telemetry collection. From b657ae535084171271d0c7af25179d111387ccf4 Mon Sep 17 00:00:00 2001 From: Joshua MacDonald Date: Wed, 16 Jun 2021 10:50:26 -0700 Subject: [PATCH 47/68] separate tracing and metrics examples --- text/0148-sampling-probability.md | 28 ++++++++++++++++++---------- 1 file changed, 18 insertions(+), 10 deletions(-) diff --git a/text/0148-sampling-probability.md b/text/0148-sampling-probability.md index 6e9ed1f4a..2c450c488 100644 --- a/text/0148-sampling-probability.md +++ b/text/0148-sampling-probability.md @@ -69,22 +69,21 @@ a quantity known as **adjusted count**. In common language, a N. Adjusted count is the expected value of the number of events in the population represented by an individual sample event. -
- -Example applications that apply probability sampling to lower the cost -of telemetry collection. - - ## Examples In all of these examples, the use of probability sampling leads to an -attribute like `sampling.sampler_name.adjusted_count`. Consumers of -spans, metrics, and logs annotated with adjusted counts are able to -calculate accurate statistics about the population of events, at a -basic level, without knowing details about the sampling configuration. +attribute named like `sampling.sampler_name.adjusted_count`. +Consumers of spans, metrics, and logs annotated with adjusted counts +are able to calculate accurate statistics about the population of +events, at a basic level, without knowing details about the sampling +configuration. ### Span sampling +
+ Example use-cases for probability sampling of spans +generally involve generating metrics from spans. + #### Sample spans to Counter Metric For every span it receives, the example processor will synthesize @@ -96,6 +95,8 @@ This processor will add the adjusted count of each span to the instrument (e.g., `Add(adjusted_count, labels...)`) for every span it receives, logically effective at the start or end time of the span. +This is a core use-case for probability sampling. + #### Sample spans to Histogram Metric For every span it receives, the example processor will synthesize @@ -111,6 +112,8 @@ Logically spaking, this processor will observe the span's duration its adjusted count number of times for every span it receives, at the end time of the span. +This is a core use-case for probability sampling. + #### Sample span rate limiting A collector processor will introduce a slight delay in order to ensure @@ -127,9 +130,14 @@ count. When the interval expires and the sample frame is considered complete, the selected sample spans are output with possibly updated adjusted counts. +
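
The weighted sampling step described above can be sketched in a few lines. This is a non-normative illustration, not part of the proposal: the `adjusted_count` field on spans, the frame/limit parameters, and the proportional rescaling of the surviving adjusted counts are assumptions of the sketch; an exact design would compute each span's true inclusion probability under its weighted scheme.

```python
import heapq
import random

def sample_frame(spans, limit):
    """Weighted sample of one completed frame, with probability
    proportional to each span's adjusted count (assumed positive)."""
    if len(spans) <= limit:
        return spans  # under the rate threshold: pass through unmodified

    # Efraimidis-Spirakis style selection: keep the `limit` spans with
    # the largest key u ** (1 / weight), u uniform in [0, 1).
    keyed = [(random.random() ** (1.0 / s.adjusted_count), s) for s in spans]
    selected = [s for _, s in heapq.nlargest(limit, keyed, key=lambda kv: kv[0])]

    # Rescale the survivors so the sample still estimates the same
    # population total (a simplification of the exact adjustment).
    total = sum(s.adjusted_count for s in spans)
    kept = sum(s.adjusted_count for s in selected)
    for s in selected:
        s.adjusted_count *= total / kept
    return selected
```

The rescaling step is what allows downstream consumers to keep summing adjusted counts after the limiter runs, at the cost of some additional variance.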
### Metric sampling +
+ Example use-cases for probability sampling of metrics +are aimed at lowering cost and addressing high cardinality. + #### Statsd Counter A Statsd counter event appears as a line of text, describing a From 24ab9b5f881d22fb96ee105ccca9bc9d3fb30e20 Mon Sep 17 00:00:00 2001 From: Joshua MacDonald Date: Wed, 16 Jun 2021 10:54:31 -0700 Subject: [PATCH 48/68] edit description of head samplers --- text/0148-sampling-probability.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/text/0148-sampling-probability.md b/text/0148-sampling-probability.md index 2c450c488..24f25872d 100644 --- a/text/0148-sampling-probability.md +++ b/text/0148-sampling-probability.md @@ -415,10 +415,10 @@ following criteria: - Produces complete traces - Spans are countable. -
- -Detail about Samplers for head trace sampling. - +#### Head sampling for traces + +
Details about Sampler implementations that meet +the requirements stated above. #### `Parent` Sampler From d4e3826801924d34b268debe0d486077b5e65715 Mon Sep 17 00:00:00 2001 From: Joshua MacDonald Date: Wed, 16 Jun 2021 10:55:28 -0700 Subject: [PATCH 49/68] toc edit --- text/0148-sampling-probability.md | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/text/0148-sampling-probability.md b/text/0148-sampling-probability.md index 24f25872d..673ab0fd9 100644 --- a/text/0148-sampling-probability.md +++ b/text/0148-sampling-probability.md @@ -24,9 +24,10 @@ + [Multiply the adjusted count into the data](#multiply-the-adjusted-count-into-the-data) * [Trace Sampling](#trace-sampling) + [Counting spans and traces](#counting-spans-and-traces) - + [`Parent` Sampler](#parent-sampler) - + [`TraceIDRatio` Sampler](#traceidratio-sampler) - + [Dapper's "Inflationary" Sampler](#dappers-inflationary-sampler) + + [Head sampling for traces](#head-sampling-for-traces) + - [`Parent` Sampler](#parent-sampler) + - [`TraceIDRatio` Sampler](#traceidratio-sampler) + - [Dapper's "Inflationary" Sampler](#dappers-inflationary-sampler) * [Working with adjusted counts](#working-with-adjusted-counts) + [Merging samples](#merging-samples) + [Maintaining "Probability proportional to size"](#maintaining-probability-proportional-to-size) @@ -420,7 +421,7 @@ following criteria:
Details about Sampler implementations that meet the requirements stated above. -#### `Parent` Sampler +##### `Parent` Sampler The `Parent` Sampler ensures complete traces, provided all spans are successfully recorded. A downside of `Parent` sampling is that it @@ -443,7 +444,7 @@ count in the corresponding `SpanData`. This may use a non-descriptive Resource or Span attribute named `sampling.parent.adjusted_count`, for example. -#### `TraceIDRatio` Sampler +##### `TraceIDRatio` Sampler The OpenTelemetry tracing specification includes a built-in Sampler designed for probability sampling using a deterministic sampling @@ -475,7 +476,7 @@ adjusted count in the corresponding `SpanData`. This may use a non-descriptive Resource or Span attribute named `sampling.traceidratio.adjusted_count`, for example. -#### Dapper's "Inflationary" Sampler +##### Dapper's "Inflationary" Sampler Google's [Dapper](https://research.google/pubs/pub36356/) tracing system describes the use of sampling to control the cost of trace From f9f2caf72c9c3b79500b8e3656feb4ab4325e7a2 Mon Sep 17 00:00:00 2001 From: Joshua MacDonald Date: Wed, 16 Jun 2021 11:31:34 -0700 Subject: [PATCH 50/68] edit spec language --- text/0148-sampling-probability.md | 42 +++++++++++++++---------------- 1 file changed, 20 insertions(+), 22 deletions(-) diff --git a/text/0148-sampling-probability.md b/text/0148-sampling-probability.md index 673ab0fd9..b21f0302e 100644 --- a/text/0148-sampling-probability.md +++ b/text/0148-sampling-probability.md @@ -94,9 +94,7 @@ call site. This processor will add the adjusted count of each span to the instrument (e.g., `Add(adjusted_count, labels...)`) for every span it -receives, logically effective at the start or end time of the span. - -This is a core use-case for probability sampling. +receives, logically taking place at the start or end time of the span. #### Sample spans to Histogram Metric @@ -113,8 +111,6 @@ Logically spaking, this processor will observe the span's duration its adjusted count number of times for every span it receives, at the end time of the span. -This is a core use-case for probability sampling. - #### Sample span rate limiting A collector processor will introduce a slight delay in order to ensure @@ -174,14 +170,12 @@ OTLP exemplars support additional attributes, those that were present on the API event and were dropped during aggregation. Exemplars that are selected probabilistically and recorded with their adjusted counts make it possible to approximately count events using dimensions that -were dropped during metric aggregation. When sampling metric events, -use probability proportional to size, meaning for Metric Sum data -points include the absolute point value as a product in the input -sample weight. +were dropped during metric aggregation. An end-to-end pipeline of sampled metrics events can be constructed based on exemplars with adjusted counts, one capable of supporting -approximate queries over sampled metric events at high cardinality. +approximate-count queries over sampled metric events at high +cardinality. #### Metric cardinality limiter @@ -190,9 +184,10 @@ single metric name, allowing no more than K distinct label sets per export interval. The export interval is fixed to a short interval so that a complete set of distinct labels can be stored temporarily. 
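
A brief sketch of the limiter described in the preceding paragraph, for illustration only: the point fields (`value`, `attributes`), the Bernoulli-style selection, and the attribute name are assumptions of the sketch, and an exact fixed-size design would use a without-replacement probability-proportional-to-size scheme.

```python
import random

def limit_cardinality(points, k):
    """Probability-proportional-to-size sampling of delta Sum points,
    keeping about k distinct label sets per export interval."""
    if len(points) <= k:
        return points

    total = float(sum(p.value for p in points))
    kept = []
    for p in points:
        # Expected number of selections for this point out of k draws,
        # capped at 1; its reciprocal becomes the adjusted count.
        prob = min(1.0, k * p.value / total)
        if random.random() < prob:
            p.attributes["sampling.adjusted_count"] = 1.0 / prob
            kept.append(p)
    return kept
```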
-Caveats: as presented, this works for Sum and histograms received in -Delta temporality and where the Sum is monotonic, as discussed in -[opentelemetry-proto/issues/303](https://github.com/open-telemetry/opentelemetry-proto/issues/303). +Caveats: as presented, this works for Sum and Histogram points +received with Delta aggregation temporality and where the Sum is +monotonic (see +[opentelemetry-proto/issues/303](https://github.com/open-telemetry/opentelemetry-proto/issues/303)). Considering data points received during the interval, when the number of points exceeds K, select a probability proportional to size sample @@ -645,21 +640,24 @@ Probability Samplers are Samplers that output statistically unbiased inclusion probability. Inclusion probability in the context of tracing is the *effective* probability of the Sampler returning `RECORD_AND_SAMPLE`, which can be decided locally or derived from the -parent context using the [W3C Trace Context is-sampled -flag](https://www.w3.org/TR/trace-context/#sampled-flag). +context when the [W3C Trace Context is-sampled +flag](https://www.w3.org/TR/trace-context/#sampled-flag) is in use. #### Adjusted Count attributes -The recommended way to convey sampling probability *for events* in -OpenTelemetry is in the form of an **adjusted count**, which is the -reciprocal (i.e., mathematical inverse function) of inclusion -probability. +The recommended way to convey inclusion probability *for events* +sampled in OpenTelemetry is in the form of the **adjusted count**, +which is the reciprocal (i.e., mathematical inverse function) of +inclusion probability. The implied goal of probability sampling is to support estimating the count of spans in the population using the spans that were sampled. -Probability Samplers and probabilistic Span processors SHOULD maintain -the expected value of the sum of Span adjusted counts, to support this -goal. +The adjusted count associated with a Span is the expected value of the +number of identical Spans within the population that each Span +represents. Probability samplers SHOULD ensure the samples they +compute are unbiased, which implies that the expected value of the sum +of adjusted counts in the sample equals the true count of spans in the +population. Attributes used to express the adjusted count in an unbiased probability sampling scheme SHOULD use a Span attribute name with From a24a94e3a2584b38f8fb676f8dd33547b2f82311 Mon Sep 17 00:00:00 2001 From: Joshua MacDonald Date: Wed, 16 Jun 2021 13:43:22 -0700 Subject: [PATCH 51/68] edit spec language(2) --- text/0148-sampling-probability.md | 32 ++++++++++++++++++++++++------- 1 file changed, 25 insertions(+), 7 deletions(-) diff --git a/text/0148-sampling-probability.md b/text/0148-sampling-probability.md index b21f0302e..83a1e0f7e 100644 --- a/text/0148-sampling-probability.md +++ b/text/0148-sampling-probability.md @@ -619,8 +619,8 @@ a range of probability sampling schemes, this document recommends the use of a Span attribute named `sampling.sampler_name.adjusted_count` to encode an unbiased adjusted count computed by a Sampler. -The value any attribute name prefixed with "sampling." and suffixed -with ".adjusted_count" under this proposal MUST be an unbiased +The value any attribute name prefixed with `sampling.` and suffixed +with `.adjusted_count` under this proposal MUST be an unbiased estimate of the total population count represented by the individual event. 
@@ -643,7 +643,7 @@ tracing is the *effective* probability of the Sampler returning context when the [W3C Trace Context is-sampled flag](https://www.w3.org/TR/trace-context/#sampled-flag) is in use. -#### Adjusted Count attributes +#### Adjusted count span attribute The recommended way to convey inclusion probability *for events* sampled in OpenTelemetry is in the form of the **adjusted count**, @@ -661,11 +661,11 @@ population. Attributes used to express the adjusted count in an unbiased probability sampling scheme SHOULD use a Span attribute name with -prefix "sampling." and with suffix ".adjusted_count" (e.g., -"sampling.sampler_name.adjusted_count"). Adjusted count attributes +prefix `sampling.` and with suffix `.adjusted_count` (e.g., +`sampling.sampler_name.adjusted_count`). Adjusted count attributes MAY be integer or floating-point values. -#### Inclusion probability tracestate +#### Inclusion probability tracestate value The recommended way to convey sampling probability *for contexts* in OpenTelemetry is through the W3C Trace Context tracestate using the @@ -682,6 +682,24 @@ number greater than or equal to 0 and less than or equal to 1. The floating point precision of the number SHOULD follow implementation language standards and SHOULD be high enough to identify when Samplers have different inclusion probabilities. + +#### Counting spans + +Consumers of a stream of span data that may or may not have been +sampled can follow these steps to count or approximately count the +total number of spans in the population. + +For each span processed, locate all span attributes with prefix +`sampling.` and suffix `.adjusted_count`. If there are no adjusted +count attributes on the span, count a single event. If there is +exactly one adjusted count attribute, count that many identical span +events. + +If there are more than one adjusted count attribute on the span, the +processor SHOULD make a consistent choice for spans of a given +resource. By default, the SDK's first preference MUST the builtin +`TraceIDRatio` sampler, and its second preference MUST be the builtin +`Parent` sampler. ``` For the `TraceIDRatio` sampler, include the following additional text: @@ -692,7 +710,7 @@ Sampler MUST include the attribute `sampling.traceidratio.adjusted_count=C`, where `C` is the reciprocal of the configured trace ID ratio. -The returned tracestate used for the child context SHOULD have the +The returned tracestate used for the child context MUST have the tracestate `head_probability` key set to the configured trace ID ratio. ``` From 8ef61d02cb0c9edf076ec40fa70fe3f355ec19ce Mon Sep 17 00:00:00 2001 From: Joshua MacDonald Date: Wed, 16 Jun 2021 13:59:37 -0700 Subject: [PATCH 52/68] section title edits --- text/0148-sampling-probability.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/text/0148-sampling-probability.md b/text/0148-sampling-probability.md index 83a1e0f7e..104dbebf9 100644 --- a/text/0148-sampling-probability.md +++ b/text/0148-sampling-probability.md @@ -633,20 +633,20 @@ Samplers](https://github.com/open-telemetry/opentelemetry-specification/blob/mai a new section will introduce the term "adjusted count" and relate it to inclusion probability. For example: -```md +```markdown ### Probability sampling Probability Samplers are Samplers that output statistically unbiased inclusion probability. 
Inclusion probability in the context of tracing is the *effective* probability of the Sampler returning -`RECORD_AND_SAMPLE`, which can be decided locally or derived from the -context when the [W3C Trace Context is-sampled +`RECORD_AND_SAMPLE` when invoked, which can be decided locally or +derived from the context when the [W3C Trace Context is-sampled flag](https://www.w3.org/TR/trace-context/#sampled-flag) is in use. #### Adjusted count span attribute The recommended way to convey inclusion probability *for events* -sampled in OpenTelemetry is in the form of the **adjusted count**, +sampled in OpenTelemetry is in the form of an **adjusted count**, which is the reciprocal (i.e., mathematical inverse function) of inclusion probability. @@ -683,7 +683,7 @@ floating point precision of the number SHOULD follow implementation language standards and SHOULD be high enough to identify when Samplers have different inclusion probabilities. -#### Counting spans +#### Counting probabilistically sampled spans Consumers of a stream of span data that may or may not have been sampled can follow these steps to count or approximately count the From 6dc8d3a4c10bbab75d1201cc2b5100fae35ff34a Mon Sep 17 00:00:00 2001 From: Joshua MacDonald Date: Wed, 16 Jun 2021 14:53:52 -0700 Subject: [PATCH 53/68] Apply suggestions from code review Co-authored-by: Paul Osman --- text/0148-sampling-probability.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/text/0148-sampling-probability.md b/text/0148-sampling-probability.md index 104dbebf9..c6abc6f75 100644 --- a/text/0148-sampling-probability.md +++ b/text/0148-sampling-probability.md @@ -313,7 +313,7 @@ Approximate counting comes with variance, a matter of fact which can be controlled for by the sample size. Variance is unavoidable in an unbiased sample, but it vanishes when you have enough data. -Although this makes it sounds like small sample sizes are a problem, +Although this makes it sound like small sample sizes are a problem, due to expected high variance, this is just a limitation of the technique. When variance is high, use a larger sample size. @@ -371,7 +371,7 @@ are not integer valued. ### Trace Sampling Sampling techniques are always about lowering the cost of data -collection and analsyis, but in trace collection and analsysis +collection and analysis, but in trace collection and analysis specifically, approaches can be categorized by whether they reduce Tracer overhead. Tracer overhead is reduced by not recording spans for unsampled traces and requires making the sampling decision for a @@ -482,7 +482,7 @@ reproduced here. This kind of Sampler allows non-root spans in a trace to raise the probability of tracing, using a conditional probability formula shown below. Traces produced in this way are complete sub-trees, not -necessarily complete. This technique is succesful especially in +necessarily complete. This technique is successful especially in systems where a high-throughput service on occasion calls a low-throughput service. Low-throughput services are meant to inflate their sampling probability. 
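
A sketch of the conditional step used by this approach, applying the formula `D = (I - H) / (1 - H)` given in the Sampler description above. The parameter names and return convention are illustrative only; the sketch covers just the case where the arriving context is unsampled, and assumes `H < I <= 1`.

```python
import random

def inflationary_decision(head_prob, desired_prob):
    """Decide whether to start a sub-rooted trace when the arriving
    context is unsampled.  head_prob is H, the head inclusion
    probability carried by the context; desired_prob is I, the larger
    probability this service wants (H < I <= 1 assumed)."""
    d = (desired_prob - head_prob) / (1.0 - head_prob)
    if random.random() < d:
        # Sampled: record a sub-rooted trace with adjusted count 1/I and
        # propagate I as the new head inclusion probability.
        return True, 1.0 / desired_prob
    # Not sampled: still propagate I as the head inclusion probability.
    return False, 0.0
```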
From 5b109135497a25b8950d445a833a083b8d5e0750 Mon Sep 17 00:00:00 2001 From: Joshua MacDonald Date: Mon, 28 Jun 2021 12:38:14 -0700 Subject: [PATCH 54/68] Apply suggestions from code review Co-authored-by: Reiley Yang Co-authored-by: Yuri Shkuro --- text/0148-sampling-probability.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/text/0148-sampling-probability.md b/text/0148-sampling-probability.md index c6abc6f75..2c1f929f2 100644 --- a/text/0148-sampling-probability.md +++ b/text/0148-sampling-probability.md @@ -82,7 +82,7 @@ configuration. ### Span sampling
- Example use-cases for probability sampling of spans +Example use-cases for probability sampling of spans generally involve generating metrics from spans. #### Sample spans to Counter Metric @@ -107,7 +107,7 @@ The OpenTelemetry Metric data model does not support histogram buckets with non-integer counts, which forces the use of integer adjusted counts here (i.e., 1-in-N sampling rates where N is an integer). -Logically spaking, this processor will observe the span's duration its +Logically speaking, this processor will observe the span's duration its adjusted count number of times for every span it receives, at the end time of the span. @@ -164,7 +164,7 @@ count of `100/0.1 = 1000`. The OTLP protocol for metrics includes a repeated exemplars field in every data point. This is a place where histogram implementations are -able to provide example context to corrlate metrics with traces. +able to provide example context to correlate metrics with traces. OTLP exemplars support additional attributes, those that were present on the API event and were dropped during aggregation. Exemplars that @@ -436,7 +436,7 @@ reason. In addition to propagating head inclusion probability, to count Parent-sampled spans, each span must directly encode its adjusted count in the corresponding `SpanData`. This may use a non-descriptive -Resource or Span attribute named `sampling.parent.adjusted_count`, for +Span attribute named `sampling.parent.adjusted_count`, for example. ##### `TraceIDRatio` Sampler From 19828433604920888ca9fc5283e5a5dd09ca1af9 Mon Sep 17 00:00:00 2001 From: Joshua MacDonald Date: Mon, 28 Jun 2021 21:02:51 -0700 Subject: [PATCH 55/68] minor edits from review feedback --- text/0148-sampling-probability.md | 19 ++++++++++++++----- 1 file changed, 14 insertions(+), 5 deletions(-) diff --git a/text/0148-sampling-probability.md b/text/0148-sampling-probability.md index 2c1f929f2..1b09b89f1 100644 --- a/text/0148-sampling-probability.md +++ b/text/0148-sampling-probability.md @@ -87,7 +87,7 @@ generally involve generating metrics from spans. #### Sample spans to Counter Metric -For every span it receives, the example processor will synthesize +For every complete span it receives, the example processor will synthesize metric data as though a Counter instrument named `S.count` for span named `S` had been incremented once per span at the original `Start()` call site. @@ -115,7 +115,7 @@ time of the span. A collector processor will introduce a slight delay in order to ensure it has received a complete frame of data, during which time it -maintains a fixed-size buffer of input spans. If the number of spans +maintains a fixed-size buffer of complete input spans. If the number of spans received exceeds the size of the buffer before the end of the interval, begin weighted sampling using the adjusted count of each span as input weight. @@ -238,7 +238,7 @@ that were not selected for the sample have zero inclusion probability. Descriptive words that are often used to describe sampling designs: - *Fixed*: the sampling design is the same from one frame to the next -- *Adaptive*: the sampling design changes from one frame to the next +- *Adaptive*: the sampling design changes from one frame to the next based on the observed data - *Equal-Probability*: the sampling design uses a single inclusion probability per frame - *Unequal-Probability*: the sampling design uses multiple inclusion probabilities per frame - *Reservoir*: the sampling design uses fixed space, has fixed-size output. 
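
The relationship between inclusion probability and adjusted count used throughout these definitions can be stated as a one-line rule; the function below is only an illustration of that rule, not proposed API.

```python
def adjusted_count(inclusion_probability):
    """Reciprocal of a non-zero inclusion probability; zero is reserved
    for items that were observed but not selected for the sample."""
    if inclusion_probability == 0:
        return 0.0
    return 1.0 / inclusion_probability

# For example, the 1-in-10 Statsd counter event shown elsewhere in this
# document (sample rate 0.1) yields an adjusted count of 10.
```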
@@ -327,9 +327,12 @@ lower variance. It must, because the data remains unbiased. Some possibilities for encoding the adjusted count or inclusion probability are discussed below, depending on the circumstances and -the protocol. +the protocol. Here, the focus is on how to count sampled telemetry +events in general, not a specific kind of event. As we shall see in +the following section, tracing comes with its addional complications. -There are several ways of encoding this information: +There are several ways of encoding this adjusted count or inclusion +probability: - as a dedicated field in an OTLP protobuf message - as a non-descriptive Attribute in an OTLP Span, Metric, or Log @@ -342,6 +345,12 @@ integer number in the range [0, +Inf). This is a conceptually easy way to understand sampling because larger numbers mean greater representivity. +Note that it is possible, given this description, to produce adjusted +counts that are not integers. Adjusted counts are an approximatation, +and the expected value of an integer can be a fractional count. +Floating-point adjusted counts can be avoided with the use of +integer-reciprocal inclusion probabilities. + #### Encoding inclusion probability We can encode the inclusion probability directly as a floating point From 928ea4fb66a2c1a0eb3400fda99ed290a2c42a73 Mon Sep 17 00:00:00 2001 From: Joshua MacDonald Date: Thu, 1 Jul 2021 00:01:50 -0700 Subject: [PATCH 56/68] expand on the reason to propagate head trace sampling probability --- text/0148-sampling-probability.md | 98 +++++++++++++++++++++++-------- 1 file changed, 75 insertions(+), 23 deletions(-) diff --git a/text/0148-sampling-probability.md b/text/0148-sampling-probability.md index 1b09b89f1..7a02715f6 100644 --- a/text/0148-sampling-probability.md +++ b/text/0148-sampling-probability.md @@ -394,23 +394,77 @@ produced. Sampling techniques that lower Tracer overhead and produce complete traces are known as _Head trace sampling_ techniques. The decision to produce and collect a sample trace has to be made when -the root span starts, to avoid incomplete traces. Using the sampling -techniques outlined above, we can approximately count finished spans -and traces, even without knowing how the head trace sampling decision -was made. - -#### Counting spans and traces - -When the [W3C Trace Context is-sampled -flag](https://www.w3.org/TR/trace-context/#sampled-flag) is used to -propagate a sampling decision, child spans have the same adjusted -count as their parent. This leads to a useful optimization. - -It is nice-to-have, though not a requirement, that all spans in a -trace directly encode their adjusted count. This enables systems to -count spans upon arrival, without the work of referring to their -parent spans. For example, knowing a span's adjusted count makes it -possible to immediately produce metric events from span events. +the root span starts, to avoid incomplete traces. Then, assuming +complete traces can be collected, the adjusted count of the root span +determines an adjusted count for every span in the trace. 
+
+#### Counting child spans using root span adjusted counts
+
+The adjusted count of a root span determines the adjusted count of
+each of its children based on the following logic:
+
+- The root span is considered representative of _adjusted count_ many
+  identical root spans, because it was selected using unbiased sampling
+- Context propagation conveys _causation_, the fact that one span produces
+  another
+- A root span causes each of the child spans in its trace to be produced
+- A sampled root span represents _adjusted count_ many traces, representing
+  the cause of _adjusted count_ many occurrences per child span in the
+  sampled trace.
+
+Using this reasoning, we can define a sample collected from all root
+spans in the system, which allows estimating the count of all spans in
+the population. Take a simple probability sample of root spans:
+
+1. In the `Sampler` decision for root spans, use the initial span properties
+   to determine the inclusion probability `P`
+2. Make a pseudo-random selection with probability `P`, if true return
+   `RECORD_AND_SAMPLE` (so that the W3C Trace Context `is-sampled`
+   flag is set in all child contexts)
+3. Encode a span attribute `sampling.root.adjusted_count` equal to `1/P` on the root span
+4. Collect all spans where the W3C Trace Context `is-sampled` flag is set.
+
+After collecting all sampled spans, locate the root span for each.
+Apply the root span's adjusted count to every child in the associated
+trace. The sum of adjusted counts on all sampled spans is expected to
+equal the population total number of spans.
+
+Now, having stored the sample spans with their adjusted counts, and
+assuming the source of randomness is good, we can extrapolate counts
+for the population using arbitrary queries over the sampled spans.
+Sampled spans can be translated into approximate metrics of the
+population of spans, after their adjusted counts are known.
+
+The cost of this analysis, using only the root span's adjusted count,
+is that all root spans have to be collected before we can count
+non-root spans. The cost of indexing and looking up the root span
+adjusted counts makes this analysis relatively expensive to perform in
+real time.
+
+#### Using head trace probability to count all spans
+
+If the W3C `is-sampled` flag is used to determine whether
+`RECORD_AND_SAMPLE` is returned in a Sampler, then counting sampled
+spans without first locating their root span requires propagating the
+_head trace sampling probability_ through the context.
+
+Head trace sampling probability may be thought of as the probability
+of causing a child span to be sampled. Propagators that maintain
+this variable MUST obey the rules of conditional probability. In this
+model, the adjusted count of each span depends on the adjusted count
+of its parent, not of the root in a trace. Still, the sum of adjusted
+counts of all sampled spans is expected to equal the population total
+number of spans.
+
+This applies to other forms of telemetry that happen (i.e., are
+caused) within a context carrying head trace sampling probability.
+For example, we may record log events and metrics exemplars with
+adjusted counts equal to the inverse of the current head trace
+sampling probability when they are produced.
+
+This technique allows translating spans and logs to metrics without
+first locating their root span, a significant performance advantage
+compared with first collecting and indexing root spans.
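
The root-count analysis described above can be illustrated with a short consumer-side sketch. The span shape (`trace_id`, `parent_id`, `name`, `attributes`) is an assumption of the illustration; the attribute name follows step 3 of the procedure above.

```python
from collections import defaultdict

def estimate_counts_by_name(spans):
    """Apply each root span's adjusted count to every span in its trace
    and estimate the population count of spans per span name."""
    root_adjusted = {}
    for span in spans:
        if span.parent_id is None:  # root span of a sampled trace
            root_adjusted[span.trace_id] = span.attributes.get(
                "sampling.root.adjusted_count", 1)

    estimate = defaultdict(float)
    for span in spans:
        estimate[span.name] += root_adjusted.get(span.trace_id, 1)
    return dict(estimate)
```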
Several head sampling techniques are discussed in the following sections and evaluated in terms of their ability to meet all of the @@ -435,12 +489,10 @@ effective adjusted count of the context to use when starting child spans. In other head trace sampling schemes, we will see that it is useful to -propagate inclusion probability even for negative sampling decisions -(where the adjusted count is zero), therefore we prefer to use the -inclusion probability and not the adjusted count when propagating the -sampling rate via trace context. The inclusion probability of a -context is referred to as its `head inclusion probability` for this -reason. +propagate head trace sampling probability even for negative sampling +decisions (where the adjusted count is zero), therefore we prefer to +use the head trace sampling probability (not the inverse, an effective +adjusted count) when propagating the sampling rate via trace context. In addition to propagating head inclusion probability, to count Parent-sampled spans, each span must directly encode its adjusted From 76ff9f75ea131b3e2ea88a9aed4312c3432a06b7 Mon Sep 17 00:00:00 2001 From: Joshua MacDonald Date: Fri, 2 Jul 2021 09:58:09 -0700 Subject: [PATCH 57/68] typesetting adjutsed_count --- text/0148-sampling-probability.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/text/0148-sampling-probability.md b/text/0148-sampling-probability.md index 7a02715f6..f35e300fc 100644 --- a/text/0148-sampling-probability.md +++ b/text/0148-sampling-probability.md @@ -403,13 +403,13 @@ determines an adjusted count for every span in the trace. The adjusted count of a root span determines the adjusted count of each of its children based on the following logic: -- The root span is considered representative of _adjusted count_ many +- The root span is considered representative of `adjusted_count` many identical root spans, because it was selected using unbiased sampling - Context propagation conveys _causation_, the fact the one span produces another - A root span causes each of the child spans in its trace to be produced -- A sampled root span represents _adjusted count_ many traces, representing - the cause of _adjusted count_ many occurances per child span in the +- A sampled root span represents `adjusted_count` many traces, representing + the cause of `adjusted_count` many occurances per child span in the sampled trace. Using this reasoning, we can define a sample collected from all root From 9f13d03307900ecbffcd2de6d5f8b53a15d073ae Mon Sep 17 00:00:00 2001 From: Joshua MacDonald Date: Fri, 2 Jul 2021 11:34:52 -0700 Subject: [PATCH 58/68] Remove multiple adjusted counts from the text --- text/0148-sampling-probability.md | 105 ++++++++++++++---------------- 1 file changed, 48 insertions(+), 57 deletions(-) diff --git a/text/0148-sampling-probability.md b/text/0148-sampling-probability.md index f35e300fc..4eaff147c 100644 --- a/text/0148-sampling-probability.md +++ b/text/0148-sampling-probability.md @@ -1,4 +1,4 @@ -# Probability sampling of telemetry events +s# Probability sampling of telemetry events @@ -72,12 +72,11 @@ the population represented by an individual sample event. ## Examples -In all of these examples, the use of probability sampling leads to an -attribute named like `sampling.sampler_name.adjusted_count`. -Consumers of spans, metrics, and logs annotated with adjusted counts -are able to calculate accurate statistics about the population of -events, at a basic level, without knowing details about the sampling -configuration. 
+These examples use the proposed attribute `sampling.adjusted_count` to +convey sampling probability. Consumers of spans, metrics, and logs +annotated with adjusted counts are able to calculate accurate +statistics about the whole population of events, at a basic level, +without knowing details about the sampling configuration. ### Span sampling @@ -163,8 +162,9 @@ count of `100/0.1 = 1000`. #### Metric exemplars with adjusted counts The OTLP protocol for metrics includes a repeated exemplars field in -every data point. This is a place where histogram implementations are -able to provide example context to correlate metrics with traces. +every data point. This is a place where Metric aggregators (e.g., +histograms) are able to provide example context to correlate metrics +with traces. OTLP exemplars support additional attributes, those that were present on the API event and were dropped during aggregation. Exemplars that @@ -191,14 +191,13 @@ monotonic (see Considering data points received during the interval, when the number of points exceeds K, select a probability proportional to size sample -of points, output every point with an additional (non-descriptive) -`sampling.cardinality_limit.adjusted_count` attribute. +of points, output every point with a `sampling.adjusted_count` attribute.
## Explanation -Consider a hypothetical telemetry signal in which an API event -produces a unit of data that has one or more associated numbers. +Consider a hypothetical telemetry signal in which a stream of +data items is produced containing one or more associated numbers. Using the OpenTelemetry Metrics data model terminology, we have two scenarios in which sampling is common. @@ -225,10 +224,13 @@ will learn to apply these techniques for sampling aggregated data. In sampling, the term _sampling design_ refers to how sampling probability is decided and the term _sample frame_ refers to how -events are organized into discrete populations. +events are organized into discrete populations. The design of a +sampling strategy dictates how the population is framed. For example, a simple design uses uniform probability, and a simple -framing technique is to collect one sample per distinct span name. +framing technique is to collect one sample per distinct span name per +hour. A different sample framing could collect one sample across all +span names every 10 minutes. After executing a sampling design over a frame, each item selected in the sample will have known _inclusion probability_, that determines @@ -311,17 +313,20 @@ count. There is a natural relationship between statistical bias and variance. Approximate counting comes with variance, a matter of fact which can be controlled for by the sample size. Variance is unavoidable in an -unbiased sample, but it vanishes when you have enough data. +unbiased sample, but variance diminishes with increasing sample size. Although this makes it sound like small sample sizes are a problem, due to expected high variance, this is just a limitation of the technique. When variance is high, use a larger sample size. An easy approach for lowering variance is to aggregate sample frames -together across time. For example, although the estimates drawn from -a one-minute sample may have high variance, combining an hour of -one-minute sample frames into an aggregate data set is guaranteed to -lower variance. It must, because the data remains unbiased. +together across time, which generally increases the size of the +subpopulations being counted. For example, although the estimates for +the rate of spans by distinct name drawn from a one-minute sample may +have high variance, combining an hour of one-minute sample frames into +an aggregate data set is guaranteed to lower variance (assuming the +numebr of span names stays fixed). It must, because the data remains +unbiased, and more data yields lower variance. ### Conveying the sampling probability @@ -421,7 +426,7 @@ the population. Take a simple probability sample of root spans: 2. Make a pseudo-random selection with probability `P`, if true return `RECORD_AND_SAMPLE` (so that the W3C Trace Context `is-sampled` flag is set in all child contexts) -3. Encode a span attribute `sampling.root.adjusted_count` equal to `1/P` on the root span +3. Encode a span attribute `sampling.adjusted_count` equal to `1/P` on the root span 4. Collect all spans where the W3C Trace Context `is-sampled` flag is set. After collecting all sampled spans, locate the root span for each. @@ -497,8 +502,7 @@ adjusted count) when propagating the sampling rate via trace context. In addition to propagating head inclusion probability, to count Parent-sampled spans, each span must directly encode its adjusted count in the corresponding `SpanData`. This may use a non-descriptive -Span attribute named `sampling.parent.adjusted_count`, for -example. 
+Span attribute named `sampling.adjusted_count`, for example. ##### `TraceIDRatio` Sampler @@ -528,9 +532,7 @@ Lacking the number of expected children, we require a way to know the minimum Sampler probability across traces to ensure they are complete. To count TraceIDRatio-sampled spans, each span must encode its -adjusted count in the corresponding `SpanData`. This may use a -non-descriptive Resource or Span attribute named -`sampling.traceidratio.adjusted_count`, for example. +adjusted count in the corresponding `SpanData`. ##### Dapper's "Inflationary" Sampler @@ -600,9 +602,7 @@ D = (I - H) / (1 - H) Now the Sampler makes a decision with probability `D`. Whether the decision is true or false, propagate `I` as the new head inclusion probability. If the decision is true, begin recording a sub-rooted -trace with adjusted count `1/I`. This may use a non-descriptive -Resource or Span attribute named -`sampling.inflationary.adjusted_count`, for example. +trace with adjusted count `1/I`.
@@ -662,28 +662,22 @@ sampling divided by the (absolute) point value. An adjusted count with zero value carries meaningful information, specifically that the item participated in a probabilistic sampling scheme and was not selected. A zero value can be be useful to record -events when they provide useful information despite their effective -count; we can use this to combine multiple sampling schemes in a -single stream. - -For example, consider collecting two samples from a stream of spans, -the first sample from those spans with attribute `A=1` and the second -sample from those spans with attribute `B=2`. Any span that has both -of these properties is eligible to be selected for both samples, in -which case one event could have two non-zero adjusted counts (e.g., -`sampling.by_a.adjusted_count` and `sampling.by_b.adjusted_count`). +events outside of a sample, when they provide useful information +despite their effective count. We can use this to record error +exemplars, for example, even when they are not selected by the +Sampler. ## Proposed Tracing specification For the standard OpenTelemetry Span Sampler implementations to support a range of probability sampling schemes, this document recommends the -use of a Span attribute named `sampling.sampler_name.adjusted_count` -to encode an unbiased adjusted count computed by a Sampler. +use of a Span attribute named `sampling.adjusted_count` to encode an +unbiased adjusted count computed by a Sampler reflecting the whole +population of spans. -The value any attribute name prefixed with `sampling.` and suffixed -with `.adjusted_count` under this proposal MUST be an unbiased -estimate of the total population count represented by the individual -event. +The value the `sampling.adjusted_count` attribuet under this proposal +MUST be an unbiased estimate of the total population count represented +by the individual event. ### Suggested text @@ -720,11 +714,10 @@ compute are unbiased, which implies that the expected value of the sum of adjusted counts in the sample equals the true count of spans in the population. -Attributes used to express the adjusted count in an unbiased -probability sampling scheme SHOULD use a Span attribute name with -prefix `sampling.` and with suffix `.adjusted_count` (e.g., -`sampling.sampler_name.adjusted_count`). Adjusted count attributes -MAY be integer or floating-point values. +The adjusted count in an unbiased probability sampling scheme SHOULD +be expressed using a Span attribute named `sampling.adjusted_count` +when it represents the whole population of events. Adjusted count +attributes MAY be integer or floating-point values. #### Inclusion probability tracestate value @@ -750,11 +743,9 @@ Consumers of a stream of span data that may or may not have been sampled can follow these steps to count or approximately count the total number of spans in the population. -For each span processed, locate all span attributes with prefix -`sampling.` and suffix `.adjusted_count`. If there are no adjusted -count attributes on the span, count a single event. If there is -exactly one adjusted count attribute, count that many identical span -events. +For each span processed, locate the `sampling.adjusted_count` attribute. +If there is none, count a single span event. If the attribute is set, +count that many identical span events. 
If there are more than one adjusted count attribute on the span, the processor SHOULD make a consistent choice for spans of a given @@ -768,7 +759,7 @@ For the `TraceIDRatio` sampler, include the following additional text: ```md When returning a `RECORD_AND_SAMPLE` decision, the TraceIDRatio Sampler MUST include the attribute -`sampling.traceidratio.adjusted_count=C`, where `C` is the reciprocal of the +`sampling.adjusted_count=C`, where `C` is the reciprocal of the configured trace ID ratio. The returned tracestate used for the child context MUST have the @@ -780,7 +771,7 @@ For the `Parent` sampler, include the following additional text: ```md When returning a `RECORD_AND_SAMPLE` decision, the Parent Sampler MUST -include the attribute `sampling.parent.adjusted_count=C`, where `C` is the +include the attribute `sampling.adjusted_count=C`, where `C` is the reciprocal of the parent trace context's head inclusion probability, which is passed through W3C tracestate using the `head_probability` key. ``` From 81277fe4f22ba39f9262473c8e6c84e6e57cce13 Mon Sep 17 00:00:00 2001 From: Joshua MacDonald Date: Fri, 2 Jul 2021 11:36:08 -0700 Subject: [PATCH 59/68] Update text/0148-sampling-probability.md Co-authored-by: Yuri Shkuro --- text/0148-sampling-probability.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/0148-sampling-probability.md b/text/0148-sampling-probability.md index 4eaff147c..652a1da50 100644 --- a/text/0148-sampling-probability.md +++ b/text/0148-sampling-probability.md @@ -396,7 +396,7 @@ spans branching from a certain root are expected to be fully collected. When sampling is applied to reduce Tracer overhead, there is generally an expectation that complete traces will still be produced. Sampling techniques that lower Tracer overhead and produce -complete traces are known as _Head trace sampling_ techniques. +complete traces are known as _Head-based trace sampling_ techniques. The decision to produce and collect a sample trace has to be made when the root span starts, to avoid incomplete traces. Then, assuming From 1e9ce38bef7e3615324692771dd456d2bc0b1761 Mon Sep 17 00:00:00 2001 From: Joshua MacDonald Date: Thu, 8 Jul 2021 00:09:05 -0700 Subject: [PATCH 60/68] Apply suggestions from code review Co-authored-by: Yuri Shkuro --- text/0148-sampling-probability.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/text/0148-sampling-probability.md b/text/0148-sampling-probability.md index 652a1da50..040bef04a 100644 --- a/text/0148-sampling-probability.md +++ b/text/0148-sampling-probability.md @@ -710,8 +710,8 @@ count of spans in the population using the spans that were sampled. The adjusted count associated with a Span is the expected value of the number of identical Spans within the population that each Span represents. Probability samplers SHOULD ensure the samples they -compute are unbiased, which implies that the expected value of the sum -of adjusted counts in the sample equals the true count of spans in the +compute are unbiased, which implies that the sum +of adjusted counts in the sample approximates the true count of spans in the population. The adjusted count in an unbiased probability sampling scheme SHOULD @@ -725,7 +725,7 @@ The recommended way to convey sampling probability *for contexts* in OpenTelemetry is through the W3C Trace Context tracestate using the key `head_probability`. 
-Probability Samplers SHOULD encode the effective sampling inclusion +Probability Samplers MAY encode the effective sampling inclusion probability using tracestate, for the context that was in effect when the W3C is-sampled bit was set. The tracestate field SHOULD be set in both sampled and unsampled cases, to convey the inclusion probability From 69be9b990550aef8dd7bfe81bf95374c9089ad6e Mon Sep 17 00:00:00 2001 From: Joshua MacDonald Date: Thu, 8 Jul 2021 00:29:05 -0700 Subject: [PATCH 61/68] more MAY --- text/0148-sampling-probability.md | 30 ++++++++++++++---------------- 1 file changed, 14 insertions(+), 16 deletions(-) diff --git a/text/0148-sampling-probability.md b/text/0148-sampling-probability.md index 4eaff147c..8d5ae7891 100644 --- a/text/0148-sampling-probability.md +++ b/text/0148-sampling-probability.md @@ -675,7 +675,7 @@ use of a Span attribute named `sampling.adjusted_count` to encode an unbiased adjusted count computed by a Sampler reflecting the whole population of spans. -The value the `sampling.adjusted_count` attribuet under this proposal +The value the `sampling.adjusted_count` attribute under this proposal MUST be an unbiased estimate of the total population count represented by the individual event. @@ -746,34 +746,32 @@ total number of spans in the population. For each span processed, locate the `sampling.adjusted_count` attribute. If there is none, count a single span event. If the attribute is set, count that many identical span events. - -If there are more than one adjusted count attribute on the span, the -processor SHOULD make a consistent choice for spans of a given -resource. By default, the SDK's first preference MUST the builtin -`TraceIDRatio` sampler, and its second preference MUST be the builtin -`Parent` sampler. ``` For the `TraceIDRatio` sampler, include the following additional text: ```md When returning a `RECORD_AND_SAMPLE` decision, the TraceIDRatio -Sampler MUST include the attribute -`sampling.adjusted_count=C`, where `C` is the reciprocal of the -configured trace ID ratio. +Sampler MUST include the attribute `sampling.adjusted_count=C`, where +`C` is the reciprocal of the configured trace ID ratio. -The returned tracestate used for the child context MUST have the -tracestate `head_probability` key set to the configured trace -ID ratio. +The returned tracestate used for the child context MAY have the +tracestate `otel` key with the sub-key `headprob` set to the configured +trace ID ratio. ``` For the `Parent` sampler, include the following additional text: ```md -When returning a `RECORD_AND_SAMPLE` decision, the Parent Sampler MUST +When returning a `RECORD_AND_SAMPLE` decision, the Parent Sampler MAY include the attribute `sampling.adjusted_count=C`, where `C` is the -reciprocal of the parent trace context's head inclusion probability, -which is passed through W3C tracestate using the `head_probability` key. +reciprocal of the parent trace context's head inclusion probability. + +The tracestate `otel` key with the sub-key `headprob` is used to lookup +and propagate the configured trace ID ratio. When the `otel` key with +sub-key `headprob` is not located and `is-sampled` is set, the Sampler +MUST set `sampling.adjusted_count=0` to signal that spans cannot be +reliably counted. 
``` ## Recommended reading From c4c06cd58ceae3f01b4f46d9edadf099cd5ed395 Mon Sep 17 00:00:00 2001 From: Joshua MacDonald Date: Thu, 22 Jul 2021 17:19:10 -0700 Subject: [PATCH 62/68] propose less specification text: only two attributes --- text/0148-sampling-probability.md | 138 +++++++++--------------------- 1 file changed, 40 insertions(+), 98 deletions(-) diff --git a/text/0148-sampling-probability.md b/text/0148-sampling-probability.md index fcd2acb35..96fe69386 100644 --- a/text/0148-sampling-probability.md +++ b/text/0148-sampling-probability.md @@ -667,111 +667,53 @@ despite their effective count. We can use this to record error exemplars, for example, even when they are not selected by the Sampler. -## Proposed Tracing specification - -For the standard OpenTelemetry Span Sampler implementations to support -a range of probability sampling schemes, this document recommends the -use of a Span attribute named `sampling.adjusted_count` to encode an -unbiased adjusted count computed by a Sampler reflecting the whole -population of spans. - -The value the `sampling.adjusted_count` attribute under this proposal -MUST be an unbiased estimate of the total population count represented -by the individual event. - -### Suggested text - -After -[Sampler](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/sdk.md#sampler) -and before [Builtin -Samplers](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/sdk.md#built-in-samplers), -a new section will introduce the term "adjusted count" and relate it -to inclusion probability. For example: - -```markdown -### Probability sampling - -Probability Samplers are Samplers that output statistically unbiased -inclusion probability. Inclusion probability in the context of -tracing is the *effective* probability of the Sampler returning -`RECORD_AND_SAMPLE` when invoked, which can be decided locally or -derived from the context when the [W3C Trace Context is-sampled -flag](https://www.w3.org/TR/trace-context/#sampled-flag) is in use. - -#### Adjusted count span attribute - -The recommended way to convey inclusion probability *for events* -sampled in OpenTelemetry is in the form of an **adjusted count**, -which is the reciprocal (i.e., mathematical inverse function) of -inclusion probability. - -The implied goal of probability sampling is to support estimating the -count of spans in the population using the spans that were sampled. -The adjusted count associated with a Span is the expected value of the -number of identical Spans within the population that each Span -represents. Probability samplers SHOULD ensure the samples they -compute are unbiased, which implies that the sum -of adjusted counts in the sample approximates the true count of spans in the -population. - -The adjusted count in an unbiased probability sampling scheme SHOULD -be expressed using a Span attribute named `sampling.adjusted_count` -when it represents the whole population of events. Adjusted count -attributes MAY be integer or floating-point values. - -#### Inclusion probability tracestate value - -The recommended way to convey sampling probability *for contexts* in -OpenTelemetry is through the W3C Trace Context tracestate using the -key `head_probability`. - -Probability Samplers MAY encode the effective sampling inclusion -probability using tracestate, for the context that was in effect when -the W3C is-sampled bit was set. 
The tracestate field SHOULD be set in -both sampled and unsampled cases, to convey the inclusion probability -even for unsampled contexts. - -The `head_probability` tracestate key is set to a floating-point -number greater than or equal to 0 and less than or equal to 1. The -floating point precision of the number SHOULD follow implementation -language standards and SHOULD be high enough to identify when Samplers -have different inclusion probabilities. - -#### Counting probabilistically sampled spans - -Consumers of a stream of span data that may or may not have been -sampled can follow these steps to count or approximately count the -total number of spans in the population. - -For each span processed, locate the `sampling.adjusted_count` attribute. -If there is none, count a single span event. If the attribute is set, -count that many identical span events. +## Proposed specification text + +The following text will be added to the semantic conventions for +tracing. + ``` +# Semantic conventions for Sampled spans -For the `TraceIDRatio` sampler, include the following additional text: +This document defines how to describe an what sampling was performed +when recording a span that has had sampling logic applied. -```md -When returning a `RECORD_AND_SAMPLE` decision, the TraceIDRatio -Sampler MUST include the attribute `sampling.adjusted_count=C`, where -`C` is the reciprocal of the configured trace ID ratio. +Span sampling attributes support computing metrics about spans that +are part of a sampled trace from knowing their sampling inclusion +probability. -The returned tracestate used for the child context MAY have the -tracestate `otel` key with the sub-key `headprob` set to the configured -trace ID ratio. -``` +The _adjusted count_ of a span is defined as follows: + +- Adjusted count equals zero when inclusion probability equals zero +- Adjusted count equals the mathematical inverse (i.e., reciprocal) of sampling inclusion probability when inclusion probability is non-zero. + +Consumers of spans carrying an adjusted count attribute are able to +use the adjusted count of the span to increment a counter of matching +spans. + +## Probability Sampling Attributes + +The `sampler.adjusted_count` attribute MUST reflect an unbiased +estimate of the number of representative spans in the population of +spans being produced. + +When built-in Samplers are used, the name of the effective Sampler +that computed the adjusted count is included to indicate how the sample +was computed, which may give additional information. -For the `Parent` sampler, include the following additional text: +| Attribute | Type | Description | Examples | Required | +|---|---|---|---|---| +| `sampler.adjusted_count` | number | Effective count of the associated span. | 10 | No | +| `sampler.name` | string | The name of the Sampler that determined the adjusted count. | `Parent` | Yes | -```md -When returning a `RECORD_AND_SAMPLE` decision, the Parent Sampler MAY -include the attribute `sampling.adjusted_count=C`, where `C` is the -reciprocal of the parent trace context's head inclusion probability. +For the built-in samplers, the following names are specified: -The tracestate `otel` key with the sub-key `headprob` is used to lookup -and propagate the configured trace ID ratio. When the `otel` key with -sub-key `headprob` is not located and `is-sampled` is set, the Sampler -MUST set `sampling.adjusted_count=0` to signal that spans cannot be -reliably counted. +| Built-in Sampler | Sets `sampler.adjusted_count`? 
| `sampler.name` | Notes |
+| -- | -- | -- | -- |
+| AlwaysOn | No | Not applicable | Sampling attributes are not used |
+| AlwaysOff | No | Not applicable | Spans are not recorded |
+| ParentBased | Maybe | `Parent` | Adjusted count requires propagation |
+| TraceIDRatio | Yes | `TraceIDRatio` | |
 ```

 ## Recommended reading

From 88d38b5e82e54317c692a62e730e7aef030dfac2 Mon Sep 17 00:00:00 2001
From: Joshua MacDonald
Date: Fri, 23 Jul 2021 12:36:33 -0700
Subject: [PATCH 63/68] Rename to ./trace

---
 text/{ => trace}/0148-sampling-probability.md | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename text/{ => trace}/0148-sampling-probability.md (100%)

diff --git a/text/0148-sampling-probability.md b/text/trace/0148-sampling-probability.md
similarity index 100%
rename from text/0148-sampling-probability.md
rename to text/trace/0148-sampling-probability.md

From 342105867f472cf3d1886d6bde096d0c2d3207ed Mon Sep 17 00:00:00 2001
From: Joshua MacDonald
Date: Tue, 27 Jul 2021 00:26:56 -0700
Subject: [PATCH 64/68] remove
--- text/trace/0148-sampling-probability.md | 55 +++++++++++-------------- 1 file changed, 25 insertions(+), 30 deletions(-) diff --git a/text/trace/0148-sampling-probability.md b/text/trace/0148-sampling-probability.md index 96fe69386..a462836d6 100644 --- a/text/trace/0148-sampling-probability.md +++ b/text/trace/0148-sampling-probability.md @@ -32,8 +32,7 @@ s# Probability sampling of telemetry events + [Merging samples](#merging-samples) + [Maintaining "Probability proportional to size"](#maintaining-probability-proportional-to-size) + [Zero adjusted count](#zero-adjusted-count) -- [Proposed Tracing specification](#proposed-tracing-specification) - * [Suggested text](#suggested-text) +- [Proposed specification text](#proposed-specification-text) - [Recommended reading](#recommended-reading) - [Acknowledgements](#acknowledgements) @@ -72,7 +71,7 @@ the population represented by an individual sample event. ## Examples -These examples use the proposed attribute `sampling.adjusted_count` to +These examples use the proposed attribute `sampler.adjusted_count` to convey sampling probability. Consumers of spans, metrics, and logs annotated with adjusted counts are able to calculate accurate statistics about the whole population of events, at a basic level, @@ -80,9 +79,8 @@ without knowing details about the sampling configuration. ### Span sampling -
-Example use-cases for probability sampling of spans -generally involve generating metrics from spans. +Example use-cases for probability sampling of spans +generally involve generating metrics from spans. #### Sample spans to Counter Metric @@ -121,18 +119,16 @@ span as input weight. This processor drops spans when the configured rate threshold is exceeeded, otherwise it passes spans through with unmodifed adjusted -count. +counts. When the interval expires and the sample frame is considered complete, the selected sample spans are output with possibly updated adjusted counts. -
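As a concrete illustration of the span-to-metrics examples above, the sketch below sums adjusted counts per span name. A span without the attribute is counted once, and a zero adjusted count contributes nothing to the estimate. The dictionary-based span representation and the helper name are assumptions made for illustration, not an OpenTelemetry API.

```python
from collections import defaultdict

def estimate_span_counts(sampled_spans):
    """Estimate per-name population totals from a stream of sampled spans."""
    totals = defaultdict(float)
    for span in sampled_spans:
        # Missing attribute: count the span once.
        # Zero adjusted count: an exemplar that is not representative.
        adjusted = span["attributes"].get("sampler.adjusted_count", 1)
        totals[span["name"]] += adjusted
    return dict(totals)

spans = [
    {"name": "GET /users", "attributes": {"sampler.adjusted_count": 10}},
    {"name": "GET /users", "attributes": {}},
    {"name": "GET /health", "attributes": {"sampler.adjusted_count": 0}},
]
print(estimate_span_counts(spans))  # {'GET /users': 11.0, 'GET /health': 0.0}
```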
### Metric sampling -
- Example use-cases for probability sampling of metrics -are aimed at lowering cost and addressing high cardinality. +Example use-cases for probability sampling of metrics +are aimed at lowering cost and addressing high cardinality. #### Statsd Counter @@ -191,8 +187,7 @@ monotonic (see Considering data points received during the interval, when the number of points exceeds K, select a probability proportional to size sample -of points, output every point with a `sampling.adjusted_count` attribute. -
+of points, output every point with a `sampler.adjusted_count` attribute. ## Explanation @@ -326,7 +321,7 @@ the rate of spans by distinct name drawn from a one-minute sample may have high variance, combining an hour of one-minute sample frames into an aggregate data set is guaranteed to lower variance (assuming the numebr of span names stays fixed). It must, because the data remains -unbiased, and more data yields lower variance. +unbiased, so more data results in lower variance. ### Conveying the sampling probability @@ -334,7 +329,7 @@ Some possibilities for encoding the adjusted count or inclusion probability are discussed below, depending on the circumstances and the protocol. Here, the focus is on how to count sampled telemetry events in general, not a specific kind of event. As we shall see in -the following section, tracing comes with its addional complications. +the following section, tracing comes with addional complications. There are several ways of encoding this adjusted count or inclusion probability: @@ -364,20 +359,21 @@ where each line includes an optional probability. In this context, the probability is also commonly referred to as a "sampling rate". In this case, smaller numbers mean greater representivity. -#### Encoding negative base-2 logarithm of inclusion probability +#### Encoding base-2 logarithm of adjusted count -We can encode the negative base-2 logarithm of inclusion probability. -This restricts inclusion probabilities to powers of two and allows the -use of small non-negative integers to encode power-of-two adjusted -counts. In this case, larger numbers mean exponentially greater -representivity. +We can encode the base-2 logarithm of adjusted count (i.e., negative +base-2 logarithm of inclusion probability). By using an integer +field, restricting adjusted counts and inclusion probabilities to +powers of two, this allows the use of small non-negative integers to +encode the adjusted count. In this case, larger numbers mean +exponentially greater representivity. #### Multiply the adjusted count into the data When the data itself carries counts, such as for the Metrics Sum and -Histogram points. +Histogram points, the adjusted count can be multipled into the data. -This technique is less desirable because while it preserves the +This technique is less desirable because, while it preserves the expected value of the count or sum, the data loses information about variance. This may also lead to rounding errors, when adjusted counts are not integer valued. @@ -426,7 +422,7 @@ the population. Take a simple probability sample of root spans: 2. Make a pseudo-random selection with probability `P`, if true return `RECORD_AND_SAMPLE` (so that the W3C Trace Context `is-sampled` flag is set in all child contexts) -3. Encode a span attribute `sampling.adjusted_count` equal to `1/P` on the root span +3. Encode a span attribute `sampler.adjusted_count` equal to `1/P` on the root span 4. Collect all spans where the W3C Trace Context `is-sampled` flag is set. After collecting all sampled spans, locate the root span for each. @@ -481,8 +477,8 @@ following criteria: #### Head sampling for traces -
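Before turning to specific Sampler implementations, here is a minimal sketch of the root-span procedure described above. A plain pseudo-random number generator stands in for a trace-ID-derived decision, and the function name is illustrative rather than part of any Sampler API.

```python
import random

def sample_root_span(probability, rng=random.random):
    """Unbiased head decision for a root span; returns (sampled, attributes)."""
    if rng() < probability:
        # The selected root span represents 1/P spans in the population;
        # the same adjusted count applies to each child in the trace.
        return True, {"sampler.adjusted_count": 1.0 / probability}
    return False, {}

# Example: 1-in-10 head sampling of root spans.
sampled, attrs = sample_root_span(0.1)
if sampled:
    print(attrs)  # {'sampler.adjusted_count': 10.0}
```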
Details about Sampler implementations that meet -the requirements stated above. +Details about Sampler implementations that meet +the requirements stated above. ##### `Parent` Sampler @@ -502,7 +498,7 @@ adjusted count) when propagating the sampling rate via trace context. In addition to propagating head inclusion probability, to count Parent-sampled spans, each span must directly encode its adjusted count in the corresponding `SpanData`. This may use a non-descriptive -Span attribute named `sampling.adjusted_count`, for example. +Span attribute named `sampler.adjusted_count`, for example. ##### `TraceIDRatio` Sampler @@ -604,8 +600,6 @@ decision is true or false, propagate `I` as the new head inclusion probability. If the decision is true, begin recording a sub-rooted trace with adjusted count `1/I`. -
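For illustration only, the decision just described can be realized as sketched below. The conditional probability `d = (I - P) / (1 - P)` is an assumption chosen so that a context whose parent was not sampled is promoted just often enough for the overall inclusion probability to equal `I`; the function and parameter names are not an OpenTelemetry API.

```python
import random

def inflationary_decision(parent_sampled, parent_probability, desired_probability,
                          rng=random.random):
    """Decide whether to record, and return the head probability to propagate."""
    p, i = parent_probability, desired_probability
    assert 0.0 < p <= i <= 1.0
    if parent_sampled:
        sampled = True   # a context that is already sampled stays sampled
    else:
        # Promote just often enough that P + (1 - P) * d equals I.
        d = (i - p) / (1.0 - p) if p < 1.0 else 0.0
        sampled = rng() < d
    # Propagate I as the new head inclusion probability in either case; a
    # promoted context begins a sub-rooted trace with adjusted count 1/I.
    return sampled, i
```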
- ### Working with adjusted counts Head sampling for traces has been discussed, covering strategies to @@ -670,7 +664,8 @@ Sampler. ## Proposed specification text The following text will be added to the semantic conventions for -tracing. +recording the Sampler name and adjusted count (if known) as +OpenTelemetry Span attributes. ``` # Semantic conventions for Sampled spans From 3587b11f8eda1c9b75741eaa9dff8cbdd88e5eaf Mon Sep 17 00:00:00 2001 From: Joshua MacDonald Date: Tue, 27 Jul 2021 12:00:05 -0700 Subject: [PATCH 65/68] typos --- text/trace/0148-sampling-probability.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/text/trace/0148-sampling-probability.md b/text/trace/0148-sampling-probability.md index a462836d6..831231207 100644 --- a/text/trace/0148-sampling-probability.md +++ b/text/trace/0148-sampling-probability.md @@ -428,12 +428,12 @@ the population. Take a simple probability sample of root spans: After collecting all sampled spans, locate the root span for each. Apply the root span's adjusted count to every child in the associated trace. The sum of adjusted counts on all sampled spans is expected to -equal the population total number spans. +equal the population total number of spans. Now, having stored the sample spans with their adjusted counts, and assuming the source of randomness is good, we can extrapolate counts for the population using arbitrary queries over the sampled spans. -Sampled spans can be translated into approximate metrics the +Sampled spans can be translated into approximate metrics over the population of spans, after their adjusted counts are known. The cost of this analysis, using only the root span's adjusted count, From a0012daaf39b36725da5e85f01f41e562d33ffdd Mon Sep 17 00:00:00 2001 From: Joshua MacDonald Date: Tue, 27 Jul 2021 12:11:58 -0700 Subject: [PATCH 66/68] refer to 168, shorten text on Parent sampler --- text/trace/0148-sampling-probability.md | 23 +++++++++-------------- 1 file changed, 9 insertions(+), 14 deletions(-) diff --git a/text/trace/0148-sampling-probability.md b/text/trace/0148-sampling-probability.md index 831231207..95e3e9304 100644 --- a/text/trace/0148-sampling-probability.md +++ b/text/trace/0148-sampling-probability.md @@ -485,20 +485,15 @@ the requirements stated above. The `Parent` Sampler ensures complete traces, provided all spans are successfully recorded. A downside of `Parent` sampling is that it takes away control over Tracer overhead from non-roots in the trace. -To support counting spans, this Sampler requires propagating the -effective adjusted count of the context to use when starting child -spans. - -In other head trace sampling schemes, we will see that it is useful to -propagate head trace sampling probability even for negative sampling -decisions (where the adjusted count is zero), therefore we prefer to -use the head trace sampling probability (not the inverse, an effective -adjusted count) when propagating the sampling rate via trace context. - -In addition to propagating head inclusion probability, to count -Parent-sampled spans, each span must directly encode its adjusted -count in the corresponding `SpanData`. This may use a non-descriptive -Span attribute named `sampler.adjusted_count`, for example. +To support real-time span-to-metrics applications, this Sampler +requires propagating the sampling probability or adjusted count of +the context in effect when starting child spans. This is expanded +upon in [OTEP 168](https://github.com/open-telemetry/oteps/pull/168). 
+ +When propagating head sampling probability, spans recorded by the +`Parent` sampler MAY encode the adjusted count in the corresponding +`SpanData` using a non-descriptive Span attribute named +`sampler.adjusted_count`. ##### `TraceIDRatio` Sampler From 0bfa686d0e91c566125f3b3a9532e7190f435004 Mon Sep 17 00:00:00 2001 From: Joshua MacDonald Date: Tue, 27 Jul 2021 12:28:42 -0700 Subject: [PATCH 67/68] Remove final three sections, not necessary --- text/trace/0148-sampling-probability.md | 71 ++++++------------------- 1 file changed, 15 insertions(+), 56 deletions(-) diff --git a/text/trace/0148-sampling-probability.md b/text/trace/0148-sampling-probability.md index 95e3e9304..ce7e7707b 100644 --- a/text/trace/0148-sampling-probability.md +++ b/text/trace/0148-sampling-probability.md @@ -510,7 +510,9 @@ decision, but give each service control over Tracer overhead. Each service sets its sampling probability independently, and the coordinated decision ensures that some traces will be complete. Traces are complete when the TraceID ratio falls below the minimum -Sampler probability across the whole trace. +Sampler probability across the whole trace. Techniques have been +developed for [analysis of partial traces that are compatible with +TraceID ratio sampling](https://arxiv.org/pdf/2107.07703.pdf). The `TraceIDRatio` Sampler has another difficulty with testing for completeness. It is impossible to know whether there are missing leaf @@ -522,8 +524,9 @@ span](https://github.com/open-telemetry/opentelemetry-specification/issues/355). Lacking the number of expected children, we require a way to know the minimum Sampler probability across traces to ensure they are complete. -To count TraceIDRatio-sampled spans, each span must encode its -adjusted count in the corresponding `SpanData`. +To count TraceIDRatio-sampled spans, each span MAY encode its adjusted +count in the corresponding `SpanData` using a non-descriptive Span +attribute named `sampler.adjusted_count`. ##### Dapper's "Inflationary" Sampler @@ -595,57 +598,6 @@ decision is true or false, propagate `I` as the new head inclusion probability. If the decision is true, begin recording a sub-rooted trace with adjusted count `1/I`. -### Working with adjusted counts - -Head sampling for traces has been discussed, covering strategies to -lower Tracer overhead, ensure trace completeness, and count spans on -arrival. Sampled spans have an additional attribute to directly -encode the adjusted count, and the sum of adjusted counts for a set of -spans accurately reflects the total population count. - -In systems based on collecting sample data, it is often useful to -merge samples, in order to maintain a small data set. For example, -given 24 one-hour samples of 1000 spans each, can we combine the data -into a one-day sample of 1000 spans? To do this without introducing -bias, we must take the adjusted count of each span into account. -Sampling algorithms that do this are known as _weighted sampling -algorithms_. - -#### Merging samples - -To merge samples means to combine two or more frames of sample data -into a single frame of sample data that reflects the combined -populations of data in an unbiased way. Two weighted sampling -algorithms are listed below in [Recommended Reading](#recommended-reading). - -In a broader context, weighted sampling algorithms support estimating -population weights from a sample of unequal-weight items. 
In a -telemetry context, item weights are generally event counts, and -weighted sampling algorithms support estimating total population -counts from a sample with unequal-count items. - -The output of weighted sampling, in a telemetry context, are samples -containing events with new, unequal adjusted counts that maintain -their power to estimate counts in the combined population. - -#### Maintaining "Probability proportional to size" - -The statistical property being maintained in the definition for -weighted sampling used above is known as _probability propertional to -size_. The "size" of an item, in this context, refers to the -magnitude of each item's contribution to the total that is being -estimated. To avoid bias, larger-magnitude items should have a -proportionally greater probability of being selected in a sample, -compared with items of smaller magnitide. - -The interpretation of "size", therefore, depends on what is being -estimated. When sampling events with a natural size, such as for -Metric Sum data points, the absolute value of the point should be -multiplied with the adjusted count to form an effective input weight -(i.e., its "size" or contribution to the population total). The -output adjusted count in this case is the output from weighted -sampling divided by the (absolute) point value. - #### Zero adjusted count An adjusted count with zero value carries meaningful information, @@ -706,6 +658,11 @@ For the built-in samplers, the following names are specified: | TraceIDRatio | Yes | `TraceIDRatio` | | ``` +Note that the `AlwaysOn` and `AlwaysOff` Samplers do not need to +recorder their names, since they are indistinguishable from not having +a stampler configured. When there is no `sampler.name` attribute +present + ## Recommended reading [Sampling, 3rd Edition, by Steven @@ -719,8 +676,10 @@ K. Thompson](https://www.wiley.com/en-us/Sampling%2C+3rd+Edition-p-9780470402313 [Stream sampling for variance-optimal estimation of subset sums](https://arxiv.org/abs/0803.0473). +[Estimation from Partially Sampled Distributed Traces](https://arxiv.org/pdf/2107.07703.pdf), 2021 Dynatrace Research report, Otmar Ertl + ## Acknowledgements Thanks to [Neena Dugar](https://github.com/neena) and [Alex -Kehlenbeck](https://github.com/akehlenbeck) for reconstructing the -Dapper Sampler algorithm. +Kehlenbeck](https://github.com/akehlenbeck) for their help +reconstructing the Dapper Sampler algorithm. From 0795e72d3a646b43aee5843462052964198cee49 Mon Sep 17 00:00:00 2001 From: Joshua MacDonald Date: Tue, 27 Jul 2021 12:31:58 -0700 Subject: [PATCH 68/68] more note --- text/trace/0148-sampling-probability.md | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/text/trace/0148-sampling-probability.md b/text/trace/0148-sampling-probability.md index ce7e7707b..6ce72dae3 100644 --- a/text/trace/0148-sampling-probability.md +++ b/text/trace/0148-sampling-probability.md @@ -488,7 +488,7 @@ takes away control over Tracer overhead from non-roots in the trace. To support real-time span-to-metrics applications, this Sampler requires propagating the sampling probability or adjusted count of the context in effect when starting child spans. This is expanded -upon in [OTEP 168](https://github.com/open-telemetry/oteps/pull/168). +upon in [OTEP 168 (WIP)](https://github.com/open-telemetry/oteps/pull/168). 
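To make the propagation step concrete, the sketch below assumes the head inclusion probability reaches the child as a plain value (its wire encoding is the subject of OTEP 168, and `None` stands for an unknown probability); the function and attribute handling shown here are illustrative, not an SDK implementation.

```python
def parent_based_decision(parent_sampled, head_probability):
    """Adopt the parent's decision and pass its head probability through."""
    attributes = {}
    if parent_sampled and head_probability:
        # A known, non-zero head probability lets the span carry its
        # adjusted count directly.
        attributes["sampler.adjusted_count"] = 1.0 / head_probability
        attributes["sampler.name"] = "Parent"
    # Child contexts receive the same decision and the same head probability.
    return parent_sampled, head_probability, attributes
```

When the head probability has not been propagated, the attributes are simply omitted here, matching the `Maybe` entry for `ParentBased` in the built-in sampler table.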
 When propagating head sampling probability, spans recorded by the
 `Parent` sampler MAY encode the adjusted count in the corresponding
 `SpanData` using a non-descriptive Span attribute named
 `sampler.adjusted_count`.
@@ -661,7 +661,12 @@ For the built-in samplers, the following names are specified:
 Note that the `AlwaysOn` and `AlwaysOff` Samplers do not need to
 record their names, since they are indistinguishable from not having
 a sampler configured.  When there is no `sampler.name` attribute
-present
+present and a Span is recorded, it should be counted as one span
+(i.e., count == adjusted_count).
+
+See [OTEP 168 (WIP)](https://github.com/open-telemetry/oteps/pull/168)
+for details about how to report sampling probability when using the
+`Parent` Sampler.
 
 ## Recommended reading