open-telemetry · joaopgrassi · Jun 29, 2022 · Jun 29, 2022 · Jul 11, 2022 · Jul 11, 2022
@@ -54,6 +54,9 @@ release.
 
 ### OpenTelemetry Protocol
 
+- Add support for partial success in an OTLP export response
+  ([#2636](https://github.com/open-telemetry/opentelemetry-specification/pull/2636)).
+
 ### SDK Configuration
 
 ### Telemetry Schemas

@@ -16,14 +16,18 @@ nodes such as collectors and telemetry backends.
   * [OTLP/gRPC](#otlpgrpc)
     + [OTLP/gRPC Concurrent Requests](#otlpgrpc-concurrent-requests)
     + [OTLP/gRPC Response](#otlpgrpc-response)
+      - [Success](#success)
+      - [Partial Success](#partial-success)
+      - [Failures](#failures)
     + [OTLP/gRPC Throttling](#otlpgrpc-throttling)
     + [OTLP/gRPC Service and Protobuf Definitions](#otlpgrpc-service-and-protobuf-definitions)
     + [OTLP/gRPC Default Port](#otlpgrpc-default-port)
   * [OTLP/HTTP](#otlphttp)
     + [OTLP/HTTP Request](#otlphttp-request)
     + [OTLP/HTTP Response](#otlphttp-response)
-      - [Success](#success)
-      - [Failures](#failures)
+      - [Success](#success-1)
+      - [Partial Success](#partial-success-1)
+      - [Failures](#failures-1)
       - [Bad Data](#bad-data)
       - [OTLP/HTTP Throttling](#otlphttp-throttling)
       - [All Other Responses](#all-other-responses)
@@ -35,7 +39,7 @@ nodes such as collectors and telemetry backends.
 - [Known Limitations](#known-limitations)
   * [Request Acknowledgements](#request-acknowledgements)
     + [Duplicate Data](#duplicate-data)
-  * [Partial Success](#partial-success)
+    + [Partial Success Retry](#partial-success-retry)
 - [Future Versions and Interoperability](#future-versions-and-interoperability)
 - [Glossary](#glossary)
 - [References](#references)
@@ -145,16 +149,55 @@ was not delivered.
 
 #### OTLP/gRPC Response
 
-The server may respond with either a success or an error to the requests.
+The response MUST be the appropriate serialized Protobuf message (see below for
+the specific message to use in the [Success](#success),
+[Partial Success](#partial-success) and [Failure](#failures) cases).
+
+##### Success
+
+The success response indicates that the server has successfully accepted the
+telemetry data.
 
-The success response indicates telemetry data is successfully processed by the
-server. If the server receives an empty request (a request that does not carry
+If the server receives an empty request (a request that does not carry
 any telemetry data) the server SHOULD respond with success.
 
-Success response is returned via
-[Export*ServiceResponse](https://github.com/open-telemetry/opentelemetry-proto)
-message (`ExportTraceServiceResponse` for traces, `ExportMetricsServiceResponse`
-for metrics, `ExportLogsServiceResponse` for logs).
+On success, the server response MUST be a Protobuf-encoded
+[Export<signal>ServiceResponse](https://github.com/open-telemetry/opentelemetry-proto)
+message (`ExportTraceServiceResponse` for traces,
+`ExportMetricsServiceResponse` for metrics and
+`ExportLogsServiceResponse` for logs).
+
+The server MUST leave the `partial_success` field unset
+in case of a successful response.
+
+##### Partial Success
+
+If the request is only partially accepted
+(i.e. when the server accepts only parts of the data and rejects the rest), the
+server response MUST be a Protobuf-encoded
+[Export<signal>ServiceResponse](https://github.com/open-telemetry/opentelemetry-proto)
+message (`ExportTraceServiceResponse` for traces,
+`ExportMetricsServiceResponse` for metrics and
+`ExportLogsServiceResponse` for logs).
+
+Additionally, the server MUST initialize the `partial_success` field
+(`ExportTracePartialSuccess` message for traces,
+`ExportMetricsPartialSuccess` message for metrics and
+`ExportLogsPartialSuccess` message for logs), and it MUST set the respective
+`accepted_spans`, `accepted_data_points` or `accepted_log_records` field to
+the number of spans/data points/log records it accepted. If the server
+rejected everything, that field MUST be set to `0`.
+
+The server SHOULD populate the `error_message` field with a human-readable
+error message in English. The message should explain why the
+server rejected parts of the data, and might offer guidance on how users
+can address the issues.
+The protocol does not attempt to define the structure of the error message.
+
+The client MUST NOT retry the request when it receives a partial success
+response where the `partial_success` field is populated.
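The client-side rules above can be sketched as follows. This is a minimal illustration, not generated Protobuf code: the two dataclasses are hypothetical stand-ins for the messages of the same name, `None` models an unset `partial_success` field, and `handle_export_response` is an assumed helper name that does not appear in the specification.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ExportTracePartialSuccess:
    # Hypothetical stand-in for the Protobuf message of the same name.
    accepted_spans: int = 0
    error_message: str = ""


@dataclass
class ExportTraceServiceResponse:
    # None models an unset `partial_success` field (full success).
    partial_success: Optional[ExportTracePartialSuccess] = None


def handle_export_response(
    response: ExportTraceServiceResponse, spans_sent: int
) -> Tuple[str, int, bool]:
    """Return (outcome, rejected_count, should_retry) for a received response."""
    partial = response.partial_success
    if partial is None:
        # Field unset: the server accepted everything.
        return ("success", 0, False)
    # Field populated: some or all spans were rejected; never retry.
    rejected = spans_sent - partial.accepted_spans
    return ("partial_success", rejected, False)
```

In this sketch the retry flag is `False` in both branches because the client retries only in the failure cases, which the model above leaves out.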
+
+##### Failures
 
 When an error is returned by the server it falls into 2 broad categories:
 retryable and not-retryable:
@@ -382,8 +425,9 @@ numbers or strings are accepted when decoding.
 
 #### OTLP/HTTP Response
 
-Response body MUST be the appropriate serialized Protobuf message (see below for
-the specific message to use in the Success and Failure cases).
+The response body MUST be the appropriate serialized Protobuf message (see below for
+the specific message to use in the [Success](#success-1),
+[Partial Success](#partial-success-1) and [Failure](#failures-1) cases).
 
 The server MUST set "Content-Type: application/x-protobuf" header if the
 response body is binary-encoded Protobuf payload. The server MUST set
@@ -397,13 +441,47 @@ header.
 
 ##### Success
 
-On success the server MUST respond with `HTTP 200 OK`. Response body MUST be
-Protobuf-encoded `ExportTraceServiceResponse` message for traces,
-`ExportMetricsServiceResponse` message for metrics and
-`ExportLogsServiceResponse` message for logs.
+The success response indicates that the server has successfully accepted the
+telemetry data.
+
+If the server receives an empty request (a request that does not carry
+any telemetry data) the server SHOULD respond with success.
 
-The server SHOULD respond with success no sooner than after successfully
-decoding and validating the request.
+On success, the server MUST respond with `HTTP 200 OK`. The response body MUST be
+a Protobuf-encoded [Export<signal>ServiceResponse](https://github.com/open-telemetry/opentelemetry-proto)
+message (`ExportTraceServiceResponse` for traces,
+`ExportMetricsServiceResponse` for metrics and
+`ExportLogsServiceResponse` for logs).
+
+The server MUST leave the `partial_success` field unset
+in case of a successful response.
+
+##### Partial Success
+
+If the request is only partially accepted
+(i.e. when the server accepts only parts of the data and rejects the rest), the
+server MUST respond with `HTTP 200 OK`. The response body MUST be
+a Protobuf-encoded [Export<signal>ServiceResponse](https://github.com/open-telemetry/opentelemetry-proto)
+message (`ExportTraceServiceResponse` for traces,
+`ExportMetricsServiceResponse` for metrics and
+`ExportLogsServiceResponse` for logs).
+
+Additionally, the server MUST initialize the `partial_success` field
+(`ExportTracePartialSuccess` message for traces,
+`ExportMetricsPartialSuccess` message for metrics and
+`ExportLogsPartialSuccess` message for logs), and it MUST set the respective
+`accepted_spans`, `accepted_data_points` or `accepted_log_records` field to
+the number of spans/data points/log records it accepted. If the server
+rejected everything, that field MUST be set to `0`.
+
+The server SHOULD populate the `error_message` field with a human-readable
+error message in English. The message should explain why the
+server rejected parts of the data, and might offer guidance on how users
+can address the issues.
+The protocol does not attempt to define the structure of the error message.
+
+The client MUST NOT retry the request when it receives a partial success
+response where the `partial_success` field is populated.
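On the server side, the Success and Partial Success rules for OTLP/HTTP might be modeled as below. This is a sketch under stated assumptions: the dataclasses are hypothetical stand-ins for the Protobuf messages, `build_http_response` is an assumed helper name, and Protobuf serialization, headers and the failure cases are left out.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ExportLogsPartialSuccess:
    # Hypothetical stand-in for the Protobuf message of the same name.
    accepted_log_records: int = 0
    error_message: str = ""


@dataclass
class ExportLogsServiceResponse:
    # None models an unset `partial_success` field.
    partial_success: Optional[ExportLogsPartialSuccess] = None


def build_http_response(
    received: int, accepted: int, reason: str = ""
) -> Tuple[int, ExportLogsServiceResponse]:
    """Return (HTTP status code, response message) for a decoded logs request."""
    if accepted == received:
        # Full success, including an empty request: leave the field unset.
        return 200, ExportLogsServiceResponse()
    # Partial success: still HTTP 200, with the accepted count and a reason.
    partial = ExportLogsPartialSuccess(
        accepted_log_records=accepted, error_message=reason
    )
    return 200, ExportLogsServiceResponse(partial_success=partial)
```

Both branches return status 200; the only thing that separates the two cases is whether `partial_success` is set.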
 
 ##### Failures
 
@@ -520,11 +598,21 @@ received yet. The client will typically choose to re-send such data to guarantee
 delivery, which may result in duplicate data on the server side. This is a
 deliberate choice and is considered to be the right tradeoff for telemetry data.
 
-### Partial Success
+#### Partial Success Retry
+
+Servers differ in how they treat data, and a server can accept only parts
+of an OTLP request for many reasons (e.g. quota exceeded or data not
+conforming to the server's standards).
+
+The protocol offers a basic way of communicating partial reception success
+from the server to the client. This information consists of the number of
+spans/data points/log records that were accepted and a general error
+message. Because it does not identify which items were rejected, clients
+cannot use it to retry automatically.
 
-The protocol does not attempt to communicate partial reception success from the
-server to the client (i.e. when part of the data can be received by the server
-and part of it cannot). Attempting to do so would complicate the protocol and
+The protocol does not give guidance on how clients can retry partially
+successful requests. Attempting to do so would complicate the protocol and
 implementations significantly and is left out as a possible future area of work.
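To illustrate the limitation, the sketch below reduces two different per-span outcomes to the information a partial success response carries. The function name `partial_success_summary` and the dictionary shape are assumptions made for this illustration only.

```python
from typing import Dict, List, Union


def partial_success_summary(
    accepted_flags: List[bool], error_message: str
) -> Dict[str, Union[int, str]]:
    """Reduce per-span outcomes to what a partial success response carries."""
    return {
        # Only the count of accepted spans survives, not their positions.
        "accepted_spans": sum(accepted_flags),
        "error_message": error_message,
    }


# Two requests in which the server rejected different spans.
first = partial_success_summary([True, False, True, False], "quota exceeded")
second = partial_success_summary([False, True, False, True], "quota exceeded")
```

Here `first` and `second` compare equal even though different spans were rejected, so a client cannot tell from the response alone which spans it would need to resend.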
 
 ## Future Versions and Interoperability