-
Notifications
You must be signed in to change notification settings - Fork 899
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for partial success in an OTLP export response #2636
Closed
joaopgrassi
wants to merge
10
commits into
open-telemetry:main
from
dynatrace-oss-contrib:feature/otlp-partial-success
Closed
Changes from 9 commits
Commits
Show all changes
10 commits
Select commit
Hold shift + click to select a range
eb6de85
Add partial success response guidance
joaopgrassi 2f86f18
Add changelog entry
joaopgrassi 399584a
Make a "total failure" case clear during a partial success
joaopgrassi a128cb2
Merge branch 'main' into feature/otlp-partial-success
joaopgrassi 57734b1
PR suggestions
joaopgrassi 75c4bf1
Changing wording from processing to accepted
joaopgrassi 512764a
Update specification/protocol/otlp.md
joaopgrassi 26ab918
Merge branch 'main' into feature/otlp-partial-success
joaopgrassi d9482df
Remove redundant sentence
joaopgrassi cad29ed
Merge branch 'main' into feature/otlp-partial-success
joaopgrassi File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -16,14 +16,18 @@ nodes such as collectors and telemetry backends. | |
* [OTLP/gRPC](#otlpgrpc) | ||
+ [OTLP/gRPC Concurrent Requests](#otlpgrpc-concurrent-requests) | ||
+ [OTLP/gRPC Response](#otlpgrpc-response) | ||
- [Success](#success) | ||
- [Partial Success](#partial-success) | ||
- [Failures](#failures) | ||
+ [OTLP/gRPC Throttling](#otlpgrpc-throttling) | ||
+ [OTLP/gRPC Service and Protobuf Definitions](#otlpgrpc-service-and-protobuf-definitions) | ||
+ [OTLP/gRPC Default Port](#otlpgrpc-default-port) | ||
* [OTLP/HTTP](#otlphttp) | ||
+ [OTLP/HTTP Request](#otlphttp-request) | ||
+ [OTLP/HTTP Response](#otlphttp-response) | ||
- [Success](#success) | ||
- [Failures](#failures) | ||
- [Success](#success-1) | ||
- [Partial Success](#partial-success-1) | ||
- [Failures](#failures-1) | ||
- [Bad Data](#bad-data) | ||
- [OTLP/HTTP Throttling](#otlphttp-throttling) | ||
- [All Other Responses](#all-other-responses) | ||
|
@@ -35,7 +39,7 @@ nodes such as collectors and telemetry backends. | |
- [Known Limitations](#known-limitations) | ||
* [Request Acknowledgements](#request-acknowledgements) | ||
+ [Duplicate Data](#duplicate-data) | ||
* [Partial Success](#partial-success) | ||
+ [Partial Success Retry](#partial-success-retry) | ||
- [Future Versions and Interoperability](#future-versions-and-interoperability) | ||
- [Glossary](#glossary) | ||
- [References](#references) | ||
|
@@ -145,16 +149,55 @@ was not delivered. | |
|
||
#### OTLP/gRPC Response | ||
|
||
The server may respond with either a success or an error to the requests. | ||
The response MUST be the appropriate serialized Protobuf message (see below for | ||
the specific message to use in the [Success](#success), | ||
[Partial Success](#partial-success) and [Failure](#failures) cases). | ||
|
||
##### Success | ||
|
||
The success response indicates telemetry data is successfully accepted by the | ||
server. | ||
|
||
The success response indicates telemetry data is successfully processed by the | ||
server. If the server receives an empty request (a request that does not carry | ||
If the server receives an empty request (a request that does not carry | ||
any telemetry data) the server SHOULD respond with success. | ||
|
||
Success response is returned via | ||
[Export*ServiceResponse](https://github.com/open-telemetry/opentelemetry-proto) | ||
message (`ExportTraceServiceResponse` for traces, `ExportMetricsServiceResponse` | ||
for metrics, `ExportLogsServiceResponse` for logs). | ||
On success, the server response MUST be a Protobuf-encoded | ||
[Export<signal>ServiceResponse](https://github.com/open-telemetry/opentelemetry-proto) | ||
message (`ExportTraceServiceResponse` for traces, | ||
`ExportMetricsServiceResponse` for metrics and | ||
`ExportLogsServiceResponse` for logs). | ||
|
||
The server MUST leave the `partial_success` field unset | ||
in case of a successful response. | ||
|
||
##### Partial Success | ||
|
||
If the request is only partially accepted | ||
(i.e. when the server accepts only parts of the data and rejects the rest), the | ||
server response MUST be a Protobuf-encoded | ||
[Export<signal>ServiceResponse](https://github.com/open-telemetry/opentelemetry-proto) | ||
message (`ExportTraceServiceResponse` for traces, | ||
`ExportMetricsServiceResponse` for metrics and | ||
`ExportLogsServiceResponse` for logs). | ||
|
||
Additionally, the server MUST initialize the `partial_success` field | ||
(`ExportTracePartialSuccess` message for traces, | ||
`ExportMetricsPartialSuccess` message for metrics and | ||
`ExportLogsPartialSuccess` message for logs), and it MUST set the respective | ||
`accepted_spans`, `accepted_data_points` or `accepted_log_records` field with | ||
the number of spans/data points/log records it accepted. In case the server | ||
rejected everything, the `accepted_<signal>` field MUST be set to `0`. | ||
|
||
The server SHOULD populate the `error_message` field with a human-readable | ||
error message in English. The message should explain why the | ||
server rejected parts of the data, and might offer guidance on how users | ||
can address the issues. | ||
The protocol does not attempt to define the structure of the error message. | ||
|
||
The client MUST NOT retry the request when it receives a partial success | ||
response where the `partial_success` is populated. | ||
|
||
##### Failures | ||
|
||
When an error is returned by the server it falls into 2 broad categories: | ||
retryable and not-retryable: | ||
|
@@ -382,8 +425,9 @@ numbers or strings are accepted when decoding. | |
|
||
#### OTLP/HTTP Response | ||
|
||
Response body MUST be the appropriate serialized Protobuf message (see below for | ||
the specific message to use in the Success and Failure cases). | ||
The response body MUST be the appropriate serialized Protobuf message (see below for | ||
the specific message to use in the [Success](#success-1), | ||
[Partial Success](#partial-success-1) and [Failure](#failures-1) cases). | ||
|
||
The server MUST set "Content-Type: application/x-protobuf" header if the | ||
response body is binary-encoded Protobuf payload. The server MUST set | ||
|
@@ -397,13 +441,47 @@ header. | |
|
||
##### Success | ||
|
||
On success the server MUST respond with `HTTP 200 OK`. Response body MUST be | ||
Protobuf-encoded `ExportTraceServiceResponse` message for traces, | ||
`ExportMetricsServiceResponse` message for metrics and | ||
`ExportLogsServiceResponse` message for logs. | ||
The success response indicates telemetry data is successfully accepted by the | ||
server. | ||
|
||
If the server receives an empty request (a request that does not carry | ||
any telemetry data) the server SHOULD respond with success. | ||
|
||
The server SHOULD respond with success no sooner than after successfully | ||
decoding and validating the request. | ||
On success, the server MUST respond with `HTTP 200 OK`. The response body MUST be | ||
a Protobuf-encoded [Export<signal>ServiceResponse](https://github.com/open-telemetry/opentelemetry-proto) | ||
message (`ExportTraceServiceResponse` for traces, | ||
`ExportMetricsServiceResponse` for metrics and | ||
`ExportLogsServiceResponse` for logs). | ||
|
||
The server MUST leave the `partial_success` field unset | ||
in case of a successful response. | ||
|
||
##### Partial Success | ||
|
||
If the request is only partially accepted | ||
(i.e. when the server accepts only parts of the data and rejects the rest), the | ||
server MUST respond with `HTTP 200 OK`. The response body MUST be | ||
a Protobuf-encoded [Export<signal>ServiceResponse](https://github.com/open-telemetry/opentelemetry-proto) | ||
message (`ExportTraceServiceResponse` for traces, | ||
`ExportMetricsServiceResponse` for metrics and | ||
`ExportLogsServiceResponse` for logs). | ||
|
||
Additionally, the server MUST initialize the `partial_success` field | ||
(`ExportTracePartialSuccess` message for traces, | ||
`ExportMetricsPartialSuccess` message for metrics and | ||
`ExportLogsPartialSuccess` message for logs), and it MUST set the respective | ||
`accepted_spans`, `accepted_data_points` or `accepted_log_records` field with | ||
the number of spans/data points/log records it accepted. In case the server | ||
rejected everything, the `accepted_<signal>` field MUST be set to `0`. | ||
|
||
The server MAY populate the `error_message` field with a human-readable | ||
error message in English. The message should explain why the | ||
server rejected parts of the data, and might offer guidance on how users | ||
can address the issues. | ||
The protocol does not attempt to define the structure of the error message. | ||
|
||
The client MUST NOT retry the request when it receives a partial success | ||
response where the `partial_success` is populated. | ||
|
||
##### Failures | ||
|
||
|
@@ -520,11 +598,21 @@ received yet. The client will typically choose to re-send such data to guarantee | |
delivery, which may result in duplicate data on the server side. This is a | ||
deliberate choice and is considered to be the right tradeoff for telemetry data. | ||
|
||
### Partial Success | ||
#### Partial Success Retry | ||
|
||
Each server has its particularities and its way of treating data. There | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This seems to be a bit verbose. I think we should specify more explicitly here:
The rest of this commentary is nice for WHY, but the WHAT should be called out explicitly. |
||
can be many reasons why a given server only accepted parts of an | ||
OTLP request (e.g. quota exceeded, | ||
data not conforming with the server's standards, etc). | ||
|
||
The protocol offers a basic way of communicating partial reception success | ||
from the server to the client. Such partial success information contains | ||
how many spans/data points/log records were accepted and a general error | ||
message. With only such information it is not feasible to achieve any type | ||
of automatic retry by clients. | ||
|
||
The protocol does not attempt to communicate partial reception success from the | ||
server to the client (i.e. when part of the data can be received by the server | ||
and part of it cannot). Attempting to do so would complicate the protocol and | ||
The protocol does not give guidance on how such partial success requests can be | ||
retried by clients. Attempting to do so would complicate the protocol and | ||
implementations significantly and is left out as a possible future area of work. | ||
|
||
## Future Versions and Interoperability | ||
|
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we repeating such a large body of text that is mostly identical to OTLP/gRPC? This constitutes the semantic definition of the protocol, which is independent of the encoding / transport used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the same 😕. I had an idea about putting the "common" things in a separated section but I was concerned the PR would touch too many things and it would be harder to review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a few nuances, for ex, for HTTP, we indicate which
Content-Type
should be used for the response, compression, which status code to return. Such things don't apply to gRPC.. so I was not sure what makes sense. Having a combined section and doing "If HTTP, return xyz header" doesn't sound good. Another point is if someone is implementing a gRPC server, they don't have to read the HTTP stuff and vice-versa.I'm open to improve it, but I'm not sure to what extent. It could also be a follow up to make the review easier? Open to suggestions :)