-
Notifications
You must be signed in to change notification settings - Fork 165
Include OTLP proto version identifier in requests #204
Include OTLP proto version identifier in requests #204
Conversation
|
||
Currently releaed SDKs and collector versions will not include these identifiers. | ||
|
||
## Prior art and alternatives |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously people asked for the version number in the payload a few times. Can you consider this alternate and why in the header and not in the payload (or vice versa)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add considered request body resource attribute in alternatives. Headers offer more flexibility in routing the request as they body does not need to be decoded to get the value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not mean a resource attribute, but a field in OTLP proto Export*ServiceRequest
/*Data
message.
You are right that the downside is that it requires decoding the message. The upside is that it works for other mediums where no concept of headers exists, e.g. OTLP data stored in a file (as binary proto or JSON).
If we need version for these mediums too (and we likely do for exactly the same reasons as we need them for grpc or http) then we will have 2 ways to record the version, which is a big downside to me, probably bigger than the need to decode the message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @tigrannajaryan - sorry for delay in replying.
Encoding the OTLP version within the payload doesn't feel like a good solution as we'd be forcing consumers who do care about the encoding to decode twice, first to get the OTLP version and again to decode using the appropriate OTLP handler. OTLP request object can be large and it's very inefficient the whole rquest to potentially throw away. Not only that, many gRPC handler frameworks don't even offer you the raw binary payload and handle the decoding for you.
For the use case of OTLP requests that have been serialised to disk (eg binary / JSON), do we have some way to describe the metadata associated to the request payload when it was received?
For example the structure could be something like this in JSON:
{
"metadata": {
"otlp-version": "0.17.0",
},
"request": {
# JSON for request
}
Is there a stable format request data to use when serialising to disk?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
forcing consumers who do care about the encoding to decode twice, first to get the OTLP version and again to decode using the appropriate OTLP handler.
I don't think this is true. We should never make such changes to OTLP protobufs that make decoding (umarshalling) dependant on the version number. You should always be able to unmarshal once, if necessary look at the version number field and use that version number as a guide for interpreting the other unmarshalled fields. However, the version number should never be used for deciding how to perform the unmarshaling.
If we are ever in a situation when unmarshalling depends on the version number then it means a new major version number of OTLP (OTLPv2), which is no longer compatible with the current OTLP. The bar for doing this is extremely high and it is not what we want the version number field to help solve.
I want to make sure we are very careful with this version number proposal. I feel that we are giving it more and more responsibility, while allowing us off the hook for maintaining the compatibility of OTLP. This is a dangerous path. Our mindset should be that OTLP must stay backwards compatible and version number may be used to help us in the situation when we made a (very rare) mistake when evolving the proto definitions and the correct interpretation of unmarshalled data requires an additional hint that the version number provides.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right - the original purpose of the OTLP version being available was only to be used for informational reasons (eg identify users who send which versions and promote them to upgrade as appropriate).
However, we are in a situation where there have seen some breaking changes in the 0.x versions of the proto definitions, particularly around the fairly aggressive removing of deprecated types / fields. A good example of this is the deprecated Int/Float metrics that were removed in proto version 0.12 and new fields has been added to ScopeXXX in 0.17. If deprecated fields had been left in the proto files until the 1.0 release, they likely would not have caused this problem.
I also believe the use-case originally shared where some receivers wish to route requests based on the OTLP version is still valid and would be best served via a gRPC metadata entry or HTTP header. This would be possible if the OTLP version was part of the request payload, but would require the routing process to decode the whole body to get the version and potentially route to handler which would then decode again.
Why do you feel having the version both in metadata/header and request body field be an issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also believe the use-case originally shared where some receivers wish to route requests based on the OTLP version is still valid and would be best served via a gRPC metadata entry or HTTP header. This would be possible if the OTLP version was part of the request payload, but would require the routing process to decode the whole body to get the version and potentially route to handler which would then decode again.
In practice, what is the use case for such routing? Is there really a situation when there is a need to route to different places just based on a version number difference?
Why do you feel having the version both in metadata/header and request body field be an issue?
I don't like recording the same information in 2 places, with the potential of the data becoming out of sync and causing further confusion and problems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason for routing is exactly for the reason above, there have been breaking changes in the OTLP spec and some receivers may wish to continue to support multiple versions of OTLP with different handlers.
While the intention may be for the OTLP spec to always be backward compatible, an identifier outside of the payload to indicate how to decode is useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To follow-up on this;
I think the OTLP version should be treated as metadata for the given request and should be stored separately from the payload data. For example, if you're storing requests for playback at a later time, you'd likely want to record other useful information such as the sender (eg user agent header) and the time it was received. I think the OTLP version also falls into the category of useful info, that helps you understand and process the request payload.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tigrannajaryan @jmacd @bogdandrutu Any further feedback on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am blocking this OTEP since I feel that there is a misunderstanding about what the version number field is and how it should be used.
Please update the OTEP to clarify it.
|
||
## Explanation | ||
|
||
Currently it is not possible to know the version of the OTLP proto definition before receiving it. This is challenging because there have been changes to the OTLP proto specification that are breaking. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this phrasing indirectly normalizes the fact that there may be breaking changes, which may contribute to our sloppiness and indirectly cause breakage. I think we should do the opposite and emphasize even stronger that breaking changes are extremely bad and we should never do them without officially introducing a new major OTLP version (in which case the version number outside the payload may be introduced).
My opinion is still that this OTEP and the version number outside the payload moves us in the wrong direction.
Instead I think we should make a bigger effort to avoid breaking changes, for example by adding tooling that catches it, by making it clearer in our spec/docs what changes constitute breaking changes and should avoided, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand and fully appreciate the intention to avoid breaking changes to the wire format. However, we've just seen another breaking change be merged into the OTLP spec.
Removing deprecated fields like these forces OTLP receivers to chose between either supporting senders who have not upgraded to a newer version of the OTLP spec (<0.12) or supporting senders who have upgraded(0.19+). Not everyone has the luxury of dictating when senders upgrade. These deprecated fields could have existed in the spec until a major release (eg 1.0) and receivers can choose to support them if they wish to.
A solution to the above is for receivers to utilise multiple handlers using different OTLP formats. However, as we've discussed previous, it's cumbersome and inefficient to decode a request payload to get the version to then pass onto one of these handlers which will likely decode again. An identifier outside of the payload allows receivers to decide how they handle the request before using resources on decoding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see a "Why not both?" solution.
We should certainly strive for all the bigger, ideal efforts @tigrannajaryan describes: avoid proto breakage, more tooling to detect breakage. If we also include the proto version in the request and outside of the body (an x-otlp-version
header? otlp/1.2.3
included in exporter user_agent?), receivers of OTLP have easy access to the data necessary to troubleshoot communications problems when we haven't lived up to that ideal.
@MikeGoldsmith we are marking OTEPs which are no longer being updated as |
Adds OTLP request identifier to describe the OTLP version used to create the binary or JSON payload.