Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include OTLP proto version identifier in requests #204

Closed

Conversation

MikeGoldsmith
Copy link
Member

@MikeGoldsmith MikeGoldsmith commented May 20, 2022

Adds OTLP request identifier to describe the OTLP version used to create the binary or JSON payload.

@MikeGoldsmith MikeGoldsmith requested a review from a team May 20, 2022 11:27

Currently releaed SDKs and collector versions will not include these identifiers.

## Prior art and alternatives
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously people asked for the version number in the payload a few times. Can you consider this alternate and why in the header and not in the payload (or vice versa)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add considered request body resource attribute in alternatives. Headers offer more flexibility in routing the request as they body does not need to be decoded to get the value.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not mean a resource attribute, but a field in OTLP proto Export*ServiceRequest/*Data message.
You are right that the downside is that it requires decoding the message. The upside is that it works for other mediums where no concept of headers exists, e.g. OTLP data stored in a file (as binary proto or JSON).
If we need version for these mediums too (and we likely do for exactly the same reasons as we need them for grpc or http) then we will have 2 ways to record the version, which is a big downside to me, probably bigger than the need to decode the message.

Copy link
Member Author

@MikeGoldsmith MikeGoldsmith May 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @tigrannajaryan - sorry for delay in replying.

Encoding the OTLP version within the payload doesn't feel like a good solution as we'd be forcing consumers who do care about the encoding to decode twice, first to get the OTLP version and again to decode using the appropriate OTLP handler. OTLP request object can be large and it's very inefficient the whole rquest to potentially throw away. Not only that, many gRPC handler frameworks don't even offer you the raw binary payload and handle the decoding for you.

For the use case of OTLP requests that have been serialised to disk (eg binary / JSON), do we have some way to describe the metadata associated to the request payload when it was received?

For example the structure could be something like this in JSON:

{
  "metadata": {
    "otlp-version": "0.17.0",
  },
  "request": {
    # JSON for request
  }

Is there a stable format request data to use when serialising to disk?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

forcing consumers who do care about the encoding to decode twice, first to get the OTLP version and again to decode using the appropriate OTLP handler.

I don't think this is true. We should never make such changes to OTLP protobufs that make decoding (umarshalling) dependant on the version number. You should always be able to unmarshal once, if necessary look at the version number field and use that version number as a guide for interpreting the other unmarshalled fields. However, the version number should never be used for deciding how to perform the unmarshaling.

If we are ever in a situation when unmarshalling depends on the version number then it means a new major version number of OTLP (OTLPv2), which is no longer compatible with the current OTLP. The bar for doing this is extremely high and it is not what we want the version number field to help solve.

I want to make sure we are very careful with this version number proposal. I feel that we are giving it more and more responsibility, while allowing us off the hook for maintaining the compatibility of OTLP. This is a dangerous path. Our mindset should be that OTLP must stay backwards compatible and version number may be used to help us in the situation when we made a (very rare) mistake when evolving the proto definitions and the correct interpretation of unmarshalled data requires an additional hint that the version number provides.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right - the original purpose of the OTLP version being available was only to be used for informational reasons (eg identify users who send which versions and promote them to upgrade as appropriate).

However, we are in a situation where there have seen some breaking changes in the 0.x versions of the proto definitions, particularly around the fairly aggressive removing of deprecated types / fields. A good example of this is the deprecated Int/Float metrics that were removed in proto version 0.12 and new fields has been added to ScopeXXX in 0.17. If deprecated fields had been left in the proto files until the 1.0 release, they likely would not have caused this problem.

I also believe the use-case originally shared where some receivers wish to route requests based on the OTLP version is still valid and would be best served via a gRPC metadata entry or HTTP header. This would be possible if the OTLP version was part of the request payload, but would require the routing process to decode the whole body to get the version and potentially route to handler which would then decode again.

Why do you feel having the version both in metadata/header and request body field be an issue?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also believe the use-case originally shared where some receivers wish to route requests based on the OTLP version is still valid and would be best served via a gRPC metadata entry or HTTP header. This would be possible if the OTLP version was part of the request payload, but would require the routing process to decode the whole body to get the version and potentially route to handler which would then decode again.

In practice, what is the use case for such routing? Is there really a situation when there is a need to route to different places just based on a version number difference?

Why do you feel having the version both in metadata/header and request body field be an issue?

I don't like recording the same information in 2 places, with the potential of the data becoming out of sync and causing further confusion and problems.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for routing is exactly for the reason above, there have been breaking changes in the OTLP spec and some receivers may wish to continue to support multiple versions of OTLP with different handlers.

While the intention may be for the OTLP spec to always be backward compatible, an identifier outside of the payload to indicate how to decode is useful.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To follow-up on this;

I think the OTLP version should be treated as metadata for the given request and should be stored separately from the payload data. For example, if you're storing requests for playback at a later time, you'd likely want to record other useful information such as the sender (eg user agent header) and the time it was received. I think the OTLP version also falls into the category of useful info, that helps you understand and process the request payload.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tigrannajaryan @jmacd @bogdandrutu Any further feedback on this?

text/0000-otlp-proto-version-identifier.md Outdated Show resolved Hide resolved
Copy link
Member

@tigrannajaryan tigrannajaryan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am blocking this OTEP since I feel that there is a misunderstanding about what the version number field is and how it should be used.

Please update the OTEP to clarify it.


## Explanation

Currently it is not possible to know the version of the OTLP proto definition before receiving it. This is challenging because there have been changes to the OTLP proto specification that are breaking.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this phrasing indirectly normalizes the fact that there may be breaking changes, which may contribute to our sloppiness and indirectly cause breakage. I think we should do the opposite and emphasize even stronger that breaking changes are extremely bad and we should never do them without officially introducing a new major OTLP version (in which case the version number outside the payload may be introduced).

My opinion is still that this OTEP and the version number outside the payload moves us in the wrong direction.
Instead I think we should make a bigger effort to avoid breaking changes, for example by adding tooling that catches it, by making it clearer in our spec/docs what changes constitute breaking changes and should avoided, etc.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand and fully appreciate the intention to avoid breaking changes to the wire format. However, we've just seen another breaking change be merged into the OTLP spec.

Removing deprecated fields like these forces OTLP receivers to chose between either supporting senders who have not upgraded to a newer version of the OTLP spec (<0.12) or supporting senders who have upgraded(0.19+). Not everyone has the luxury of dictating when senders upgrade. These deprecated fields could have existed in the spec until a major release (eg 1.0) and receivers can choose to support them if they wish to.

A solution to the above is for receivers to utilise multiple handlers using different OTLP formats. However, as we've discussed previous, it's cumbersome and inefficient to decode a request payload to get the version to then pass onto one of these handlers which will likely decode again. An identifier outside of the payload allows receivers to decide how they handle the request before using resources on decoding.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see a "Why not both?" solution.

We should certainly strive for all the bigger, ideal efforts @tigrannajaryan describes: avoid proto breakage, more tooling to detect breakage. If we also include the proto version in the request and outside of the body (an x-otlp-version header? otlp/1.2.3 included in exporter user_agent?), receivers of OTLP have easy access to the data necessary to troubleshoot communications problems when we haven't lived up to that ideal.

@tedsuo tedsuo added the triaged label Jan 30, 2023
@tedsuo tedsuo added the stale This issue or PR is stale and will be closed soon unless it is resurrected by the author. label Jul 24, 2023
@tedsuo
Copy link
Contributor

tedsuo commented Jul 24, 2023

@MikeGoldsmith we are marking OTEPs which are no longer being updated as stale. We will close this PR in one week. If you wish to resurrect this PR, that is fine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stale This issue or PR is stale and will be closed soon unless it is resurrected by the author. triaged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants