Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Default protocol for OTLP exporter selection SDK env #1885

Closed
srikanthccv opened this issue Aug 23, 2021 · 26 comments · Fixed by #1969
Closed

Default protocol for OTLP exporter selection SDK env #1885

srikanthccv opened this issue Aug 23, 2021 · 26 comments · Fixed by #1969
Assignees
Labels
area:configuration Related to configuring the SDK area:sdk Related to the SDK spec:protocol Related to the specification/protocol directory

Comments

@srikanthccv
Copy link
Member

What are you trying to achieve?

Exporter selection https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/sdk-environment-variables.md#exporter-selection
part of spec has default for jaeger and zipkin but there is no default for otlp. It is unclear whether to use otlp/grpc or otlp/http for the client libraries that have support for both.

@srikanthccv srikanthccv added the spec:trace Related to the specification/trace directory label Aug 23, 2021
@tigrannajaryan tigrannajaryan added the spec:protocol Related to the specification/protocol directory label Sep 15, 2021
@tigrannajaryan
Copy link
Member

OTLP/HTTP is slightly more CPU efficient than OTLP/gRCP so it may be preferable. However, it may be that most SDKs implement OTLP/gRCP already, but not OTLP/HTTP so making OTLP/HTTP may not work.

This likely needs a discussion in Maintainers meeting. Adding to the next week's agenda.

@tigrannajaryan tigrannajaryan removed the spec:trace Related to the specification/trace directory label Sep 15, 2021
@mtwo
Copy link
Member

mtwo commented Sep 20, 2021

From the maintainers call: general agreement that gRPC should be the default on each artifact, unless the HTTP implementation is substantially better on that artifact. Most maintainers in the call were already following this pattern.

@tigrannajaryan
Copy link
Member

Here is some informal performance data for Go implementation (benchmarks an OTLP client+server):

32000 nano batches, 1 spans per batch, 10 attrs per span
OTLP/HTTP1.1/1                 32000 spans|  6.2 cpusec|  4.7 wallsec| 5144.7 batches/cpusec|  6832.8 batches/wallsec|194.4 cpuμs/span
OTLP/GRPC-Unary/Sequential     32000 spans| 13.2 cpusec|  6.0 wallsec| 2418.7 batches/cpusec|  5370.4 batches/wallsec|413.4 cpuμs/span

8000 tiny batches, 10 spans per batch, 10 attrs per span
OTLP/HTTP1.1/1                 80000 spans|  4.4 cpusec|  3.8 wallsec| 1830.7 batches/cpusec|  2097.9 batches/wallsec| 54.6 cpuμs/span
OTLP/GRPC-Unary/Sequential     80000 spans|  6.0 cpusec|  4.1 wallsec| 1322.3 batches/cpusec|  1932.3 batches/wallsec| 75.6 cpuμs/span

4000 small batches, 100 spans per batch, 10 attrs per span
OTLP/HTTP1.1/1                400000 spans| 16.6 cpusec| 16.0 wallsec|  241.5 batches/cpusec|   250.5 batches/wallsec| 41.4 cpuμs/span
OTLP/GRPC-Unary/Sequential    400000 spans| 17.5 cpusec| 15.4 wallsec|  228.2 batches/cpusec|   259.3 batches/wallsec| 43.8 cpuμs/span

400 large batches, 500 spans per batch, 10 attrs per span
OTLP/HTTP1.1/1                200000 spans|  7.2 cpusec|  7.1 wallsec|   55.2 batches/cpusec|    56.5 batches/wallsec| 36.2 cpuμs/span
OTLP/GRPC-Unary/Sequential    200000 spans|  7.3 cpusec|  6.9 wallsec|   55.1 batches/cpusec|    57.7 batches/wallsec| 36.3 cpuμs/span

50 very large batches, 1000 spans per batch, 10 attrs per span
OTLP/HTTP1.1/1                 50000 spans|  1.8 cpusec|  1.7 wallsec|   27.9 batches/cpusec|    28.6 batches/wallsec| 35.8 cpuμs/span
OTLP/GRPC-Unary/Sequential     50000 spans|  1.8 cpusec|  1.7 wallsec|   28.6 batches/cpusec|    29.4 batches/wallsec| 35.0 cpuμs/span

OTLP/HTTP is about 2x more CPU efficient for batches that contain a single span. For larger batches (>=100 spans) the difference between OTLP/HTTP and OTLP/gRPC is insignificant.

I am fine with making/keeping gRPC the default given that it is already what we do unless someone feels very strong about the performance impact for small batches (and keep in mind that in other languages the benchmarks may show a different result, this is just for Go).

tigrannajaryan added a commit to tigrannajaryan/opentelemetry-specification that referenced this issue Sep 27, 2021
tigrannajaryan added a commit to tigrannajaryan/opentelemetry-specification that referenced this issue Sep 27, 2021
tigrannajaryan added a commit to tigrannajaryan/opentelemetry-specification that referenced this issue Sep 27, 2021
@tigrannajaryan
Copy link
Member

It seems like supporting OTLP/gRPC in Ruby is not straightfoward. @fbogsany please comment here.

I am personally slightly more favour of OTLP/HTTP but it seems like other languages preferred gRPC.

If we believe it is more complicated (and undesirable) to make OTLP/gRPC the default for Ruby then we either make OTLP/HTTP the default and required for al languages (I would prefer this) or allow each language to choose a default (this is a bit less desirable since we loose uniformness and require receivers to support both transports to be able to receive from variety of Otel SDKs).

@fbogsany
Copy link
Contributor

Copied from #1969 (comment)

gRPC is less commonly used in the Ruby ecosystem than in other languages and Ruby appears to be a lower priority than other languages for gRPC maintainers. This means that we have much less experience with gRPC in the Ruby SIG and little demand for it so far. Battle scars from spectacular Ruby + gRPC production failures also make some of us quite wary of the support burden.

We have had considerable success with OTLP/HTTP in OpenTelemetry Ruby. Is there a strong reason to force gRPC as a default transport rather than leaving the default up to individual SIGs? Similarly, is there a strong reason to require gRPC as a transport? Both of these are challenging for the Ruby SIG.

@tigrannajaryan
Copy link
Member

From the maintainers call: general agreement that gRPC should be the default on each artifact, unless the HTTP implementation is substantially better on that artifact. Most maintainers in the call were already following this pattern.

@mtwo (and anybody else who was in the Maintainers call) can you please list the arguments that the Maintainers had in favour of OTLP/gRPC as the default?

There is evidence that it is difficult to implement (Ruby) and performs worse that OTLP/HTTP (see my comment).

@Oberon00
Copy link
Member

From a theoretical perspective, gRPC needs a HTTP/2 library underneath, so technically you should be able to implement OTLP/HTTP at least everywhere where you can implement OTLP/gRPC. The opposite is not true, since there might not be a (good) gRPC implementation for your target language/framework/version.

Is there any reason why gRPC should be preferred over HTTP as a minimal requirement?

@Oberon00
Copy link
Member

Oberon00 commented Sep 28, 2021

From the compliance matrix, it seems that C++, Rust, Swift and .NET have implemented gRPC but not HTTP. For .NET there is a open PR already (open-telemetry/opentelemetry-dotnet#2316). I don't know about the other languages. I don't expect that implementing the HTTP export would be an obstacle though, it's probably a matter of priority why only gRPC is implemented. In doubt we should ask the maintainers of these languages though.
(PHP also has not implemented OTLP/HTTP, but it also doesn't have OTLP/gRPC).

Taking a quick look at Rust, it actually seems to be implemented (for traces only): https://github.com/open-telemetry/opentelemetry-rust/blob/main/opentelemetry-otlp/src/exporter/http.rs and https://github.com/open-telemetry/opentelemetry-rust/blob/e7ecf58e0b0d1eeff213e458e47e2238575826b4/opentelemetry-otlp/src/lib.rs#L253-L261
(CC @TommyCpp @open-telemetry/rust-approvers please update the compliance matrix if that is correct)

C++ also seems to support OTLP/HTTP actually: https://github.com/open-telemetry/opentelemetry-cpp/tree/main/exporters/otlp#configuration-options--otlp-http-exporter- (both JSON and binary according to that documentation) (CC @owent @lalitb @reyang)

That leaves only swift as a language with gRPC support but no HTTP support. On the other hand we have Ruby where we know that supporting gRPC would not be easy while HTTP is supported.

@Oberon00
Copy link
Member

We should also be clear about OTLP/HTTP+protobuf (what I think we are talking about) vs OTLP/HTTP+JSON (what some seem to view as default for OTLP/HTTP)

@TommyCpp
Copy link
Contributor

As far as I remember, we only supported OTLP/HTTP+protobuf but not OTLP/HTTP+JSON. Will update spec accordingly.

@alanwest
Copy link
Member

alanwest commented Sep 28, 2021

For .NET there is a open PR already (open-telemetry/opentelemetry-dotnet#2316).

For .NET, we'd be keen on defaulting to gRPC simply because it would be a breaking change to change it to OTLP/HTTP once this PR lands.

general agreement that gRPC should be the default on each artifact, unless the HTTP implementation is substantially better on that artifact.

If the spec could be flexible and recommend gRPC as default but allow for HTTP as default when it makes sense, I think this would be best. I'm not a JS expert, but my understanding is that gRPC is not a thing in browsers, so this may be another use case where HTTP is preferred (or required).

@tigrannajaryan
Copy link
Member

From a theoretical perspective, gRPC needs a HTTP/2 library underneath, so technically you should be able to implement OTLP/HTTP at least everywhere where you can implement OTLP/gRPC. The opposite is not true, since there might not be a (good) gRPC implementation for your target language/framework/version.

+1. Up until recently if I remember correctly AWS load balancers did not even let gRPC connections through (I think they do now), so there may be real production cases when you simply cannot use gRPC.

Is there any reason why gRPC should be preferred over HTTP as a minimal requirement?

The only reason I can think of is that more SDKs today implement gRPC than HTTP, so it appears making HTTP is more work for Otel maintainers. It does not seem to be a huge deal though, if you have a working OTLP/gRPC exporter, adding OTLP/HTTP should not be very complicated.

@tigrannajaryan
Copy link
Member

We should also be clear about OTLP/HTTP+protobuf (what I think we are talking about) vs OTLP/HTTP+JSON (what some seem to view as default for OTLP/HTTP)

The spec currently strongly hints that OTLP/HTTP+JSON is optional:

SDKs MUST support either grpc or http/protobuf and SHOULD support both. They also MAY support http/json.

@tigrannajaryan
Copy link
Member

From the compliance matrix, it seems that C++, Rust, Swift and .NET have implemented gRPC but not HTTP. For .NET there is a open PR already (open-telemetry/opentelemetry-dotnet#2316). I don't know about the other languages. I don't expect that implementing the HTTP export would be an obstacle though, it's probably a matter of priority why only gRPC is implemented. In doubt we should ask the maintainers of these languages though. (PHP also has not implemented OTLP/HTTP, but it also doesn't have OTLP/gRPC).

Taking a quick look at Rust, it actually seems to be implemented (for traces only): https://github.com/open-telemetry/opentelemetry-rust/blob/main/opentelemetry-otlp/src/exporter/http.rs and https://github.com/open-telemetry/opentelemetry-rust/blob/e7ecf58e0b0d1eeff213e458e47e2238575826b4/opentelemetry-otlp/src/lib.rs#L253-L261 (CC @TommyCpp @open-telemetry/rust-approvers please update the compliance matrix if that is correct)

C++ also seems to support OTLP/HTTP actually: https://github.com/open-telemetry/opentelemetry-cpp/tree/main/exporters/otlp#configuration-options--otlp-http-exporter- (both JSON and binary according to that documentation) (CC @owent @lalitb @reyang)

That leaves only swift as a language with gRPC support but no HTTP support. On the other hand we have Ruby where we know that supporting gRPC would not be easy while HTTP is supported.

@Oberon00 very useful research, thank you. I think this reinforced that overall adding support for OTLP/HTTP where it is missing is not a big effort, while OTLP/gRPC where it is missing looks problematic.

Given this new evidence I now am now more strongly incline towards making OTLP/HTTP the required protocol. Let's hear some strong arguments against if there are any.

@tigrannajaryan
Copy link
Member

tigrannajaryan commented Sep 28, 2021

If the spec could be flexible and recommend gRPC as default but allow for HTTP as default when it makes sense, I think this would be best. I'm not a JS expert, but my understanding is that gRPC is not a thing in browsers, so this may be another use case where HTTP is preferred (or required).

We could make OTLP/HTTP support a SHOULD clause, so strongly recommended but still allowed to deviate from. The only problem with this approach is that it makes impossible for OTLP receivers to only support OTLP/HTTP and claim full OTLP compatibility. Perhaps not a big deal, but I would prefer to simply the life of receivers if possible by doing a bit more work on our end.

@anuraaga
Copy link
Contributor

Java's been released for several months now and has always used gRPC as the default for the env variable - it would be extremely disruptive to users to change the default at this point I think. This issue doesn't seem all that different from #1923 and I suspect allowing SIGs to have their default is going to be the most reasonable, it's one of those values that needed to be set early on or never set.

Making OTLP/HTTP the required protocol seems ok though to me - I agree that every language should be able to support HTTP easily if they can support gRPC, while otherwise may not even be able to support gRPC at all. It just leaves the somewhat awkward situation where languages like Java would be defaulting to the non-required protocol. It's not terrible though IMO.

@tigrannajaryan
Copy link
Member

@anuraaga so just to clarify, you suggest that:

  • All SDKs MUST support OTLP/HTTP/Protobuf.
  • OTLP/gRCP becomes a SHOULD requirement.
  • OTLP/HTTP/Json is completely optional for SDKs (and it is not fully spec-ed even yet, so it is considered unstable).
  • SDKs can choose which default they support, it can be one of the 3: OTLP/gRCP, OTLP/HTTP/Protobuf or OTLP/HTTP/JSON (do we allow OTLP/HTTP/JSON to be default since it is not stable yet?).
  • This necessitates receivers to support all 3 if they want to be fully OTLP compatible.

I think this is OK, just a bit more work on the receivers side, but otherwise not a big problem.

@anuraaga
Copy link
Contributor

anuraaga commented Sep 29, 2021

OTLP/HTTP/JSON (do we allow OTLP/HTTP/JSON to be default since it is not stable yet?).
This necessitates receivers to support all 3 if they want to be fully OTLP compatible.

Everything aligns with me except this part - it depends on if JS in browser absolutely requires JSON. My personal experience with grpc-web is binary protobuf (not grpc proper) in browsers is OK - I suspect that browsers that did poorly with binary payloads are not supported by the vast majority of sites now, and only search engines, being core infrastructure, would still target such old ones. I don't expect such code to actually migrate to use OTel in practice, they generally delegate old browsers to old heirloomed code.

If this is not true based on the JS SIG's experience, then that's that since they know best - for a receiver to be fully compatible even with browsers, it would need to support JSON. I think it is worth confirming this is really true in 2021. And if it is, special language for just the browser - so grpc and http are enough to "support OpenTelemetry" but JSON may be needed if that receiver specifically supports browsers - seems appropriate. Forcing support for web browsers on any backend implementing OTel seems like an overreach.

One last note - if we do require receivers to support JSON, please please please require parsing to support reading trace IDs either as hex or base64. Protobuf-json doesn't support hex so it's irresponsible to require implementing the entire parsing logic by hand just because of this, especially given SDKs won't implement this since they are only concerned with export.

For reference, this is how we "parse" for SDK unit tests

https://github.com/open-telemetry/opentelemetry-java/blob/main/exporters/otlp/common/src/test/java/io/opentelemetry/exporter/otlp/internal/traces/TraceRequestMarshalerTest.java#L372

Edit: Sorry after rereading my comment I realized it's not "require parsing to support reading trace IDs either as hex or base64" because that is already too hard, the hex path needs to be implemented. IMO it is either 1) Only support base64, just like proto or 2) require SDKs to provide parsing logic so backends can use it and not struggle so much on this deviation from proto3.

@anuraaga
Copy link
Contributor

Forcing support for web browsers on any backend implementing OTel seems like an overreach.

On second thought I'm not so confident in this. It does seem a little excessive, but in the proposed vision doc Compatibility is one of the engineering values - so forcing support for JSON (assuming it is required for browsers) does seem correct.

@tigrannajaryan
Copy link
Member

One last note - if we do require receivers to support JSON, please please please require parsing to support reading trace IDs either as hex or base64. Protobuf-json doesn't support hex so it's irresponsible to require implementing the entire parsing logic by hand just because of this, especially given SDKs won't implement this since they are only concerned with export.

I was able to override just the parsing to trace ids/span id and let Protobuf's default implementation parse the rest: https://github.com/open-telemetry/opentelemetry-collector/blob/afad36e3785b78906f6eb5f9e91354a9e05a274c/receiver/otlpreceiver/otlp_test.go#L76
At least in Go this is possible, not sure if Protobuf libraries for other languages allow this.

@tigrannajaryan
Copy link
Member

I added this to next Maintainers' meeting agenda. I am not sure I will be able to join, but this needs to be discussed. So far I believe we are inclined to say that each language is free to choose their own default (we can also in addition recommend that OTLP/HTTP/Protobuf is the default).

@Oberon00 Oberon00 added the area:configuration Related to configuring the SDK label Sep 30, 2021
@Oberon00 Oberon00 added the area:sdk Related to the SDK label Sep 30, 2021
@tigrannajaryan
Copy link
Member

tigrannajaryan commented Oct 4, 2021

Just checked: AWS still does not fully support gRPC for all load balancer types: https://aws.amazon.com/elasticloadbalancing/features/
It works with ALB, but does not work with NLB (UPDATE: see comment below) or other load balancer types.

@anuraaga
Copy link
Contributor

anuraaga commented Oct 4, 2021

@tigrannajaryan NLB is just a TCP/IP load balancer so it is not tied to an application protocol like gRPC. Actually before ALB supported gRPC, NLB was the only way to implement a gRPC API with a load balancer on AWS. It had the limitation of not being able to integrate things like TLS, so the support in ALB made it much nicer :) But I would consider this to be full support for gRPC on the cloud side.

@tigrannajaryan
Copy link
Member

@tigrannajaryan NLB is just a TCP/IP load balancer so it is not tied to an application protocol like gRPC. Actually before ALB supported gRPC, NLB was the only way to implement a gRPC API with a load balancer on AWS. It had the limitation of not being able to integrate things like TLS, so the support in ALB made it much nicer :) But I would consider this to be full support for gRPC on the cloud side.

All right, I guess I was misled by the gRPC label present in the ALB column and missing in the rest of the columns in the AWS doc I linked to.

@anuraaga
Copy link
Contributor

anuraaga commented Oct 4, 2021

@tigrannajaryan It seems the "Layer 7" section header type of thing is the scapegoat for that :) I think it basically means "Classic Load Balancer in L7 mode" doesn't support gRPC. FWIU, it's EoL so probably not needed for OTLP https://aws.amazon.com/blogs/aws/ec2-classic-is-retiring-heres-how-to-prepare/

tigrannajaryan pushed a commit that referenced this issue Oct 5, 2021
cc @jtescher 

## Changes

Update Rust's support for OTLP/HTTP binary protocol

Related issues #1885
@tigrannajaryan
Copy link
Member

Discussed again in Maintainers meeting. Raw meeting notes:

Feedback from Bob: +1, gRPC has great implementations in C++, Go, Python, and Java, but the other language clients have usability and performance challenges. HTTP is ubiquitous.
Ted: we should still ensure that languages support both, even if they have a preference for HTTP or gRPC!
Proposal: maintainers / the TC no longer have a preference between gRPC or HTTP being the default, but OpenTelmetry components should support both.
No objections in the group

To summarize this is what we want to do:

  1. Strongly recommend SDKs to implement both OTLP/gRPC and OTLP/HTTP+Protobuf exporters (SHOULD clause).
  2. Strongly recommend OTLP/HTTP+Protobuf to be the default protocosl (SHOULD clause) but allow language SDKs to choose a different default if they have good reasons (MAY clause).

I am going to rewrite this PR according to the plan above. If there are any objections to the plan please speak up.

tigrannajaryan added a commit to tigrannajaryan/opentelemetry-specification that referenced this issue Oct 5, 2021
tigrannajaryan added a commit to tigrannajaryan/opentelemetry-specification that referenced this issue Oct 5, 2021
tigrannajaryan added a commit to tigrannajaryan/opentelemetry-specification that referenced this issue Oct 5, 2021
tigrannajaryan added a commit to tigrannajaryan/opentelemetry-specification that referenced this issue Oct 5, 2021
tigrannajaryan added a commit to tigrannajaryan/opentelemetry-specification that referenced this issue Oct 5, 2021
tigrannajaryan added a commit to tigrannajaryan/opentelemetry-specification that referenced this issue Oct 5, 2021
tigrannajaryan added a commit that referenced this issue Oct 6, 2021
Resolves #1885

- Strongly recommend SDKs to implement both OTLP/gRPC and OTLP/HTTP+Protobuf exporters.
- Strongly recommend OTLP/HTTP+Protobuf to be the default protocol but allow language SDKs to choose a different default if they have good reasons.
tigrannajaryan added a commit to tigrannajaryan/opentelemetry-proto that referenced this issue Apr 20, 2023
Resolves open-telemetry/opentelemetry-specification#1885

- Strongly recommend SDKs to implement both OTLP/gRPC and OTLP/HTTP+Protobuf exporters.
- Strongly recommend OTLP/HTTP+Protobuf to be the default protocol but allow language SDKs to choose a different default if they have good reasons.
joaopgrassi pushed a commit to dynatrace-oss-contrib/semantic-conventions that referenced this issue Mar 21, 2024
Resolves open-telemetry/opentelemetry-specification#1885

- Strongly recommend SDKs to implement both OTLP/gRPC and OTLP/HTTP+Protobuf exporters.
- Strongly recommend OTLP/HTTP+Protobuf to be the default protocol but allow language SDKs to choose a different default if they have good reasons.
carlosalberto pushed a commit to carlosalberto/opentelemetry-specification that referenced this issue Oct 31, 2024
cc @jtescher 

## Changes

Update Rust's support for OTLP/HTTP binary protocol

Related issues open-telemetry#1885
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:configuration Related to configuring the SDK area:sdk Related to the SDK spec:protocol Related to the specification/protocol directory
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants