Problem with Auth extension API and streaming request metadata #6965

Open
jmacd opened this issue Jan 18, 2023 · 4 comments

@jmacd
Contributor

jmacd commented Jan 18, 2023

Problem background

The OTel collector has several interacting features that support the use of "headers" carrying arbitrary key:value pairs (a.k.a. Metadata) for both gRPC and HTTP connections. These include:

  • OTLP/gRPC and OTLP/HTTP receivers support an IncludeMetadata flag. If set, the client.Info is populated with the available metadata for the request and included in the pipeline context.

  • OTLP/gRPC and OTLP/HTTP exporters each support a Headers field, which passes additional static metadata with each request. The mechanisms are emphatically different: gRPC uses the "outgoing context" to receive additional metadata for outgoing RPCs.

  • OTC has an Auth extension API that allows injecting a kind of interceptor that (through common helper methods) works for both OTLP/gRPC and OTLP/HTTP.

  • OTCC has a "headerssetter" extension that can be used to configure metadata propagation. It acts as an Auth extension: its GetRequestMetadata() is called for new connections, and there the extension is able to dynamically extend the headers of the outgoing request. In HTTP this is implemented as a RoundTripper; in gRPC this is implemented using an Auth plugin.

  • OTLP/gRPC-specific implementation: gRPC does the work of merging the dynamic request metadata (from the Auth extension) with the outgoing context's metadata (from the Exporter configuration).

  • No other standard OTC and OTCC components appear to use the gRPC outgoing metadata, because it is not supported by HTTP. The only way to inject dynamic metadata in OTC is to use an Auth extension.

  • If a processor or receiver were to set up gRPC's outgoing context, it might work for outgoing gRPC calls, depending on an implementation detail. Note that when the OTLP/gRPC exporter has Headers set, it replaces the gRPC outgoing context, so outgoing metadata set upstream does not work in combination with exporter-configured metadata. (Is this a known issue?)

  • Known issue: the batch processor does not support batching by metadata.
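
To make the replace-versus-merge concern from the list above concrete, here is a minimal stdlib-only sketch. It models metadata as a plain map rather than using gRPC's metadata package, and `MD`, `replaceOutgoing`, and `mergeOutgoing` are hypothetical names chosen for illustration. In grpc-go the corresponding distinction is roughly between metadata.NewOutgoingContext (which replaces whatever was there) and metadata.AppendToOutgoingContext (which adds to it); whether the exporter behaves exactly this way is the assumption stated in the bullet, not something this sketch verifies.

```go
package main

import (
	"fmt"
	"sort"
)

// MD models request metadata as key -> list of values.
// It is a stand-in for illustration, not the gRPC metadata type.
type MD map[string][]string

// replaceOutgoing mimics "replace" semantics: metadata placed in the
// outgoing context by the pipeline is discarded in favor of the
// exporter's static headers.
func replaceOutgoing(_ MD, static MD) MD {
	out := MD{}
	for k, v := range static {
		out[k] = append([]string(nil), v...)
	}
	return out
}

// mergeOutgoing mimics "append" semantics: values from both the
// pipeline and the static configuration survive.
func mergeOutgoing(dynamic MD, static MD) MD {
	out := MD{}
	for k, v := range dynamic {
		out[k] = append(out[k], v...)
	}
	for k, v := range static {
		out[k] = append(out[k], v...)
	}
	return out
}

// sortedKeys returns the keys in a stable order for printing.
func sortedKeys(md MD) []string {
	keys := make([]string, 0, len(md))
	for k := range md {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	dynamic := MD{"x-tenant": {"acme"}}    // set upstream in the pipeline
	static := MD{"x-api-key": {"secret"}} // exporter Headers config

	fmt.Println("replace:", sortedKeys(replaceOutgoing(dynamic, static)))
	fmt.Println("merge:  ", sortedKeys(mergeOutgoing(dynamic, static)))
}
```

With replace semantics the `x-tenant` key set upstream is lost; with merge semantics both keys reach the outgoing request.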

Problem description

The OTLP-Arrow project in its Phase 1 milestone aims to create an OTel Collector "Bridge" using Apache Arrow and a columnar representation to achieve better compression. Our aim is that the use of this bridge is 100% seamless for users of OTLP/gRPC (and as a future concern, potentially OTLP/HTTP).

Our Phase 1 deliverable initially includes the use of gRPC streaming RPCs, which improves compression substantially. The problem is that gRPC streaming RPCs do not pass metadata with each message. Metadata is a per-stream quality, and the existing mechanisms for metadata propagation only partially support our needs.

Note that by design, each OTLP-Arrow stream is constructed in the background and limited in number. Streams are constructed using background contexts, so they do not benefit from the dynamic context during authorization.

If the batching processor and routing processor were aware of Metadata and could somehow route to multiple instances of my Exporter (per metadata, as configured), then I could copy the context from the first request on a stream and then each stream could correctly convey Metadata.

We are considering ways to allow the background-constructed stream to propagate client metadata, so that the OTLP-Arrow Bridge will be a seamless experience. To that end, it appears that a new mechanism would be useful. The current Auth extension is fine as far as it goes, but its use should be limited to authorization.

Can OTLP/gRPC and OTLP/HTTP Exporters somehow support non-Auth-related metadata? This could be a new kind of extension dedicated to injecting metadata, not as an Auth operation, but as a general Export operation. Each Export would then have dynamic metadata through a different extension. The Exporter's Headers should be merged with the dynamic metadata returned by any such non-Auth-related metadata extension.

Then, the OTLP-Arrow mechanism would use the Auth extension to auth each stream and it would use the non-Auth-related extension to get propagated metadata for each request. It would convey that metadata through to the receiver, which (if IncludeMetadata is set) would seamlessly propagate the request's metadata across the bridge.

@jmacd
Contributor Author

jmacd commented Jan 18, 2023

For the record, we do not see the Batch processor as being important for users of an OTLP-Arrow exporter, because the benefits of batching are lost in the conversion to Arrow thanks to the column encoding and the use of streams. The gRPC overhead is 5 bytes per stream message, so I believe that batching done by the individual SDK will be sufficient for the OTLP-Arrow use case.

@jpkrohling
Member

@jmacd, we discussed a few aspects of this issue during last week's SIG Collector call. Could you please update this issue with information from that discussion?

@jmacd
Contributor Author

jmacd commented Jan 24, 2023

I took away several parts of a complex problem, thanks to the discussion.

First, it is apparently well-known in this space that we could benefit from batchprocessor or exporterhelper support for batching by specific metadata keys. This seems unobjectionable, but raises difficult questions about implementation behavior especially under adversarial conditions. Those issues are well covered in #4544 and (possibly) #4646. It is important to the OTLP-Arrow project above that batching be applied to reduce the presence of small batches, so to advertise the benefits of an OTLP-Arrow bridge to users that rely on client metadata, this will be an important issue to solve.
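
As a rough illustration of batching by specific metadata keys, including one possible guard for adversarial input, here is a stdlib-only sketch. The names `batcher`, `batchKey`, `newBatcher`, and the `maxSets` cap are hypothetical and do not describe batchprocessor or exporterhelper; the cap is just one assumed way to bound the number of distinct metadata combinations a client can force the collector to track.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"strings"
)

// errTooManyCombinations is returned when a new metadata combination
// would exceed the configured cap.
var errTooManyCombinations = errors.New("too many distinct metadata combinations")

// batchKey builds a stable grouping key from the selected metadata keys.
func batchKey(md map[string][]string, keys []string) string {
	sorted := append([]string(nil), keys...)
	sort.Strings(sorted)
	var b strings.Builder
	for _, k := range sorted {
		// %q quotes the key and each value, so separators inside
		// values cannot make two different combinations collide.
		fmt.Fprintf(&b, "%q=%q;", k, md[k])
	}
	return b.String()
}

// batcher groups items per metadata combination, with a hard cap on
// how many combinations it will track at once.
type batcher struct {
	keys    []string
	maxSets int
	batches map[string][]string
}

func newBatcher(keys []string, maxSets int) *batcher {
	return &batcher{keys: keys, maxSets: maxSets, batches: map[string][]string{}}
}

// add appends item to the batch for its metadata combination, refusing
// to open a new batch once the cap is reached.
func (b *batcher) add(md map[string][]string, item string) error {
	k := batchKey(md, b.keys)
	if _, ok := b.batches[k]; !ok && len(b.batches) >= b.maxSets {
		return errTooManyCombinations
	}
	b.batches[k] = append(b.batches[k], item)
	return nil
}

func main() {
	b := newBatcher([]string{"x-tenant"}, 2)
	for _, tenant := range []string{"a", "a", "b", "c"} {
		err := b.add(map[string][]string{"x-tenant": {tenant}}, "span-"+tenant)
		fmt.Println(tenant, err)
	}
}
```

What to do when the cap is hit (reject, fall back to an unbatched path, evict an idle batch) is exactly the kind of implementation question raised above, and this sketch simply rejects.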

Second, there are some curious loose ends about how the Auth extension compares to, and differs from, a hypothetical extension that allows setting metadata. Why are http.Header, gRPC metadata, and collector client Info.Metadata map[string][]string types while Auth extensions return map[string]string? If my goal is to propagate headers, it is difficult to make the Auth extension work, and we can imagine an extension that, given a context, returns map[string][]string just as an HTTP RoundTripper can do and just as the gRPC interceptors can do, i.e., somewhat more than the Auth extension can do.
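
The practical cost of the type mismatch can be shown in a few lines. This is a minimal sketch with hypothetical helper names (`flattenFirst`, `widen`); it assumes a propagating extension would have to squeeze multi-valued metadata into the single-valued map[string]string shape, which is also the shape returned by GetRequestMetadata in grpc-go's credentials.PerRPCCredentials interface.

```go
package main

import "fmt"

// flattenFirst converts multi-valued metadata into the single-valued
// shape by keeping only the first value per key. This is lossy.
func flattenFirst(md map[string][]string) map[string]string {
	out := make(map[string]string, len(md))
	for k, vs := range md {
		if len(vs) > 0 {
			out[k] = vs[0]
		}
	}
	return out
}

// widen converts back to the multi-valued shape. It always succeeds,
// but it cannot restore values that flattenFirst dropped.
func widen(m map[string]string) map[string][]string {
	out := make(map[string][]string, len(m))
	for k, v := range m {
		out[k] = []string{v}
	}
	return out
}

func main() {
	md := map[string][]string{"x-scope": {"read", "write"}}
	round := widen(flattenFirst(md))
	// Number of values for x-scope before and after the round trip.
	fmt.Println(len(md["x-scope"]), len(round["x-scope"]))
}
```

Joining values with a separator instead of dropping them would avoid the loss only if the receiving side knows to split them again, which is another reason a map[string][]string return type is the more natural fit for propagation.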

Third, there is a question about what Metadata is meant to be used for, because in the case of an OTLP-Arrow bridge there is not an appropriate 1:1 mapping. Presently, metadata is injected at the point where an HTTP "stream" is started. The headers are contextually scoped to the stream, which for unary RPC corresponds 1:1 with the request and its metadata. In the context of a gRPC streaming RPC, there is no longer an explicit 1:1 correspondence between the client.Info.Metadata (from the receiver) and the stream on which it is sent. For a bridge to convey even selected fields of the client.Info.Metadata over a streaming connection, we have to convey them as part of the streaming protocol, because there are no headers at this level of communication. I have implemented this in the prototype, where the exporter and receiver use an hpack encoder/decoder to convey the gRPC metadata (i.e., a map[string][]string) over the Arrow bridge.
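
The idea of carrying metadata inside the streaming protocol can be sketched as follows. This is not the prototype's code: the prototype uses hpack, while this sketch swaps in encoding/json purely to stay within the standard library, and `streamMessage`, `encodeMessage`, and `decodeMetadata` are hypothetical names. It only shows that each message on a shared stream can carry its own request's metadata.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// streamMessage is a HYPOTHETICAL stand-in for one message on a
// long-lived stream. Headers carries the metadata of the request that
// produced the payload, because the stream itself has no per-message
// headers.
type streamMessage struct {
	Headers []byte // encoded per-request metadata (JSON here, hpack in the prototype)
	Payload []byte // telemetry payload, opaque in this sketch
}

// encodeMessage is the exporter side: attach the request's metadata to
// the message before writing it to the stream.
func encodeMessage(md map[string][]string, payload []byte) (streamMessage, error) {
	h, err := json.Marshal(md)
	if err != nil {
		return streamMessage{}, err
	}
	return streamMessage{Headers: h, Payload: payload}, nil
}

// decodeMetadata is the receiver side: recover the metadata so it can
// be placed back into the pipeline context for this one request.
func decodeMetadata(msg streamMessage) (map[string][]string, error) {
	md := map[string][]string{}
	if err := json.Unmarshal(msg.Headers, &md); err != nil {
		return nil, err
	}
	return md, nil
}

func main() {
	// Two requests from different clients share one stream.
	requests := []map[string][]string{
		{"x-tenant": {"acme"}},
		{"x-tenant": {"globex"}},
	}
	for _, md := range requests {
		msg, err := encodeMessage(md, []byte("payload"))
		if err != nil {
			panic(err)
		}
		got, err := decodeMetadata(msg)
		if err != nil {
			panic(err)
		}
		fmt.Println(got["x-tenant"])
	}
}
```

A stateful header codec such as hpack can compress repeated keys and values across messages on the same stream, which a stateless encoding like the JSON used here cannot do.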

@jmacd
Contributor Author

jmacd commented Jan 27, 2023

I've described the problem in concrete terms using an example w/ basicauth and headerssetter auth. open-telemetry/opentelemetry-collector-contrib#18065

@github-actions github-actions bot added the Stale label Jan 27, 2025