feat(lb): Introduce the ability to load balance on composite keys in lb #35125

dhftah · 2024-09-10T18:13:01Z

Right now, there's a problem at high throughput when using the load balancer with the service.name resource attribute: the load balancers themselves get slow. While it's possible to vertically scale them up to a point (e.g. about 100k req/sec), as they get slow they start to back up traffic and block on requests. Applications then can't write out as many spans, and start dropping them.

This commit seeks to address that by extending the load balancing collector to allow creating a composite key from attributes. The composite key keeps the load balancing decision "consistent enough" to reduce cardinality, but still spreads the load across ${N} collectors.

It doesn't make too many assumptions about how operators will use this, except that the underlying data (the spans) are unlikely to be complete in all cases, so the key generation is "best effort". This is a deviation from the existing design, which hard-requires "span.name".
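To make "best effort" concrete, here is a minimal sketch of how a composite key could be assembled. It is not the code in this PR: the function name `compositeKey`, the attribute names, and the "/" separator are all assumptions chosen for illustration. Missing or empty attributes are skipped instead of causing an error.

```go
package main

import (
	"fmt"
	"strings"
)

// compositeKey builds a routing key from the listed attribute names.
// Attributes that are missing or empty are skipped rather than treated
// as an error, so the key is "best effort".
// The function name, attribute names, and "/" separator are illustrative
// assumptions, not the exporter's actual configuration or key format.
func compositeKey(attrs map[string]string, names []string) string {
	parts := make([]string, 0, len(names))
	for _, name := range names {
		if v, ok := attrs[name]; ok && v != "" {
			parts = append(parts, v)
		}
	}
	return strings.Join(parts, "/")
}

func main() {
	names := []string{"service.name", "span.name", "http.method"}

	complete := map[string]string{
		"service.name": "checkout",
		"span.name":    "charge",
		"http.method":  "POST",
	}
	partial := map[string]string{"service.name": "checkout"}

	fmt.Println(compositeKey(complete, names)) // checkout/charge/POST
	fmt.Println(compositeKey(partial, names))  // checkout
}
```

An incomplete span still produces a usable key; it just falls back to a coarser one.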

== Design Notes
=== Contributor Skill

As a contributor, I'm very new to the opentelemetry collector, and I do not anticipate contributing much beyond what is needed to tune the collectors that I am responsible for. Given this, the code may violate certain assumptions that are otherwise "well known".

=== Required Knowledge

The biggest surprise in this code was that, despite accepting a slice, the routingIdentifierFromTraces function assumes spans have been processed with the batchpersignal.SplitTraces() function, which appears to ensure that each "trace" contains only a single span (thus allowing them to be multiplexed effectively).

This allows the function to be simplified quite substantially.

=== Use case

The primary use case I had in mind when writing this is calculating metrics in the spanmetricsconnector component. Essentially, services drive far too much traffic for a single collector instance to handle, so we need to multiplex it in a way that still allows the metrics to be calculated in a single place (limiting cardinality) while also spreading the load across ${N} collectors.
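As a rough illustration of that trade-off, the sketch below maps composite keys onto N collectors. It is a simplification under stated assumptions: `pickCollector` is a hypothetical helper, and plain FNV-1a hashing with a modulo stands in for whatever backend selection the exporter actually performs. The point is only that identical keys always land on the same collector, while different keys from one busy service can land on different collectors.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// pickCollector maps a routing key to one of n collectors.
// Hypothetical helper: hash-modulo is a stand-in used for illustration,
// not the exporter's real backend selection logic.
// It returns -1 when n is not positive.
func pickCollector(key string, n int) int {
	if n <= 0 {
		return -1
	}
	h := fnv.New32a()
	h.Write([]byte(key)) // Write on an FNV hash never returns an error
	return int(h.Sum32() % uint32(n))
}

func main() {
	const collectors = 4

	// Keying on service.name alone sends all "checkout" traffic to one
	// collector. Composite keys let the same service fan out, while each
	// individual key is still aggregated in a single place.
	keys := []string{
		"checkout",
		"checkout/charge",
		"checkout/refund",
		"checkout/charge", // same key as above, same collector
	}
	for _, k := range keys {
		fmt.Printf("%-18s -> collector %d\n", k, pickCollector(k, collectors))
	}
}
```

How evenly the load spreads depends on how many distinct composite keys a service produces, so the choice of attributes matters.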

=== Traces only implementation

This commit addresses this only for traces, as traces are all I care about. The logic can likely be extended to other signals easily, however.


jpkrohling

I did a first pass, and I have a concern about the assumption that there is only one span per trace in the batches, which I'm not sure is correct. In any case, I understand this change is being battle tested, and I'm interested in hearing more once the results are available.

jpkrohling

Apart from the linting failures, this LGTM.

@dhftah, would you please include a changelog entry for this?

github-actions bot added the exporter/loadbalancing label Sep 10, 2024

github-actions bot requested a review from jpkrohling September 10, 2024 18:13

dhftah force-pushed the main branch 7 times, most recently from 942648e to db63e7b Compare September 11, 2024 17:48

dhftah force-pushed the main branch from db63e7b to 9fb7317 Compare September 11, 2024 18:01

dhftah marked this pull request as ready for review September 11, 2024 18:04

dhftah requested a review from a team September 11, 2024 18:04

github-actions bot assigned MovieStoreGuy Sep 11, 2024

jpkrohling reviewed Sep 13, 2024


jpkrohling approved these changes Sep 18, 2024

