Allow configuring a mapping from endpoint to a service name #555

anuraaga · 2020-06-22T09:12:01Z

It is generally important for a user to be able to specify a peer service name for each client they use in an app. For example (this example will apply to many different databases), a user may access two caches with Redis, CatCache and DogCache. In Redis, it is not uncommon for an endpoint to be defined as an IP address. This means that spans will have net.peer.ip set to this IP address and no identifying information at all otherwise - this makes traces far less readable, especially since Redis itself will not have server spans showing up and the client spans must have as much utility as possible.

With manual instrumentation, users would generally set CatCache, DogCache as service names when registering each client with a tracer. But with auto instrumentation, we can't expect user apps to do this. A way to work around this would be for us to have a configuration, which is a mapping from endpoint -> service name. The endpoint could be matched against net.peer.name and net.peer.ip, and for any span that matches, we populate peer.service, a new convention which I'm pretty sure will be approved since this use case is too important, with the name in the config.

Some points to ponder

Will be the first configuration option of Map type. Format? Maybe

PEER_ENDPOINT_NAME_MAPPING=1.2.3.4=CatCache,dogs.com=DogCache
peer.endpoint.name.mapping=1.2.3.4=CatCache,dogs.com=DogCache (though I think there's some convention to use [key] for a map in properties, but I'd probably stick with the simpler

peer.endpoint.name.mapping:
  1.2.3.4: CatCache
  2.3.4.5: DogCache

For sharded, config has to have every endpoint mapped to the same name. Can't think of a way to avoid this though with auto instrumentation.
Endpoint groups are often dynamically updatable, but this configuration isn't. Probably have to give up on this point

For the harder cases where sharding is involved, we could additionally provide an entrypoint for a user app to use, it removes the "total auto instrumentation" but maybe is still an ok compromise.

@Bean
public AsyncCommands catsCache() {
  try (SafeCloseable unused = AutoOpenTelemetry.withPeerService("CatCache")) {
    // Or something dynamic like RedisClient.create(ZookeeperEndpointGroup.of("cat-cache")), it doesn't matter
    return RedisClient.create("redis://1.2.3.4").sync();
  }
}

@Bean
public AsyncCommands dogsCache() {
  try (SafeCloseable unused = AutoOpenTelemetry.withPeerService("DogCache")) {
    return RedisClient.create("redis://2.3.4.5").sync();
  }
}

Let me know any thoughts. After thinking about the sharded case, I'm starting to lean towards not bothering with a static config since it seems like it could miss a lot of use cases. If it was trivial, yeah, but there will be infrastructure for supporting a map configuration.

The text was updated successfully, but these errors were encountered:

anuraaga · 2020-06-23T06:38:56Z

Warming back up to having the config as well - while it will miss some cases it's true that we should get what we can without app modifications before resorting to them.

Though another realization I had is it's common for things like elasticsearch, cassandra, to only input 1-2 IPs and get the rest from the cluster, making the static mapping problem even more challenging ><

iNikem · 2020-06-23T11:24:32Z

First, I am not sure I quite understand "With manual instrumentation, users would generally set CatCache, DogCache as service names when registering each client with a tracer". Can you point me to the corresponding API method?

Second, I am not sure this functionality should live in agent as opposed to Collector. Collector could use the whole span to determine the desired service name. E.g. in case of path-based routing, such as Spring Gateway, service name does not depend on ip, but on the portion of the path requested.

anuraaga · 2020-06-23T12:04:22Z

TBH I don't know if there is a great API in otel Java yet since we usually use auto instrumentation, but with Brave there's always a way to set the service name like here

https://github.com/aws-samples/aws-xray-sdk-with-opentelemetry-sample/blob/both-otel-and-xray/frontend/src/main/java/com/softwareaws/xray/examples/Application.java#L111

Services that use path, gRPC is also a famous one, are relatively easy to deal with because the service can be extracted from the path in a consistent way but not so simple for databases where often there is nothing except for an IP address.

Collector is an option but I think it is a job of the instrumentation to provide a user the ability to specify these names, since there is nothing else in the span that can help for these databases it is completely up to the user. Requiring the collector for something as important as being able to differentiate between different database clients in the same app seems too strict to me. And while I can think of the programmatic approaches presented to solve the dynamic endpoint case, I think that will be basically impossible in the collector.

iNikem · 2020-06-23T12:08:30Z

If there is no such functionality in manual instrumentation (meaning otel-java), than IMO it is wrong to add this here but not there. Users should be able to get everything without the requirement to use byte code manipulation if they choose so.

anuraaga · 2020-06-23T12:20:01Z

Well I don't think something like that would live in the SDK itself but in instrumentation. I would say the closest example we have is customization of kind here

https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/master/instrumentation-core/aws-sdk/aws-sdk-2.2-core/src/main/java/io/opentelemetry/instrumentation/awssdk/v2_2/AwsSdk.java#L66

If we had more manual instrumentation, we could imagine they expose a builder or factory with customization options, similar to most of the brave instrumentation libraries. But right now we're still basically only auto instrumentation.

Another point to keep in mind is with manual instrumentation it is easy to expose a knob but not with auto. So a solution for auto will almost assuredly be dramatics different, and often have tradeoffs, than one for manual instrumentation where it's very straight forward.

iNikem · 2020-06-23T12:26:38Z

Still not convinced :) @trask @tylerbenson @jkwatson what do you think?

trask · 2020-06-23T16:44:36Z

@iNikem are you not convinced about the peer.service specification proposal (open-telemetry/opentelemetry-specification#652), or this particular implementation of it?

jkwatson · 2020-06-23T16:46:45Z

I'd like the instrumentation group to explore the possibility of having auto-instrumentation that also allows programmatic, runtime configuration added to it, via callbacks, or instrumentation hooks (or something akin to those). I'd rather not bake that sort of thing into the SDK itself unless we have to.

iNikem · 2020-06-23T16:47:11Z

Attribute is good, no objection here. I just don't think configuration of "ip -> service name" mapping in agent is the best place for it.

tylerbenson · 2020-06-23T22:55:27Z

I also feel an ip -> service mapping is not great. Seems difficult to configure and maintain in any moderate sized environment.

anuraaga · 2020-06-23T23:07:02Z

Yeah I mentioned my issues with the approach of mapping too ;) But it's the only approach I could find that holds the principle of agents, no modification to user apps. It doesn't solve complex cases but it does do fine with simpler ones.

But I'm happy with focusing on a programmatic approach if that's not necessary - it was my first suggestion :)

trask · 2020-06-24T04:55:52Z

It's nice to have a codeless option for people who don't want to modify their application.

We can still layer a programmatic approach on top of that for people who outgrow the codeless option.

iNikem · 2020-07-09T13:13:04Z

@anuraaga should this be closed by #562?

anuraaga · 2020-07-09T14:01:04Z

Yup except #657 was definitely lacking, sorry about that. Will follow up with that issue.

anuraaga mentioned this issue Jun 22, 2020

Allow exporters to decorate SpanData open-telemetry/opentelemetry-java#1321

Closed

anuraaga mentioned this issue Jun 23, 2020

Add endpoint.peer.service.mapping property to allow a user to specify… #562

Merged

This was referenced Jun 24, 2020

Add peer.service semantic convention to indicate the name of a target… open-telemetry/opentelemetry-specification#652

Merged

Trace SDK: Provide definitions for readable and read/write span. open-telemetry/opentelemetry-specification#669

Merged

anuraaga closed this as completed Jul 9, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow configuring a mapping from endpoint to a service name #555

Allow configuring a mapping from endpoint to a service name #555

anuraaga commented Jun 22, 2020

anuraaga commented Jun 23, 2020

iNikem commented Jun 23, 2020

anuraaga commented Jun 23, 2020

iNikem commented Jun 23, 2020

anuraaga commented Jun 23, 2020

iNikem commented Jun 23, 2020

trask commented Jun 23, 2020

jkwatson commented Jun 23, 2020

iNikem commented Jun 23, 2020

tylerbenson commented Jun 23, 2020

anuraaga commented Jun 23, 2020

trask commented Jun 24, 2020

iNikem commented Jul 9, 2020

anuraaga commented Jul 9, 2020

Allow configuring a mapping from endpoint to a service name #555

Allow configuring a mapping from endpoint to a service name #555

Comments

anuraaga commented Jun 22, 2020

anuraaga commented Jun 23, 2020

iNikem commented Jun 23, 2020

anuraaga commented Jun 23, 2020

iNikem commented Jun 23, 2020

anuraaga commented Jun 23, 2020

iNikem commented Jun 23, 2020

trask commented Jun 23, 2020

jkwatson commented Jun 23, 2020

iNikem commented Jun 23, 2020

tylerbenson commented Jun 23, 2020

anuraaga commented Jun 23, 2020

trask commented Jun 24, 2020

iNikem commented Jul 9, 2020

anuraaga commented Jul 9, 2020