Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow configuring a mapping from endpoint to a service name #555

Closed
anuraaga opened this issue Jun 22, 2020 · 14 comments
Closed

Allow configuring a mapping from endpoint to a service name #555

anuraaga opened this issue Jun 22, 2020 · 14 comments

Comments

@anuraaga
Copy link
Contributor

It is generally important for a user to be able to specify a peer service name for each client they use in an app. For example (this example will apply to many different databases), a user may access two caches with Redis, CatCache and DogCache. In Redis, it is not uncommon for an endpoint to be defined as an IP address. This means that spans will have net.peer.ip set to this IP address and no identifying information at all otherwise - this makes traces far less readable, especially since Redis itself will not have server spans showing up and the client spans must have as much utility as possible.

With manual instrumentation, users would generally set CatCache, DogCache as service names when registering each client with a tracer. But with auto instrumentation, we can't expect user apps to do this. A way to work around this would be for us to have a configuration, which is a mapping from endpoint -> service name. The endpoint could be matched against net.peer.name and net.peer.ip, and for any span that matches, we populate peer.service, a new convention which I'm pretty sure will be approved since this use case is too important, with the name in the config.

Some points to ponder

  • Will be the first configuration option of Map type. Format? Maybe

PEER_ENDPOINT_NAME_MAPPING=1.2.3.4=CatCache,dogs.com=DogCache
peer.endpoint.name.mapping=1.2.3.4=CatCache,dogs.com=DogCache (though I think there's some convention to use [key] for a map in properties, but I'd probably stick with the simpler

peer.endpoint.name.mapping:
  1.2.3.4: CatCache
  2.3.4.5: DogCache
  • For sharded, config has to have every endpoint mapped to the same name. Can't think of a way to avoid this though with auto instrumentation.

  • Endpoint groups are often dynamically updatable, but this configuration isn't. Probably have to give up on this point

For the harder cases where sharding is involved, we could additionally provide an entrypoint for a user app to use, it removes the "total auto instrumentation" but maybe is still an ok compromise.

@Bean
public AsyncCommands catsCache() {
  try (SafeCloseable unused = AutoOpenTelemetry.withPeerService("CatCache")) {
    // Or something dynamic like RedisClient.create(ZookeeperEndpointGroup.of("cat-cache")), it doesn't matter
    return RedisClient.create("redis://1.2.3.4").sync();
  }
}

@Bean
public AsyncCommands dogsCache() {
  try (SafeCloseable unused = AutoOpenTelemetry.withPeerService("DogCache")) {
    return RedisClient.create("redis://2.3.4.5").sync();
  }
}

Let me know any thoughts. After thinking about the sharded case, I'm starting to lean towards not bothering with a static config since it seems like it could miss a lot of use cases. If it was trivial, yeah, but there will be infrastructure for supporting a map configuration.

@anuraaga
Copy link
Contributor Author

Warming back up to having the config as well - while it will miss some cases it's true that we should get what we can without app modifications before resorting to them.

Though another realization I had is it's common for things like elasticsearch, cassandra, to only input 1-2 IPs and get the rest from the cluster, making the static mapping problem even more challenging ><

@iNikem
Copy link
Contributor

iNikem commented Jun 23, 2020

First, I am not sure I quite understand "With manual instrumentation, users would generally set CatCache, DogCache as service names when registering each client with a tracer". Can you point me to the corresponding API method?

Second, I am not sure this functionality should live in agent as opposed to Collector. Collector could use the whole span to determine the desired service name. E.g. in case of path-based routing, such as Spring Gateway, service name does not depend on ip, but on the portion of the path requested.

@anuraaga
Copy link
Contributor Author

TBH I don't know if there is a great API in otel Java yet since we usually use auto instrumentation, but with Brave there's always a way to set the service name like here

https://github.com/aws-samples/aws-xray-sdk-with-opentelemetry-sample/blob/both-otel-and-xray/frontend/src/main/java/com/softwareaws/xray/examples/Application.java#L111

Services that use path, gRPC is also a famous one, are relatively easy to deal with because the service can be extracted from the path in a consistent way but not so simple for databases where often there is nothing except for an IP address.

Collector is an option but I think it is a job of the instrumentation to provide a user the ability to specify these names, since there is nothing else in the span that can help for these databases it is completely up to the user. Requiring the collector for something as important as being able to differentiate between different database clients in the same app seems too strict to me. And while I can think of the programmatic approaches presented to solve the dynamic endpoint case, I think that will be basically impossible in the collector.

@iNikem
Copy link
Contributor

iNikem commented Jun 23, 2020

If there is no such functionality in manual instrumentation (meaning otel-java), than IMO it is wrong to add this here but not there. Users should be able to get everything without the requirement to use byte code manipulation if they choose so.

@anuraaga
Copy link
Contributor Author

Well I don't think something like that would live in the SDK itself but in instrumentation. I would say the closest example we have is customization of kind here

https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/master/instrumentation-core/aws-sdk/aws-sdk-2.2-core/src/main/java/io/opentelemetry/instrumentation/awssdk/v2_2/AwsSdk.java#L66

If we had more manual instrumentation, we could imagine they expose a builder or factory with customization options, similar to most of the brave instrumentation libraries. But right now we're still basically only auto instrumentation.

Another point to keep in mind is with manual instrumentation it is easy to expose a knob but not with auto. So a solution for auto will almost assuredly be dramatics different, and often have tradeoffs, than one for manual instrumentation where it's very straight forward.

@iNikem
Copy link
Contributor

iNikem commented Jun 23, 2020

Still not convinced :) @trask @tylerbenson @jkwatson what do you think?

@trask
Copy link
Member

trask commented Jun 23, 2020

@iNikem are you not convinced about the peer.service specification proposal (open-telemetry/opentelemetry-specification#652), or this particular implementation of it?

@jkwatson
Copy link
Contributor

I'd like the instrumentation group to explore the possibility of having auto-instrumentation that also allows programmatic, runtime configuration added to it, via callbacks, or instrumentation hooks (or something akin to those). I'd rather not bake that sort of thing into the SDK itself unless we have to.

@iNikem
Copy link
Contributor

iNikem commented Jun 23, 2020

Attribute is good, no objection here. I just don't think configuration of "ip -> service name" mapping in agent is the best place for it.

@tylerbenson
Copy link
Member

I also feel an ip -> service mapping is not great. Seems difficult to configure and maintain in any moderate sized environment.

@anuraaga
Copy link
Contributor Author

Yeah I mentioned my issues with the approach of mapping too ;) But it's the only approach I could find that holds the principle of agents, no modification to user apps. It doesn't solve complex cases but it does do fine with simpler ones.

But I'm happy with focusing on a programmatic approach if that's not necessary - it was my first suggestion :)

@trask
Copy link
Member

trask commented Jun 24, 2020

It's nice to have a codeless option for people who don't want to modify their application.

We can still layer a programmatic approach on top of that for people who outgrow the codeless option.

@iNikem
Copy link
Contributor

iNikem commented Jul 9, 2020

@anuraaga should this be closed by #562?

@anuraaga
Copy link
Contributor Author

anuraaga commented Jul 9, 2020

Yup except #657 was definitely lacking, sorry about that. Will follow up with that issue.

@anuraaga anuraaga closed this as completed Jul 9, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants