Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add telemetry event span for socket dispatch #6019

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

danschultzer
Copy link

@danschultzer danschultzer commented Dec 16, 2024

The reason for adding this telemetry event is that my OTel spans are full of WebSocket connections that just reads GET. This is because Bandit will just use the method which is correct as the URL may be high cardinality, however the socket path is low cardinality and should show up as e.g. GET /live/websocket.

Since socket requests are handled and halted before the rest of the endpoint plugs I can't depend on Plug.Telemetry to report them and use the [:phoenix, :endpoint] events. I could depend on [:phoenix, :socket_connected] but this will only be for successful connections.

So I'm proposing that we add a [:phoenix, :socket_dispatch] event matching the [:phoenix, :router_dispatch] events. It somewhat conflicts with the existing [:phoenix, :socket_connected] though this event only have high level metadata vs socket connected having connection metadata including the result of setting up the socket.

With the ability to attach to this telemetry event we can update the :opentelemetry_phoenix to update the span name with the route template in the same way it currently does for router dispatch events.

@josevalim
Copy link
Member

Hi @danschultzer. Our long term goal is to move the socket dispatch out of being hardcoded and make it part of the regular plug pipeline. This means we would eventually "break" this event. Therefore, if we want to automate it, I would do it for the upgrade command itself, if possible, otherwise we wait.

@danschultzer
Copy link
Author

Thanks @josevalim, makes sense to not introduce new events if they have to be removed later.

Therefore, if we want to automate it, I would do it for the upgrade command itself, if possible, otherwise we wait.

Unfortunately I don't see any way to resolve this because the socket handling happens before anything else. This is a gotcha for everyone implementing OTel tracing where they need to set up custom sampling to filter out all these stray GET traces as it can't be properly attributed to the socket route (and only filter out those as there can be other generic HTTP traces you want to keep). The socket HTTP requests are the most common HTTP requests with Phoenix LiveView.

There are several issues:

  1. It's for any socket transport, not just websocket, e.g. longpoll does not send any events at all
  2. The upgrade event is only sent when it's succesful, which means there may be stray events for unsuccesful upgrades with websocket
  3. The socket can have path variables like /:foo/socket which means we can't depend on the url attribute alone (from bandit/cowboy), we need the URL template route, and Plug.Telemetry won't know about the route template

Our long term goal is to move the socket dispatch out of being hardcoded and make it part of the regular plug pipeline. This means we would eventually "break" this event.

If moved to the plug pipeline how would the routing work? Would the socket definitions be pushed to the router instead so it becomes part of the router dispatch events? The problem is that we receive telemetry events from the web server (bandit/cowboy), and when it reaches Phoenix we need to enrich the traces with the route template so it can show up with an appropriate span name and attributes for context.

I also think it's preferable to have events/spans for all requests happening in the Phoenix endpoint.

If the idea is to add it to the router, then we could maybe make it a router_dispatch event for now? If not, maybe we can somehow leverage Plug.Telemetry events inferring it during socket dispatch, but the (lesser) problem with that is that Plug.Telemetry events doesn't have any route metadata so it has to be handled on the consumer side with introspecting of the endpoint.

I'm trying to think of ways to use existing events to resolve this for now, while not making it be a thing that will break in the future. Long term can mean it'll be a very long time before sockets becomes part of the plug pipeline (and even then Plug.Telemetry doesn't have the low cardinailty route in metadata).

@josevalim
Copy link
Member

Is this something that could be pushed to the transport and each transport emits its own event? The benefit is that they could attach metadata relevant to them?

@danschultzer
Copy link
Author

Yeah, that could work, but not sure there's much benefit to it. The route needs to be passed to the transport plug, the transport config is already handled in the endpoint instead of in the plug, and we have all metadata in the returned conn. The only thing I can think of that may be missing would be metadata for exceptions raised, e.g. connect_info, but I don't think any exceptions are raised unless it's a config error.

Alternatively we can have the span around the plug call as in this PR, but change the event name to [:phoenix, :websocket] and [:phoenix, :longpoll]? Does that naming make sense longterm?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants