
[Feature]: Add OpenTelemetry distributed tracing #3789

Closed
ronensc opened this issue Apr 2, 2024 · 4 comments · Fixed by #4687

Comments

@ronensc
Contributor

ronensc commented Apr 2, 2024

🚀 The feature, motivation and pitch

This proposal suggests adding distributed tracing with OpenTelemetry, which will enable operators to export traces in a standard protocol and seamlessly connect them with visualization tools such as Jaeger, Zipkin, and Instana.

As an initial implementation, I suggest emitting a trace for each request, including the following data:

  • Model name
  • Request ID
  • Sampling params
  • Latency
  • Number of input tokens
  • Number of generated tokens

This approach will greatly enhance the observability and troubleshooting capabilities of vLLM.
I am willing to work on this and contribute it to the community.
I welcome any feedback or suggestions.
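For concreteness, here is a minimal sketch of what emitting such a per-request span could look like with the OpenTelemetry Python SDK and an OTLP exporter. The span name, attribute names, and the `sampling_params` fields used below are illustrative placeholders, not a final proposal:

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Export spans over OTLP so any compatible backend (Jaeger, Zipkin, Instana, ...) can ingest them.
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("vllm")


def emit_request_trace(model: str, request_id: str, sampling_params,
                       arrival_time: float, num_prompt_tokens: int,
                       num_generation_tokens: int) -> None:
    """Emit one span per finished request, backdated to the request's arrival time."""
    span = tracer.start_span(
        "llm_request",
        start_time=int(arrival_time * 1e9),  # OpenTelemetry expects nanoseconds
    )
    # Attribute names are placeholders; a real implementation would likely follow
    # the OpenTelemetry semantic conventions for GenAI.
    span.set_attribute("gen_ai.request.model", model)
    span.set_attribute("gen_ai.request.id", request_id)
    span.set_attribute("gen_ai.request.temperature", sampling_params.temperature)
    span.set_attribute("gen_ai.request.top_p", sampling_params.top_p)
    span.set_attribute("gen_ai.usage.prompt_tokens", num_prompt_tokens)
    span.set_attribute("gen_ai.usage.completion_tokens", num_generation_tokens)
    span.end()  # the span's duration then reflects end-to-end request latency
```

In this sketch the request latency is captured by the span's own duration (arrival time to `span.end()`), so it does not need a separate attribute.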

Alternatives

No response

Additional context

No response

@gyliu513

gyliu513 commented Apr 2, 2024

Hi @ronensc, if you want to contribute, this might be a good repo: https://github.com/traceloop/openllmetry/tree/main/packages. We already have support for a bunch of LLM providers there.

FYI @nirga

@nirga

nirga commented Apr 2, 2024

@ronensc would love to assist! Ping me on the community Slack.

@baggiponte

@samuelcolvin for logfire!

@MrMegaMango

Hi, I am using OpenTelemetry traces to track down slowness in vLLM calls. Only this single llm_request span comes out, with some attributes.
[screenshot: the single llm_request span and its attributes]
It only covers about 10% of the time spent after a request to vLLM's v1/completions endpoint, and there are no nested spans.
Presumably the start of the llm_request span comes from seq_group.metrics.arrival_time; why is it seconds later than the request to v1/completions?
