
Contributing an OpenAI instrumentor. #1796

Closed
jxnl opened this issue May 10, 2023 · 7 comments · Fixed by #2759
Labels
feature-request gen-ai Related to generative AI

Comments

jxnl commented May 10, 2023

Describe the solution you'd like
I noticed that Datadog has an instrumentor for the OpenAI package in Python here: https://ddtrace.readthedocs.io/en/stable/integrations.html#openai

I was building one for personal use, but I think it might make sense to try to contribute. However, I'm not sure about best practices or how to test this kind of thing (I'm mostly a Data Scientist). Here is what I have so far: https://github.com/jxnl/llamatry/blob/main/llamatry/openai.py

What the Datadog one does seems much more sophisticated.
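To make the kwargs-wrapping approach concrete, here is a minimal sketch of the idea. The `Span` class below is a toy stand-in so the example is self-contained; a real instrumentor would use the OpenTelemetry tracer API (`tracer.start_as_current_span()` and `span.set_attribute()`) instead, and `fake_chat_completion` is a hypothetical placeholder for the actual OpenAI client call:

```python
# Sketch: flatten every keyword argument of an OpenAI-style call into
# "openai.<name>" span attributes, instead of hand-picking attributes.
import functools
import json

class Span:
    """Toy stand-in for an OpenTelemetry span (sketch only)."""
    def __init__(self, name):
        self.name = name
        self.attributes = {}

    def set_attribute(self, key, value):
        self.attributes[key] = value

def trace_openai_call(fn):
    """Record every kwarg of the wrapped call as a span attribute."""
    @functools.wraps(fn)
    def wrapper(**kwargs):
        span = Span(f"openai.{fn.__name__}")
        for key, value in kwargs.items():
            # Non-scalar values (e.g. the messages list) are serialized,
            # since span attributes must be primitives or lists of them.
            if not isinstance(value, (str, int, float, bool)):
                value = json.dumps(value)
            span.set_attribute(f"openai.{key}", value)
        wrapper.last_span = span  # exposed for inspection in this sketch
        return fn(**kwargs)
    return wrapper

@trace_openai_call
def fake_chat_completion(**kwargs):
    # Placeholder for openai.ChatCompletion.create
    return {"choices": [{"message": {"content": "ok"}}]}

fake_chat_completion(model="gpt-4", temperature=0.2,
                     messages=[{"role": "user", "content": "hi"}])
print(fake_chat_completion.last_span.attributes["openai.model"])  # gpt-4
```

The upside of this shape is that new OpenAI parameters are captured automatically; the downside is that you give up curated attribute names and types.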


slobsv commented Jun 27, 2023

Hey @jxnl, here's a draft of an instrumentor for the OpenAI package: https://github.com/cartermp/opentelemetry-instrument-openai-py

The author (@cartermp) tells me it's mostly done and he's just looking for some feedback. Maybe it's worth trying it out and sharing yours?

@cartermp

Thanks for the ping! @jxnl, if you're happy to try it out and contribute, I'm happy to take contributions.

It'd also be great if the instrumentations here included one for the OpenAI library. I'm happy to contribute mine, work with others to get it into the right shape, and be on the hook to maintain it.


jxnl commented Jul 3, 2023

I wrote https://github.com/jxnl/llamatry/blob/main/llamatry/openai.py without thinking about contributing. It works, but I'm not sure about best practices. @cartermp

I wrap kwargs rather than each attribute individually.

Moreover, in practice I don't know if I always want to save messages, since the data might have privacy issues, and for gpt4-32k the text is massive and may add additional latency.


cartermp commented Jul 3, 2023

So I think capturing messages by default is crucial for observability. Otherwise, you have no way of seeing patterns in user behavior and acting on them (either directly or by using them as input to an evaluation system for offline testing). But yes, that should be something you can turn off for data privacy reasons.


cartermp commented Jul 7, 2023

@open-telemetry/opentelemetry-python-contrib-approvers, @slobsv and I have been maintaining https://github.com/cartermp/opentelemetry-instrument-openai-py and think it might actually be worth contributing it to this repo. Some context in Slack.

How might we go about getting this considered for contribution to the repo? We're up for maintaining it.


slobsv commented Jul 7, 2023

> I wrote https://github.com/jxnl/llamatry/blob/main/llamatry/openai.py without thinking about contributing. It works but not sure about best practices. @cartermp
>
> I wrap kwargs rather than each attribute.
>
> Moreover. In practice I don't know if I want to always save messages.
>
> Since data might have privacy issues and for gpt4-32k the text is massive and may add additional latency.

I agree with @cartermp that including the messages is likely crucial for most observability use cases. But it's interesting to hear, @jxnl, that you don't really care to (or want to) see them. I'm curious what your main motivation / use case is for tracing your OpenAI calls if you're not interested in the messages? That'd be great context.

If this ends up being a frequent request, I guess we could add an optional initialization parameter of sorts to not include the messages (although we'd probably default to including them). I think I'd want to see more requests for this before we implement it, though.

Otherwise, this other issue on the Go OTel repo might provide some inspiration for how best to sanitize risky spans.
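As a rough illustration of what such an opt-out could look like, here is a sketch of a sanitization pass over span attributes. The `capture_content` flag and the `.messages`/`.content` key suffixes are assumptions for the sketch, not names from the actual package:

```python
# Sketch: when content capture is disabled, redact message bodies from
# span attributes while keeping non-sensitive metadata (model, tokens, ...).
def sanitize_attributes(attributes, capture_content=True):
    """Return a copy of the attributes, redacting message content
    when capture_content is False."""
    if capture_content:
        return dict(attributes)
    redacted = {}
    for key, value in attributes.items():
        # Assumed convention: content-bearing keys end in .messages/.content
        if key.endswith(".messages") or key.endswith(".content"):
            redacted[key] = "<redacted>"
        else:
            redacted[key] = value
    return redacted

attrs = {"openai.model": "gpt-4",
         "openai.messages": '[{"role": "user", "content": "hi"}]'}
print(sanitize_attributes(attrs, capture_content=False))
```

Defaulting `capture_content` to `True` matches the default-to-including-them stance above, while still giving privacy-sensitive users a single switch.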


jxnl commented Jul 8, 2023

@slobsv

Yeah, imagine a diary app, or some system where you don't want to retain a user's data.

Moreover, as context length increases, a long conversation resends its history on every call, so it ends up processing on the order of n^2 tokens in total. At maybe 32k token length, it gets wild (imagine Anthropic's 100k).
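The quadratic-growth point above can be checked with back-of-the-envelope arithmetic (the turn count and per-turn token figures below are illustrative assumptions):

```python
# If each turn adds roughly per_turn tokens and the full history is resent
# on every call, the total tokens processed over n turns is
# per_turn * (1 + 2 + ... + n) = per_turn * n * (n + 1) / 2.
def total_tokens_processed(turns, per_turn):
    total = 0
    history = 0
    for _ in range(turns):
        history += per_turn   # this turn's messages join the history
        total += history      # each call pays for the whole history so far
    return total

# 40 turns of ~800 tokens each: the final call alone hits the 32k limit,
# and the conversation as a whole has processed 656,000 tokens.
print(total_tokens_processed(40, 800))  # 656000
```

So even before hitting the context limit, tracing every message of every call multiplies the stored text by roughly half the turn count.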
