Add the beginnings of AI Semantic conventions #483
@@ -0,0 +1,32 @@
<!--- Hugo front matter used to generate the website version of this page:
linkTitle: AI
path_base_for_github_subdir:
  from: content/en/docs/specs/semconv/ai/_index.md
  to: ai/README.md
--->
# Semantic Conventions for AI systems

**Status**: [Experimental][DocumentStatus]

This document defines semantic conventions for the following kinds of AI systems:

* LLMs
* LLM Chains and Agents
* LLM Frameworks (e.g., LangChain, LlamaIndex)
* Vector Embeddings
* Vector Databases (e.g., Pinecone, Milvus)

Semantic conventions for LLM operations are defined for the following signals:

* [LLM Spans](llm-spans.md): Semantic Conventions for LLM requests - *spans*.
* [LLM Chains and Agents](llm-chains-agents.md): Semantic Conventions for LLM chains and agents - *spans*.

Technology-specific semantic conventions are defined for the following LLM providers:

* [OpenAI](openai.md): Semantic Conventions for *OpenAI*.
* [Anthropic](anthropic.md): Semantic Conventions for *Anthropic*.
* [Cohere](cohere.md): Semantic Conventions for *Cohere*.
* [Replicate](replicate.md): Semantic Conventions for *Replicate*.

[DocumentStatus]: https://github.com/open-telemetry/opentelemetry-specification/tree/v1.22.0/specification/document-status.md
@@ -0,0 +1,35 @@
<!--- Hugo front matter used to generate the website version of this page:
linkTitle: Anthropic
--->

# Semantic Conventions for Anthropic

**Status**: [Experimental][DocumentStatus]

The Semantic Conventions for [Anthropic](https://docs.anthropic.com/claude/docs) extend the [LLM Semantic Conventions](llm-spans.md):
the common LLM request attributes described there apply in addition to the Anthropic-specific conventions described on this page.

## Anthropic LLM request attributes

These additional attributes apply when instrumenting Anthropic LLM requests.

<!-- semconv llm.anthropic(tag=llm-request-tech-specific) -->
| Attribute | Type | Description | Examples | Requirement Level |
|---|---|---|---|---|
| `llm.anthropic.top_k` | int | If present, the value used to sample only from the top K options for each subsequent token. | `5` | Required |
| `llm.anthropic.metadata.user_id` | string | If present, the `user_id` used in an Anthropic request. | `bob` | Required |

## Anthropic LLM response attributes

These additional attributes apply when instrumenting Anthropic LLM responses.

### Chat completion attributes

These are the attributes for a full chat completion (no streaming).

<!-- semconv llm.anthropic(tag=llm-response-tech-specific) -->
| Attribute | Type | Description | Examples | Requirement Level |
|---|---|---|---|---|
| `llm.anthropic.stop_reason` | string | The reason why the model stopped sampling. | `stop_sequence` | Required |
| `llm.anthropic.model` | string | The name of the model used for the completion. | `claude-instant-1` | Recommended |

[DocumentStatus]: https://github.com/open-telemetry/opentelemetry-specification/tree/v1.22.0/specification/document-status.md
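For illustration only, here is a minimal sketch of how an instrumentation might record these Anthropic-specific attributes alongside the common LLM attributes, assuming the Python `opentelemetry-api` package; the span name and attribute values are hypothetical:

```python
from opentelemetry import trace

tracer = trace.get_tracer("example.anthropic.instrumentation")

# Hypothetical Anthropic completion request, instrumented by hand.
with tracer.start_as_current_span("anthropic.completion") as span:
    # Common LLM request attributes (see llm-spans.md).
    span.set_attribute("llm.model", "claude-instant-1")
    span.set_attribute("llm.temperature", 0.0)
    # Anthropic-specific request attributes.
    span.set_attribute("llm.anthropic.top_k", 5)
    span.set_attribute("llm.anthropic.metadata.user_id", "bob")

    # ... call the Anthropic API here and wait for the completion ...

    # Anthropic-specific response attributes.
    span.set_attribute("llm.anthropic.stop_reason", "stop_sequence")
    span.set_attribute("llm.anthropic.model", "claude-instant-1")
```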
@@ -0,0 +1 @@
todo
@@ -0,0 +1 @@
todo
@@ -0,0 +1,49 @@
# Semantic Conventions for LLM requests in Chains or Agents

**Status**: [Experimental][DocumentStatus]

<!-- Re-generate TOC with `markdown-toc --no-first-h1 -i` -->

<!-- toc -->

- [LLM Chain attributes](#llm-chain-attributes)
- [LLM Agent Step attributes](#llm-agent-step-attributes)

<!-- tocstop -->

A chain is defined as a sequence of actions, some of which may involve a call to an LLM, controlled by a program. Some requests are made in parallel, following a map-reduce pattern, and some are sequential. Crucially, requests to an LLM are initiated programmatically.

An agent is defined as an executable that, given instructions, performs any number of actions, some of which may involve requests to an LLM or other services, until certain criteria are satisfied (such as a known end state being reached, an error, or an output evaluating a certain way). Although similar to a chain, an agent is distinguished by its ability to make a request to an LLM on behalf of a program: requests to an LLM are not controlled by the program, but rather by the agent itself.

In both cases, traces model the behavior of a chain or an agent. As such, spans in a chain or agent should follow the guidance in [llm-spans](llm-spans.md).

However, a key conceptual difference between traces used to model LLM behavior and distributed traces is that a group of one or more spans may represent a *step* of a chain or an agent. In simpler applications, such as directly chaining a fixed number of LLM requests together, a single span can adequately represent each step in the chain. However, more complex applications often require a group of spans.

For example, consider an agent that continuously reads data from a knowledge base, makes a request to an LLM to summarize the data, and evaluates the effectiveness of that summarization, repeating the process until its success criteria are met:

- One or more spans that track retrieving a subset of the knowledge base
- One or more spans that track one or more requests to an LLM (perhaps in parallel)
- One or more spans that track parsing, validation, and/or merging of results from LLM requests
- One or more spans that track an evaluation of the final result

Each of the above groups of spans may represent a single *step* of a chain or agent, indicating a need to distinguish each *step*.
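As an illustrative sketch of this grouping (not part of the convention, using the Python `opentelemetry-api`; the span names are hypothetical), one *step* of such an agent could be modeled as a parent span with one child span per group of work:

```python
from opentelemetry import trace

tracer = trace.get_tracer("example.agent.instrumentation")

# One *step* of the summarization agent described above, modeled as a
# parent span with one child span per group of work.
with tracer.start_as_current_span("agent.step"):
    with tracer.start_as_current_span("knowledge_base.retrieve"):
        ...  # retrieve a subset of the knowledge base
    with tracer.start_as_current_span("llm.summarize"):
        ...  # one or more LLM requests, following llm-spans.md
    with tracer.start_as_current_span("evaluate.summary"):
        ...  # evaluate the effectiveness of the summarization
```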
## LLM Chain attributes

Despite the similarity with agent attributes, chain attributes are kept distinct to represent the difference between a chain and an agent, especially when the two are mixed together.

<!-- semconv ai(tag=llm-chain-step) -->
| Attribute | Type | Description | Examples | Requirement Level |
|---|---|---|---|---|
| `llm.chain.name` | string | The name of the chain. | `answer-question` | Required |
| `llm.chain.step` | int | Denotes the current step or iteration of an LLM chain. | `0` | Required |

## LLM Agent Step attributes

Despite the similarity with chain attributes, agent attributes are kept distinct to represent the difference between a chain and an agent, especially when the two are mixed together.

<!-- semconv ai(tag=llm-agent-step) -->
| Attribute | Type | Description | Examples | Requirement Level |
|---|---|---|---|---|
| `llm.agent.name` | string | The name of the agent. | `document-system-analyzer` | Required |
| `llm.agent.step` | int | Denotes the current step or iteration of an agent as it performs one or more tasks. | `0` | Required |

[DocumentStatus]: https://github.com/open-telemetry/opentelemetry-specification/tree/v1.22.0/specification/document-status.md
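A minimal sketch of how these step attributes might be applied, assuming the Python `opentelemetry-api`; the agent name, span name, and loop are hypothetical:

```python
from opentelemetry import trace

tracer = trace.get_tracer("example.agent.instrumentation")

# Hypothetical agent loop: each iteration is annotated with llm.agent.*
# attributes so that backends can group the spans belonging to one step.
for step in range(3):
    with tracer.start_as_current_span("agent.step") as span:
        span.set_attribute("llm.agent.name", "document-system-analyzer")
        span.set_attribute("llm.agent.step", step)
        # ... retrieval, LLM requests, and evaluation for this step ...
```

A chain step would be annotated analogously with `llm.chain.name` and `llm.chain.step`.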
@@ -0,0 +1,76 @@
<!--- Hugo front matter used to generate the website version of this page:
linkTitle: LLM Calls
--->

# Semantic Conventions for LLM requests

**Status**: [Experimental][DocumentStatus]

<!-- Re-generate TOC with `markdown-toc --no-first-h1 -i` -->

<!-- toc -->

- [Configuration](#configuration)
- [LLM Request attributes](#llm-request-attributes)
- [LLM Response attributes](#llm-response-attributes)
- [Semantic Conventions for specific LLM technologies](#semantic-conventions-for-specific-llm-technologies)

<!-- tocstop -->

A request to an LLM is modeled as a span in a trace.

The **span name** SHOULD be set to a low cardinality value representing the request made to an LLM.
It MAY be the name of the API endpoint for the LLM being called.
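For example (a sketch only, using the Python `opentelemetry-api`; the endpoint-style span name is hypothetical):

```python
from opentelemetry import trace

tracer = trace.get_tracer("example.llm.instrumentation")

# A low cardinality span name, e.g. the API endpoint being called,
# rather than the (high cardinality) prompt text.
with tracer.start_as_current_span("/v1/chat/completions"):
    ...  # make the request to the LLM here
```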
## Configuration

Instrumentations for LLMs MUST offer the ability to turn off capture of raw inputs to LLM requests and of the completion response text for LLM responses. This is for two primary reasons:

1. Data privacy concerns. End users of LLM applications may input sensitive information or personally identifiable information (PII) that they do not wish to be sent to a telemetry backend.
2. Data size concerns. Although there is no specified limit to the size of an attribute, there are practical limitations in programming languages and telemetry systems. Some LLMs allow for extremely large context windows that end users may take full advantage of.

By default, these configurations SHOULD capture inputs and outputs.
> **Review comment:** Should these inputs and outputs be added as Events instead of directly to the span? They aren't directly used for query, and Events in some systems have higher limits on attribute size.
>
> **Reply:** I would disagree with that. Inputs and outputs are definitely used for querying, such as: "For a system doing text -> json, show me all groups of inputs and outputs where we failed to parse a json response", or "Group inputs by feedback responses", or "For input , show all grouped outputs". While a backend could in theory assemble these from span events, I think it's far more likely that a tracing backend would just look for this data directly on the spans. I also don't think it fits the conceptual model for span events, as there's not really a meaningful timestamp to assign to this data - it'd have to be contrived or zeroed out.
>
> **Reply:** It's common for backends to have limitations on attribute length (e.g., …). In addition to backend limitations, attribute values will stay in memory until spans are exported and may significantly increase OTel memory consumption. It's still possible to query logs/events (as long as they are in the same backend).
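As an illustrative sketch only (not part of the convention), an instrumentation might gate capture of prompts and completions behind a configuration switch; the environment variable and function names below are hypothetical:

```python
import os

from opentelemetry import trace

tracer = trace.get_tracer("example.llm.instrumentation")

# Hypothetical opt-out switch; the variable name is illustrative only.
CAPTURE_CONTENT = os.getenv("EXAMPLE_LLM_CAPTURE_CONTENT", "true").lower() == "true"


def record_request(span: trace.Span, prompt: str) -> None:
    """Record common request attributes, honoring the capture switch."""
    span.set_attribute("llm.model", "gpt-4")
    if CAPTURE_CONTENT:
        span.set_attribute("llm.prompt", prompt)
```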
## LLM Request attributes

These attributes track input data and metadata for a request to an LLM. Each attribute represents a concept that is common to most LLMs.

<!-- semconv ai(tag=llm-request) -->
| Attribute | Type | Description | Examples | Requirement Level |
|---|---|---|---|---|
| `llm.model` | string | The name of the LLM a request is being made to. If the LLM is supplied by a vendor, the value MUST be the exact name of the model used. If the LLM is a fine-tuned custom model, the value SHOULD have a more specific name than the base model that was fine-tuned. | `gpt-4` | Required |
| `llm.prompt` | string | The full prompt string sent to an LLM in a request. If the LLM accepts a more complex input, like a JSON object made up of several pieces (such as OpenAI's different message types), this field is that entire JSON object encoded as a string. | `\n\nHuman:You are an AI assistant that tells jokes. Can you tell me a joke about OpenTelemetry?\n\nAssistant:` | Required |
| `llm.max_tokens` | int | The maximum number of tokens the LLM generates for a request. | `100` | Recommended |
| `llm.temperature` | float | The temperature setting for the LLM request. | `0.0` | Recommended |
| `llm.top_p` | float | The top_p sampling setting for the LLM request. | `1.0` | Recommended |
| `llm.stream` | bool | Whether the LLM responds with a stream. | `false` | Recommended |
| `llm.stop_sequences` | array | Array of strings the LLM uses as a stop sequence. | `["stop1"]` | Recommended |

> **Review comment** (on `llm.prompt`): Given the verbosity and the fact that it contains sensitive and private data, this attribute should be opt-in.

`llm.model` has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

| Value | Description |
|---|---|
| `gpt-4` | GPT-4 |
| `gpt-4-32k` | GPT-4 with a 32k context window |
| `gpt-3.5-turbo` | GPT-3.5-turbo |
| `gpt-3.5-turbo-16k` | GPT-3.5-turbo with a 16k context window |
| `claude-instant-1` | Claude Instant (latest version) |
| `claude-2` | Claude 2 (latest version) |
| `other-llm` | Any LLM not listed in this table. Use for any fine-tuned version of a model. |
<!-- endsemconv -->
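For illustration, a minimal sketch of recording these request attributes with the Python `opentelemetry-api`; the span name, prompt, and parameter values are hypothetical:

```python
from opentelemetry import trace

tracer = trace.get_tracer("example.llm.instrumentation")

# The span name is a low cardinality value, e.g. the API endpoint called.
with tracer.start_as_current_span("/v1/chat/completions") as span:
    span.set_attribute("llm.model", "gpt-4")
    span.set_attribute("llm.prompt", "Tell me a joke about OpenTelemetry.")
    span.set_attribute("llm.max_tokens", 100)
    span.set_attribute("llm.temperature", 0.0)
    span.set_attribute("llm.top_p", 1.0)
    span.set_attribute("llm.stream", False)
    span.set_attribute("llm.stop_sequences", ["stop1"])
    # ... make the request to the LLM here ...
```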
## LLM Response attributes

These attributes track output data and metadata for a response from an LLM. Each attribute represents a concept that is common to most LLMs.

<!-- semconv ai(tag=llm-response) -->
| Attribute | Type | Description | Examples | Requirement Level |
|---|---|---|---|---|
| `llm.completion` | string | The full response string from an LLM. If the LLM responds with a more complex output, like a JSON object made up of several pieces (such as OpenAI's message choices), this field is the content of the response. If the LLM produces multiple responses, this field is left blank, and each response is instead captured in an attribute determined by the technology-specific semantic convention for responses. | `Why did the developer stop using OpenTelemetry? Because they couldn't trace their steps!` | Required |

> **Review comment:** In OpenAI, you have `completion_tokens`, `prompt_tokens`, etc. Is that not generally applicable here? On multiple responses from an LLM, if these are captured as events (see my earlier suggestion), then this could be handled by adding multiple events to the span.
>
> **Reply:** Unfortunately, not every LLM supports this in its response. For example, Anthropic's client SDK has a separate … Perhaps this could be done as an optional attribute, since the reality is that most people are using OpenAI.
>
> **Reply:** For the same reasons as the prompt, this should be opt-in (and probably an event/log).
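Building on the request sketch above (illustrative only; the span name and values are hypothetical), the completion is recorded on the same span once the response arrives:

```python
from opentelemetry import trace

tracer = trace.get_tracer("example.llm.instrumentation")

with tracer.start_as_current_span("/v1/chat/completions") as span:
    span.set_attribute("llm.model", "gpt-4")
    span.set_attribute("llm.prompt", "Tell me a joke about OpenTelemetry.")
    # ... make the request to the LLM here ...
    completion = "Why did the developer stop using OpenTelemetry? Because they couldn't trace their steps!"
    span.set_attribute("llm.completion", completion)
```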
## Semantic Conventions for specific LLM technologies

More specific Semantic Conventions are defined for the following LLM technologies:

* [OpenAI](openai.md): Semantic Conventions for *OpenAI*.

[DocumentStatus]: https://github.com/open-telemetry/opentelemetry-specification/tree/v1.22.0/specification/document-status.md
> **Review comment:** In other semconvs we control this with the Opt-In requirement level. Opt-In attributes are always off by default, and instrumentations MAY provide configuration. Given the privacy, verbosity, and consistency reasons, I believe we should do the same here.