Add the beginnings of AI Semantic conventions #483

Closed. Wants to merge 25 commits.
32 changes: 32 additions & 0 deletions docs/ai/README.md
@@ -0,0 +1,32 @@
<!--- Hugo front matter used to generate the website version of this page:
linkTitle: AI
path_base_for_github_subdir:
from: content/en/docs/specs/semconv/ai/_index.md
to: ai/README.md
--->

# Semantic Conventions for AI systems

**Status**: [Experimental][DocumentStatus]

This document defines semantic conventions for the following kinds of AI systems:

* LLMs
* LLM Chains and Agents
* LLM Frameworks (e.g., LangChain, LlamaIndex)
* Vector Embeddings
* Vector Databases (e.g., Pinecone, Milvus)

Semantic conventions for LLM operations are defined for the following signals:

* [LLM Spans](llm-spans.md): Semantic Conventions for LLM requests - *spans*.
* [LLM Chains and Agents](llm-chains-agents.md): Semantic Conventions for LLM chains and agents - *spans*.

Technology-specific semantic conventions are defined for the following LLM providers:

* [OpenAI](openai.md): Semantic Conventions for *OpenAI*.
* [Anthropic](anthropic.md): Semantic Conventions for *Anthropic*.
* [Cohere](cohere.md): Semantic Conventions for *Cohere*.
* [Replicate](replicate.md): Semantic Conventions for *Replicate*.

[DocumentStatus]: https://github.com/open-telemetry/opentelemetry-specification/tree/v1.22.0/specification/document-status.md
35 changes: 35 additions & 0 deletions docs/ai/anthropic.md
@@ -0,0 +1,35 @@
<!--- Hugo front matter used to generate the website version of this page:
linkTitle: Anthropic
--->

# Semantic Conventions for Anthropic

**Status**: [Experimental][DocumentStatus]

The Semantic Conventions for [Anthropic](https://docs.anthropic.com/claude/docs) extend the [LLM Semantic Conventions](llm-spans.md):
the common LLM request attributes described there apply in addition to the Anthropic-specific
conventions described on this page.

## Anthropic LLM request attributes

These are additional attributes when instrumenting Anthropic LLM requests.

<!-- semconv llm.anthropic(tag=llm-request-tech-specific) -->
| Attribute | Type | Description | Examples | Requirement Level |
|---|---|---|---|---|
| `llm.anthropic.top_k` | int | If present, represents the value used to only sample from the top K options for each subsequent token. | `5` | Required |
| `llm.anthropic.metadata.user_id` | string | If present, the `user_id` used in an Anthropic request. | `bob` | Required |

## Anthropic LLM response attributes

These are additional attributes when instrumenting Anthropic LLM responses.

### Chat completion attributes

These are the attributes for a full chat completion (no streaming).

<!-- semconv llm.anthropic(tag=llm-response-tech-specific) -->
| Attribute | Type | Description | Examples | Requirement Level |
|---|---|---|---|---|
| `llm.anthropic.stop_reason` | string | The reason why the model stopped sampling. | `stop_sequence` | Required |
| `llm.anthropic.model` | string | The name of the model used for the completion. | `claude-instant-1` | Recommended |
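
As an illustration only, the sketch below shows how a manual instrumentation might attach these Anthropic-specific attributes to a span with the OpenTelemetry Python API. The span name and the `call_anthropic` helper are hypothetical; only the attribute names come from the tables above.

```python
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

def instrumented_completion(prompt: str, top_k: int, user_id: str) -> dict:
    # Span name is illustrative; see llm-spans.md for naming guidance.
    with tracer.start_as_current_span("anthropic.completion") as span:
        # Anthropic-specific request attributes (table above).
        span.set_attribute("llm.anthropic.top_k", top_k)
        span.set_attribute("llm.anthropic.metadata.user_id", user_id)

        # `call_anthropic` is a hypothetical wrapper around the Anthropic client.
        response = call_anthropic(prompt=prompt, top_k=top_k, user_id=user_id)

        # Anthropic-specific response attributes (table above).
        span.set_attribute("llm.anthropic.stop_reason", response["stop_reason"])
        span.set_attribute("llm.anthropic.model", response["model"])
        return response
```
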
1 change: 1 addition & 0 deletions docs/ai/cohere.md
@@ -0,0 +1 @@
todo
1 change: 1 addition & 0 deletions docs/ai/embeddings.md
@@ -0,0 +1 @@
todo
49 changes: 49 additions & 0 deletions docs/ai/llm-chains-agents.md
@@ -0,0 +1,49 @@
# Semantic Conventions for LLM requests in Chains or Agents

**Status**: [Experimental][DocumentStatus]

<!-- Re-generate TOC with `markdown-toc --no-first-h1 -i` -->

<!-- toc -->

- [LLM Chain attributes](#llm-chain-attributes)
- [LLM Agent Step attributes](#llm-agent-step-attributes)

<!-- tocstop -->

A chain is defined as a program-controlled sequence of actions, some of which may involve a call to an LLM. Some requests are made in parallel, following a map-reduce pattern, and some are sequential. Crucially, requests to an LLM are initiated programmatically.

An agent is defined as an executable that, given instructions, performs any number of actions, some of which may involve requests to an LLM or other services, until certain criteria are satisfied (such as a known end state being reached, an error, or an output evaluating a certain way). Although similar to a chain, an agent is distinguished by the ability to make requests to an LLM on behalf of a program: requests to an LLM are not controlled by the program, but rather by the agent itself.

In both cases, traces model the behavior of a chain or an agent. As such, spans in a chain or agent should follow the guidance in [llm-spans](llm-spans.md).

However, a key conceptual difference between traces used to model LLM behavior and distributed traces is that a group of one or more spans may represent a *step* of a chain or an agent. In simpler applications, such as directly chaining a fixed number of LLM requests together, a single span can adequately represent each step in the chain. However, more complex applications often require a group of spans per step.

For example, consider an agent that continuously reads data from a knowledge base, makes a request to an LLM to summarize the data, and evaluates the effectiveness of that summarization, repeating the process until its success criteria are met:

- One or more spans that track retrieving a subset of the knowledge base
- One or more spans that track one or more requests to an LLM (perhaps in parallel)
- One or more spans that track parsing, validation, and/or merging of results from LLM requests
- One or more spans that track an evaluation of the final result

Each of the above groups of spans may represent a single *step* of a chain or agent, indicating a need to distinguish each *step*.

## LLM Chain attributes

Although chain attributes are similar to agent attributes, they are kept distinct so that chains and agents can be told apart, especially when the two are mixed together.

<!-- semconv ai(tag=llm-chain-step) -->
| Attribute | Type | Description | Examples | Requirement Level |
|---|---|---|---|---|
| `llm.chain.name` | string | The name of the chain. | `answer-question` | Required |
| `llm.chain.step` | int | Denotes the current step or iteration of an LLM chain. | `0` | Required |

## LLM Agent Step attributes

Although agent attributes are similar to chain attributes, they are kept distinct so that chains and agents can be told apart, especially when the two are mixed together.

<!-- semconv ai(tag=llm-agent-step) -->
| Attribute | Type | Description | Examples | Requirement Level |
|---|---|---|---|---|
| `llm.agent.name` | string | The name of the agent. | `document-system-analyzer` | Required |
| `llm.agent.step` | int | Indicates the current step or iteration in which an agent is performing one or more tasks. | `0` | Required |
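
To make the step model concrete, here is a minimal sketch (assuming hypothetical helper functions and span names) in which each agent iteration is a parent span carrying the step attributes above, with child spans grouping the retrieval, LLM request, and evaluation work that make up that step.

```python
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

def run_agent(question: str, max_steps: int = 5):
    for step in range(max_steps):
        # One parent span per agent step; its children form the group of spans for that step.
        with tracer.start_as_current_span("agent.step") as step_span:
            step_span.set_attribute("llm.agent.name", "document-system-analyzer")
            step_span.set_attribute("llm.agent.step", step)

            with tracer.start_as_current_span("knowledge_base.read"):
                context = read_knowledge_base(question)   # hypothetical helper

            with tracer.start_as_current_span("llm.summarize"):
                summary = summarize_with_llm(context)      # hypothetical helper

            with tracer.start_as_current_span("evaluate.summary"):
                if evaluation_passes(summary):             # hypothetical helper
                    return summary
    return None
```
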
76 changes: 76 additions & 0 deletions docs/ai/llm-spans.md
@@ -0,0 +1,76 @@
<!--- Hugo front matter used to generate the website version of this page:
linkTitle: LLM Calls
--->

# Semantic Conventions for LLM requests

**Status**: [Experimental][DocumentStatus]

<!-- Re-generate TOC with `markdown-toc --no-first-h1 -i` -->

<!-- toc -->

- [Configuration](#configuration)
- [LLM Request attributes](#llm-request-attributes)
- [LLM Response attributes](#llm-response-attributes)
- [Semantic Conventions for specific LLM technologies](#semantic-conventions-for-specific-llm-technologies)

<!-- tocstop -->

A request to an LLM is modeled as a span in a trace.

The **span name** SHOULD be set to a low cardinality value representing the request made to an LLM.
It MAY be a name of the API endpoint for the LLM being called.

## Configuration

Instrumentations for LLMs MUST offer the ability to turn off capture of raw inputs to LLM requests and the completion response text for LLM responses. This is for two primary reasons:
> **Contributor:** In other semconvs we control it with the Opt-In requirement level. Opt-In attributes are always off by default and instrumentations MAY provide configuration. Given the privacy, verbosity, and consistency reasons, I believe we should do the same here.

1. Data privacy concerns. End users of LLM applications may input sensitive information or personally identifiable information (PII) that they do not wish to be sent to a telemetry backend.
2. Data size concerns. Although there is no specified limit to the size of an attribute, there are practical limitations in programming languages and telemetry systems. Some LLMs allow for extremely large context windows that end users may take full advantage of.

By default, these configurations SHOULD capture inputs and outputs.
> **Member:** Should these inputs and outputs be added as Events instead of directly to the span? They aren't directly used for query, and Events in some systems have higher limits on attribute size.
>
> **Contributor (author):** I would disagree with that. Inputs and outputs are definitely used for querying, such as:
>
> - "For a system doing text -> json, show me all groups of inputs and outputs where we failed to parse a json response"
> - "Group inputs by feedback responses"
> - "For input , show all grouped outputs"
>
> While a backend could in theory assemble these from span events, I think it's far more likely that a tracing backend would just look for this data directly on the spans. I also don't think it fits the conceptual model for span events, as there's not really a meaningful timestamp to assign to this data - it'd have to be contrived or zeroed out.
>
> **Contributor:** It's common for backends to have limitations on attribute length. In addition to backend limitations, attribute values will stay in memory until spans are exported and may significantly increase otel memory consumption. Events have the same limitations, so logs seem the only reasonable option given verbosity and the ability to export them right away. It's still possible to query logs/events (as long as they are in the same backend).
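
To make the capture requirement from the Configuration section concrete, here is a minimal sketch of such a toggle. The environment variable name and helper are hypothetical and not defined by this convention; the attribute names anticipate the request and response tables below.

```python
import os

from opentelemetry import trace

# Hypothetical switch; the variable name and its default are illustrative only.
CAPTURE_CONTENT = os.environ.get("OTEL_LLM_CAPTURE_CONTENT", "true").lower() == "true"

def record_content(span: trace.Span, prompt: str, completion: str) -> None:
    # Only attach raw inputs/outputs when content capture is enabled.
    if CAPTURE_CONTENT:
        span.set_attribute("llm.prompt", prompt)
        span.set_attribute("llm.completion", completion)
```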


## LLM Request attributes

These attributes track input data and metadata for a request to an LLM. Each attribute represents a concept that is common to most LLMs.

<!-- semconv ai(tag=llm-request) -->
| Attribute | Type | Description | Examples | Requirement Level |
|---|---|---|---|---|
| `llm.model` | string | The name of the LLM a request is being made to. If the LLM is supplied by a vendor, then the value must be the exact name of the model used. If the LLM is a fine-tuned custom model, the value SHOULD have a more specific name than the base model that's been fine-tuned. | `gpt-4` | Required |
| `llm.prompt` | string | The full prompt string sent to an LLM in a request. If the LLM accepts a more complex input like a JSON object made up of several pieces (such as OpenAI's different message types), this field is that entire JSON object encoded as a string. | `\n\nHuman:You are an AI assistant that tells jokes. Can you tell me a joke about OpenTelemetry?\n\nAssistant:` | Required |
| `llm.max_tokens` | int | The maximum number of tokens the LLM generates for a request. | `100` | Recommended |
| `llm.temperature` | float | The temperature setting for the LLM request. | `0.0` | Recommended |
| `llm.top_p` | float | The top_p sampling setting for the LLM request. | `1.0` | Recommended |
| `llm.stream` | bool | Whether the LLM responds with a stream. | `false` | Recommended |
| `llm.stop_sequences` | array | Array of strings the LLM uses as a stop sequence. | `["stop1"]` | Recommended |

> **Contributor** (on `llm.prompt`): Given the verbosity and the fact that it contains sensitive and private data, this attribute should be opt-in.

`llm.model` has the following list of well-known values. If one of them applies, then the respective value MUST be used, otherwise a custom value MAY be used.

| Value | Description |
|---|---|
| `gpt-4` | GPT-4 |
| `gpt-4-32k` | GPT-4 with 32k context window |
| `gpt-3.5-turbo` | GPT-3.5-turbo |
| `gpt-3.5-turbo-16k` | GPT-3.5-turbo with 16k context window |
| `claude-instant-1` | Claude Instant (latest version) |
| `claude-2` | Claude 2 (latest version) |
| `other-llm` | Any LLM not listed in this table. Use for any fine-tuned version of a model. |
<!-- endsemconv -->
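
For illustration, a minimal sketch of recording the common request attributes (and the `llm.completion` response attribute defined below) on a span; the `chat` helper, span name, and literal values are hypothetical.

```python
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

def traced_chat(prompt: str) -> str:
    with tracer.start_as_current_span("llm.request") as span:
        # Common request attributes from the table above.
        span.set_attributes({
            "llm.model": "gpt-4",
            "llm.prompt": prompt,  # subject to the capture toggle in the Configuration section
            "llm.max_tokens": 100,
            "llm.temperature": 0.0,
            "llm.top_p": 1.0,
            "llm.stream": False,
            "llm.stop_sequences": ["stop1"],
        })

        completion = chat(prompt)  # hypothetical LLM client call

        # Common response attribute (see the response table below).
        span.set_attribute("llm.completion", completion)
        return completion
```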

## LLM Response attributes

These attributes track output data and metadata for a response from an LLM. Each attribute represents a concept that is common to most LLMs.

<!-- semconv ai(tag=llm-response) -->
| Attribute | Type | Description | Examples | Requirement Level |
|---|---|---|---|---|
| `llm.completion` | string | The full response string from an LLM. If the LLM responds with a more complex output like a JSON object made up of several pieces (such as OpenAI's message choices), this field is the content of the response. If the LLM produces multiple responses, then this field is left blank, and each response is instead captured in an attribute determined by the specific LLM technology semantic convention for responses.| `Why did the developer stop using OpenTelemetry? Because they couldn't trace their steps!` | Required |
> **Member:** In OpenAI, you have completion_tokens, prompt_tokens, etc. Is that not generally applicable here? On multiple responses from an LLM, if these are captured as events (see my earlier suggestion) then this could be handled by adding multiple events to the Span.
>
> **Contributor (author):** Unfortunately not every LLM supports this in their response. For example, in Anthropic's client SDK there is a separate count_tokens function that you pass your prompt and/or response to in order to get this information. Perhaps this could be done as an optional attribute, since the reality is that most people are using OpenAI.
>
> **Contributor:** For the same reasons as prompt, this should be opt-in (and probably an event/log).

## Semantic Conventions for specific LLM technologies

More specific Semantic Conventions are defined for the following LLM technologies:

* [OpenAI](openai.md): Semantic Conventions for *OpenAI*.

[DocumentStatus]: https://github.com/open-telemetry/opentelemetry-specification/tree/v1.22.0/specification/document-status.md