From b43476108eae117677336c0e01e13aef99725249 Mon Sep 17 00:00:00 2001 From: Travis Wilson <35748617+trrwilson@users.noreply.github.com> Date: Tue, 5 Nov 2024 10:58:24 -0800 Subject: [PATCH] README updates for beta.2 (#73) --- dotnet/samples/README.md | 102 ++++++++++++++++++--- dotnet/samples/console-from-file/README.md | 6 +- 2 files changed, 92 insertions(+), 16 deletions(-) diff --git a/dotnet/samples/README.md b/dotnet/samples/README.md index 2509a2b..50c835b 100644 --- a/dotnet/samples/README.md +++ b/dotnet/samples/README.md @@ -4,7 +4,7 @@ This folder contains samples that use the `/realtime` API with the OpenAI .NET S | | | |---|---| -| Last updated for | Azure.AI.OpenAI.2.1.0-beta.1 | +| Last updated for | Azure.AI.OpenAI.2.1.0-beta.2 | ## General patterns @@ -19,6 +19,8 @@ AzureOpenAIClient topLevelClient = new( RealtimeConversationClient client = topLevelClient.GetRealtimeConversationClient("my-gpt-4o-realtime-preview-deployment"); ``` +If connecting to OpenAI's `/v1/realtime` endpoint, substitute use of `OpenAIClient` or construct a `RealtimeConversationClient` directly. All other usage is identical. + ### Session setup To begin a `/realtime` session, call `StartConversationSessionAsync()` on a configured `RealtimeConversationClient` instance. Note that `RealtimeConversationSession` implements `IDisposable` and consider employing the `using` keyword to ensure prompt connection cleanup. @@ -45,6 +47,10 @@ ConversationSessionOptions options = new() await session.ConfigureSessionAsync(options); ``` +**Input audio transcription** (an approximation of what was said in user-provided input audio) is not enabled by default; to enable it, populate the `InputTranscriptionOptions` property as above. + +By default, **turn detection** will use server voice activity detection (VAD). 
To disable this or customize the behavior of server VAD, provide a value to the `TurnDetectionOptions` property -- `ConversationTurnDetectionOptions.CreateDisabledTurnDetectionOptions()` will provide an instance that turns VAD off, enabling push-to-talk or a custom client-side VAD implementation to be used. + ### Sending data **Audio**: @@ -53,41 +59,111 @@ For simplicity, samples here will often use a "fire and forget" pattern with an ```csharp using Stream audioInputStream = File.OpenRead("..\\audio_hello_world.wav"); -_ = session.SendAudioAsync(audioInputStream); +_ = session.SendInputAudioAsync(audioInputStream); ``` -This `Stream`-based method will automatically read and chunk data from the stream +This `Stream`-based method will automatically read and chunk data from the stream. If finer granularity or otherwise push-style control is needed, the `SendInputAudioAsync(BinaryData)` method signature can be used to send chunks individually. + +**Text and other non-audio data**: -**Text**: +Text input, tool responses, conversation history, and other information are supplied to the session via the `AddItemAsync()` method. The `ConversationItem` type provides various static factory methods to instantiate items including role-based chat messages and function tool outputs, among others. For example: -Text input, tool responses, conversation history, and other information are supplied to the session via the `AddItemAsync()` method. The `ConversationItem` type provides various static factory methods to instantiate items including role-based chat messages and function tool outputs, among others. +- `ConversationItem.CreateUserMessage()` creates a user-role conversation item reflecting one or more content parts that can feature text input. +- `ConversationItem.CreateFunctionCallOutput()` creates a conversation item that responds to a received function call. 
+- `ConversationItem.CreateAssistantMessage()` and `ConversationItem.CreateFunctionCall()` facilitate the creation of items that form or restore a conversation history. + +```csharp +await session.AddItemAsync( + ConversationItem.CreateUserMessage(["Hello, assistant! Can you help me today?"])); +``` **Manual messages** -Only a subset of the full `/realtime` protocol is currently represented; if sending an explicit message is desired, the generic `conversation.SendMessageAsync(data)` allows an arbitrary message to be sent: +If sending an explicit message is desired, the generic `session.SendCommandAsync(BinaryData)` allows an arbitrary message to be sent: ```csharp -await conversation.SendMessageAsync(BinaryData.FromString(""" +await session.SendCommandAsync(BinaryData.FromString(""" { - "event": "create_conversation", - "label": "my_second_conversation" + "type": "session.update", + "session": { + } } """); ``` ### Receiving data -Incoming message receipt is pumped via the `IAsyncEnumerable` provided by `session.ReceiveUpdatesAsync()`. In addition to being downcastable into derived types that encapsulate command-specific data, each `ConversationUpdate` also exposes a generic `BinaryData` instance via the `GetRawContent()` method, which will provide the direct JSON payload present in the message. +Incoming message receipt is pumped via the `IAsyncEnumerable` provided by `session.ReceiveUpdatesAsync()`. Each incoming `ConversationUpdate` has an enumerated `Kind` value that maps directly to a WebSocket server event type (like `session.created`) and, depending on the type, each update will be downcastable to a derived type of `ConversationUpdate` with additional data specific to the event. + +As an example: upon connection, the session will receive a `session.created` server event that's surfaced as a `ConversationSessionStartedUpdate` via `ReceiveUpdatesAsync()`. 
That will expose a `SessionStarted` enumeration value on its `Kind` property and be accessible via downcast: ```csharp await foreach (ConversationUpdate update in conversation.ReceiveUpdatesAsync()) { - Console.WriteLine(message.GetRawContent().Content.ToString()); + // update.Kind == ConversationUpdateKind.SessionStarted (session.created) if (update is ConversationSessionStartedUpdate sessionStartedUpdate) { - // ... + Console.WriteLine($"New session started, id = {sessionStartedUpdate.SessionId}"); } } ``` -`ConversationUpdate` also exposes a `Kind` property with a enum value that directly maps to an associated WebSocket command `type`. \ No newline at end of file +**Session-wide updates** + +The following all provide information pertaining to the session itself or to the shared information persisted across responses in the session: + +| Derived type | Kind value(s) | WebSocket event | Description | +|---|---|---|---| +| `ConversationSessionStartedUpdate` | `SessionStarted` | `session.created` | Raised upon successful connection. Provides *default* session configuration values that do not reflect any changes made via `ConfigureSessionAsync()`. | +| `ConversationSessionConfiguredUpdate` | `SessionConfigured` | `session.updated` | Raised upon receipt of a `session.update` command via `ConfigureSessionAsync()`. Provides *updated* session configuration values reflecting the requested changes. Response-level changes will take effect beginning with the next response. | +| `ConversationInputSpeechStartedUpdate` | `InputSpeechStarted` | `input_audio_buffer.speech_started` | With server-side voice activity detection enabled (also default), this is raised when the audio provided via `SendInputAudioAsync()` has speech detected. 
| +| `ConversationInputSpeechFinishedUpdate` | `InputSpeechFinished` | `input_audio_buffer.speech_stopped` | With server-side voice activity detection enabled (also default), this is raised when the audio provided via `SendInputAudioAsync()` ceases to detect active speech. | +| `ConversationInputAudioCommittedUpdate` | `InputAudioCommitted` | `input_audio_buffer.committed` | Raised when input audio is committed as conversation input. This will occur automatically when server-side voice activity detection is enabled, upon end of speech detection. Without server VAD, an explicit call to `CommitInputAudioAsync()` is required. | +| `ConversationInputAudioClearedUpdate` | `InputAudioCleared` | `input_audio_buffer.cleared` | Raised when input audio is cleared via a call to `ClearInputAudioAsync()`. | +| `ConversationRateLimitsUpdate` | `RateLimitsUpdated` | `rate_limits.updated` | Periodically raised to reflect the latest rate limit information for tokens and requests. | + +**Response-level updates** + +| Derived type | Kind value(s) | WebSocket event | Description | +|---|---|---|---| +| `ConversationResponseStartedUpdate` | `ResponseStarted` | `response.created` | Raised when the model begins generating a new response, snapshotting current input state. This occurs automatically with end of speech when server voice activity detection is enabled and can be requested manually via `StartResponseAsync()`. | +| `ConversationResponseFinishedUpdate` | `ResponseFinished` | `response.done` | Raised when all response data is complete. 
| + +**Item-level updates** + +| Derived type | Kind value(s) | WebSocket event | Description | +|---|---|---|---| +| `ConversationItemCreatedUpdate` | `ItemCreated` | `conversation.item.created` | | +| `ConversationItemDeletedUpdate` | `ItemDeleted` | `conversation.item.deleted` | | +| `ConversationItemTruncatedUpdate` | `ItemTruncated` | `conversation.item.truncated` | | +| `ConversationInputTranscriptionFinishedUpdate` | `InputTranscriptionFinished` | `conversation.item.input_audio_transcription.completed` | | +| `ConversationInputTranscriptionFailedUpdate` | `InputTranscriptionFailed` | `conversation.item.input_audio_transcription.failed` | | + +**Item streaming updates** + +| Derived type | Kind value(s) | WebSocket event | Description | +|---|---|---|---| +| `ConversationItemStreamingStartedUpdate` | `ItemStreamingStarted` | `response.output_item.added` | Received when a new output item is opened for the response and begins receiving streamed information. This will be followed by some number of `ConversationItemStreamingPartDeltaUpdate` instances providing the streamed data before a `ConversationItemStreamingFinishedUpdate` signals the end of all streamed incremental information. | +| `ConversationItemStreamingFinishedUpdate` | `ItemStreamingFinished` | `response.output_item.done` | Received when a new output item has finished receiving all streamed information. Includes the accumulated data of the delta updates. | +| `ConversationItemStreamingPartDeltaUpdate` | * | * | This update is received when incremental streamed data is available for an in-progress response output item. It combines several server event types, with the specific payload inferrable from which properties are populated or the value of `Kind` on the update. 
Some streamed conversation items can consist of multiple content parts; in this situation, the `ContentPartIndex` will distinguish between inner content parts and individual `ConversationItemStreamingPartFinishedUpdate` instances will be raised per content part. | +| | `ItemContentPartStarted` | `response.content_part.added` | | +| | `ItemStreamingPartAudioDelta` | `response.audio.delta` | | +| | `ItemStreamingPartAudioTranscriptionDelta` | `response.audio_transcript.delta` | | +| | `ItemStreamingPartTextDelta` | `response.text.delta` | | +| | `ItemStreamingFunctionCallArgumentsDelta` | `response.function_call_arguments.delta` | | +| `ConversationItemStreamingPartFinishedUpdate` | * | * | Received when an individual component of a streamed conversation item, such as a content part, has finished receiving all streamed data. In many circumstances, using the superset of information available in `ConversationItemStreamingFinishedUpdate` is adequate; this update simply provides finer granularity in cases where multiple item components are streamed. | +| | `ItemStreamingFunctionCallArgumentsFinished` | `response.function_call_arguments.done` | | +| | `ItemContentPartFinished` | `response.content_part.done` | | + +**Raw/protocol update usage** + +In addition to being downcastable into derived types that encapsulate command-specific data, each `ConversationUpdate` also exposes a generic `BinaryData` instance via the `GetRawContent()` method, which will provide the direct JSON payload present in the message. 
+ +```csharp +await foreach (ConversationUpdate update in conversation.ReceiveUpdatesAsync()) +{ + Console.WriteLine(update.GetRawContent().ToString()); +} +``` + +Together with the use of `SendCommandAsync(BinaryData)`, \ No newline at end of file diff --git a/dotnet/samples/console-from-file/README.md b/dotnet/samples/console-from-file/README.md index 9c931ba..5e36507 100644 --- a/dotnet/samples/console-from-file/README.md +++ b/dotnet/samples/console-from-file/README.md @@ -54,12 +54,12 @@ A `/realtime` connection session is managed via the `RealtimeConversationSession Calling `AddItemAsync()` on `RealtimeConversationSession` allows adding non-audio (e.g. text) content as well as establishing conversation history or few-shot examples for model inference to use. As demonstrated further in the sample, this method is also the mechanism used to provide responses to tool calls. -`RealtimeConversationSession`'s `SendAudioAsync(Stream)` method will automatically chunk and transmit audio data from a provided stream. Alternatively, the `SendAudioAsync(BinaryData)` method allows individual audio message transmissions. Because commands are sent and received in parallel, it's not necessary to `await` or otherwise block on audio transmission; the sample application goes directly into the message receipt processing. +`RealtimeConversationSession`'s `SendInputAudioAsync(Stream)` method will automatically chunk and transmit audio data from a provided stream. Alternatively, the `SendInputAudioAsync(BinaryData)` method allows individual audio message transmissions. Because commands are sent and received in parallel, it's not necessary to `await` or otherwise block on audio transmission; the sample application goes directly into the message receipt processing. -`RealtimeConversationSession`'s `ReceiveUpdatesAsync()` method provides an `IAsyncEnumerable` of `ConversationUpdate` instances, each representing a single received command from the `/realtime` endpoint. 
The `ConversationUpdateKind` enumeration on the `UpdateKind` property of the `ConversationUpdate` type maps directly to the corresponding `type` in the wire protocol; these, in turn, also have a down-cast, concrete derived type of the abstract `ConversationUpdate`, e.g. `ConversationResponseStartedUpdate` for `response.created` and `ConversationItemFinishedUpdate` for `conversation.item.done`. These down-cast types can be cast via `as` or `is` to gain access to command-specific data, e.g. `(update as ConversationAudioTranscriptDeltaUpdate).Delta`. +`RealtimeConversationSession`'s `ReceiveUpdatesAsync()` method provides an `IAsyncEnumerable` of `ConversationUpdate` instances, each representing a single received command from the `/realtime` endpoint. The `ConversationUpdateKind` enumeration on the `Kind` property of the `ConversationUpdate` type maps directly to the corresponding `type` in the wire protocol; these, in turn, also have a down-cast, concrete derived type of the abstract `ConversationUpdate`, e.g. `ConversationResponseStartedUpdate` for `response.created`. ## Advanced use -The strongly typed surface for `RealtimeConversationSession` is under active development and may not adequately expose all details of the wire protocol, particularly as commands continue to evolve. It supports passthrough use of request messages via `SendCommandAsync(BinaryData)` (allowing arbitrary JSON to be sent) and the raw JSON of each message may be retrieved by serializing each `ConversationUpdate` instance via `System.ClientModel.Primitives.ModelReaderWriter.Write(update)`. In this manner, `RealtimeConversationSession` may be treated as a low-level WebSocket message client for `/realtime`. +The strongly typed surface for `RealtimeConversationSession` is under active development and may not yet accurately reflect every detail of the wire protocol. 
It supports passthrough use of request messages via `SendCommandAsync(BinaryData)` (allowing arbitrary JSON to be sent) and the raw JSON of each message may be retrieved by serializing each `ConversationUpdate` instance via `ConversationUpdate.GetRawContent()` or `System.ClientModel.Primitives.ModelReaderWriter.Write(update)`. In this manner, `RealtimeConversationSession` may be treated as a low-level WebSocket message client for `/realtime`. For direct observability of WebSocket traffic as it's sent and received, `RealtimeConversationClient` provides `OnSendingCommand` and `OnReceivingCommand` event handlers. \ No newline at end of file