Merge main into live #44648

Merged 3 commits on Feb 2, 2025
29 changes: 15 additions & 14 deletions docs/ai/ai-extensions.md
@@ -10,7 +10,7 @@ ms.author: alexwolf

# Unified AI building blocks for .NET using Microsoft.Extensions.AI

- The .NET ecosystem provides abstractions for integrating AI services into .NET applications and libraries using the [`Microsoft.Extensions.AI`](https://www.nuget.org/packages/Microsoft.Extensions.AI) and [`Microsoft.Extensions.AI.Abstractions`](https://www.nuget.org/packages/Microsoft.Extensions.AI.Abstractions) libraries. The .NET team also enhanced the core `Microsoft.Extensions.*` libraries with these abstractions for .NET Generative AI applications and libraries. In the sections ahead, you learn:
+ The .NET ecosystem provides abstractions for integrating AI services into .NET applications and libraries using the <xref:Microsoft.Extensions.AI> and [`Microsoft.Extensions.AI.Abstractions`](https://www.nuget.org/packages/Microsoft.Extensions.AI.Abstractions) libraries. The .NET team also enhanced the core `Microsoft.Extensions.*` libraries with these abstractions for .NET generative AI applications and libraries. In the sections ahead, you learn:

- Core concepts and capabilities of the `Microsoft.Extensions.AI` libraries.
- How to work with AI abstractions in your apps and the benefits they offer.
@@ -20,7 +20,7 @@ For more information, see [Introduction to Microsoft.Extensions.AI](../core/exte

## What is the Microsoft.Extensions.AI library?

- `Microsoft.Extensions.AI` is a set of core .NET libraries created in collaboration with developers across the .NET ecosystem, including Semantic Kernel. These libraries provide a unified layer of C# abstractions for interacting with AI services, such as small and large language models (SLMs and LLMs), embeddings, and middleware.
+ <xref:Microsoft.Extensions.AI> is a set of core .NET libraries created in collaboration with developers across the .NET ecosystem, including Semantic Kernel. These libraries provide a unified layer of C# abstractions for interacting with AI services, such as small and large language models (SLMs and LLMs), embeddings, and middleware.

:::image type="content" source="media/ai-extensions/meai-architecture-diagram.png" lightbox="media/ai-extensions/meai-architecture-diagram.png" alt-text="An architectural diagram of the AI extensions libraries.":::

@@ -40,18 +40,18 @@ For example, the `IChatClient` interface allows consumption of language models f

```csharp
IChatClient client =
    environment.IsDevelopment ?
    new OllamaChatClient(...) :
    new AzureAIInferenceChatClient(...);
```

Then, regardless of the provider you're using, you can send requests as follows:

```csharp
var response = await chatClient.CompleteAsync(
    "Translate the following text into Pig Latin: I love .NET and AI");

Console.WriteLine(response.Message);
```

These abstractions allow for idiomatic C# code for various scenarios with minimal code changes, whether you're using different services for development and production, addressing hybrid scenarios, or exploring other service providers.
@@ -69,17 +69,17 @@ In the future, implementations of these `Microsoft.Extensions.AI` abstractions w

## Middleware implementations for AI services

- Connecting to and using AI services is just one aspect of building robust applications. Production-ready applications require additional features like telemetry, logging, and tool calling capabilities. The `Microsoft.Extensions.AI` abstractions enable you to easily integrate these components into your applications using familiar patterns.
+ Connecting to and using AI services is just one aspect of building robust applications. Production-ready applications require additional features like telemetry, logging, and tool-calling capabilities. The `Microsoft.Extensions.AI` abstractions enable you to easily integrate these components into your applications using familiar patterns.

The following sample demonstrates how to register an OpenAI `IChatClient`. `IChatClient` allows you to attach the capabilities in a consistent way across various providers.

```csharp
app.Services.AddChatClient(builder => builder
    .UseLogging()
    .UseFunctionInvocation()
    .UseDistributedCache()
    .UseOpenTelemetry()
    .Use(new OpenAIClient(...)).AsChatClient(...));
```

The capabilities demonstrated in this snippet are included in the `Microsoft.Extensions.AI` library, but they are only a small subset of the capabilities that can be layered in with this approach. .NET developers are able to expose many types of middleware to create powerful AI functionality.
@@ -92,6 +92,7 @@ You can start building with `Microsoft.Extensions.AI` in the following ways:
- **Service Consumers**: If you're developing libraries that consume AI services, use the abstractions instead of hardcoding to a specific AI service. This approach gives your consumers the flexibility to choose their preferred service.
- **Application Developers**: Use the abstractions to simplify integration into your apps. This enables portability across models and services, facilitates testing and mocking, leverages middleware provided by the ecosystem, and maintains a consistent API throughout your app, even if you use different services in different parts of your application.
- **Ecosystem Contributors**: If you're interested in contributing to the ecosystem, consider writing custom middleware components.

To get started, see the samples in the [dotnet/ai-samples](https://aka.ms/meai-samples) GitHub repository.

For an end-to-end sample using `Microsoft.Extensions.AI`, see [eShopSupport](https://github.com/dotnet/eShopSupport).
10 changes: 6 additions & 4 deletions docs/ai/azure-ai-services-authentication.md
@@ -36,7 +36,7 @@ var kernel = builder.Build();

Using keys is a straightforward option, but this approach should be used with caution. Keys aren't the recommended authentication option because they:

- - Don't follow [the principle of least privilege](/entra/identity-platform/secure-least-privileged-access)—they provide elevated permissions regardless of who uses them or for what task.
+ - Don't follow [the principle of least privilege](/entra/identity-platform/secure-least-privileged-access). They provide elevated permissions regardless of who uses them or for what task.
- Can accidentally be checked into source control or stored in unsafe locations.
- Can easily be shared with or sent to parties who shouldn't have access.
- Often require manual administration and rotation.
@@ -47,19 +47,21 @@ Instead, consider using [Microsoft Entra ID](/#explore-microsoft-entra-id) for a

Microsoft Entra ID is a cloud-based identity and access management service that provides a vast set of features for different business and app scenarios. Microsoft Entra ID is the recommended solution to connect to Azure OpenAI and other AI services and provides the following benefits:

- - Key-less authentication using [identities](/entra/fundamentals/identity-fundamental-concepts).
- - Role-based-access-control (RBAC) to assign identities the minimum required permissions.
+ - Keyless authentication using [identities](/entra/fundamentals/identity-fundamental-concepts).
+ - Role-based access control (RBAC) to assign identities the minimum required permissions.
- Can use the [`Azure.Identity`](/dotnet/api/overview/azure/identity-readme) client library to detect [different credentials across environments](/dotnet/api/azure.identity.defaultazurecredential) without requiring code changes.
- Automatically handles administrative maintenance tasks such as rotating underlying keys.

- The workflow to implement Microsoft Entra authentication in your app generally includes the following:
+ The workflow to implement Microsoft Entra authentication in your app generally includes the following steps:

- Local development:

1. Sign in to Azure using a local dev tool such as the Azure CLI or Visual Studio.
1. Configure your code to use the [`Azure.Identity`](/dotnet/api/overview/azure/identity-readme) client library and `DefaultAzureCredential` class.
1. Assign Azure roles to the account you signed in with to enable access to the AI service.

- Azure-hosted app:

1. Deploy the app to Azure after configuring it to authenticate using the `Azure.Identity` client library.
1. Assign a [managed identity](/entra/identity/managed-identities-azure-resources/overview) to the Azure-hosted app.
1. Assign Azure roles to the managed identity to enable access to the AI service.
69 changes: 33 additions & 36 deletions docs/ai/conceptual/understanding-tokens.md
@@ -4,34 +4,31 @@ description: "Understand how large language models (LLMs) use tokens to analyze
author: haywoodsloan
ms.topic: concept-article
ms.date: 12/19/2024

#customer intent: As a .NET developer, I want to understand how large language models (LLMs) use tokens so I can add semantic analysis and text generation capabilities to my .NET projects.

---

# Understand tokens

Tokens are words, character sets, or combinations of words and punctuation that are generated by large language models (LLMs) when they decompose text. Tokenization is the first step in training. The LLM analyzes the semantic relationships between tokens, such as how commonly they're used together or whether they're used in similar contexts. After training, the LLM uses those patterns and relationships to generate a sequence of output tokens based on the input sequence.

- ## Turning text into tokens
+ ## Turn text into tokens

The set of unique tokens that an LLM is trained on is known as its _vocabulary_.

For example, consider the following sentence:

- > I heard a dog bark loudly at a cat
+ > `I heard a dog bark loudly at a cat`

This text could be tokenized as:

- - I
- - heard
- - a
- - dog
- - bark
- - loudly
- - at
- - a
- - cat
+ - `I`
+ - `heard`
+ - `a`
+ - `dog`
+ - `bark`
+ - `loudly`
+ - `at`
+ - `a`
+ - `cat`

By having a sufficiently large set of training text, tokenization can compile a vocabulary of many thousands of tokens.
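
As a minimal sketch of this idea (assuming naive whitespace splitting, which is far simpler than the subword schemes real LLMs use), building a vocabulary from training text might look like the following:

```python
# Illustration only: a naive word-level tokenizer that builds a vocabulary.
# Real LLM tokenizers (such as byte pair encoding) are far more sophisticated.

def tokenize(text):
    """Split text into word tokens on whitespace."""
    return text.split()

def build_vocabulary(training_texts):
    """Collect the set of unique tokens seen across all training text."""
    vocabulary = set()
    for text in training_texts:
        vocabulary.update(tokenize(text))
    return vocabulary

vocab = build_vocabulary(["I heard a dog bark loudly at a cat"])
print(len(vocab))  # 8 unique tokens; "a" appears twice but is stored once
```

With more training text, the same loop accumulates the many thousands of tokens described above.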

@@ -47,37 +44,37 @@ For example, the GPT models, developed by OpenAI, use a type of subword tokeniza

There are benefits and disadvantages to each tokenization method:

| Token size | Pros | Cons |
|----------------------------------------------------|------|------|
| Smaller tokens (character or subword tokenization) | - Enables the model to handle a wider range of inputs, such as unknown words, typos, or complex syntax.<br>- Might allow the vocabulary size to be reduced, requiring fewer memory resources. | - A given text is broken into more tokens, requiring additional computational resources while processing.<br>- Given a fixed token limit, the maximum size of the model's input and output is smaller. |
| Larger tokens (word tokenization) | - A given text is broken into fewer tokens, requiring fewer computational resources while processing.<br>- Given the same token limit, the maximum size of the model's input and output is larger. | - Might cause an increased vocabulary size, requiring more memory resources.<br>- Can limit the model's ability to handle unknown words, typos, or complex syntax. |
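
The tradeoff in the table can be made concrete with a quick count. This sketch compares two hypothetical schemes, naive word splitting and per-character tokens:

```python
# Illustration only: the same text produces many more tokens under
# character-level tokenization than under word-level tokenization.
text = "I heard a dog bark loudly at a cat"

word_tokens = text.split()
char_tokens = [ch for ch in text if ch != " "]  # per-character tokens, ignoring spaces

print(len(word_tokens))  # 9
print(len(char_tokens))  # 26
```

Against a fixed context window, the character-level scheme spends roughly three times as many tokens on the same sentence.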

## How LLMs use tokens

After the LLM completes tokenization, it assigns an ID to each unique token.

Consider our example sentence:

- > I heard a dog bark loudly at a cat
+ > `I heard a dog bark loudly at a cat`

After the model uses a word tokenization method, it could assign token IDs as follows:

- - I (1)
- - heard (2)
- - a (3)
- - dog (4)
- - bark (5)
- - loudly (6)
- - at (7)
- - a (the "a" token is already assigned an ID of 3)
- - cat (8)
+ - `I` (1)
+ - `heard` (2)
+ - `a` (3)
+ - `dog` (4)
+ - `bark` (5)
+ - `loudly` (6)
+ - `at` (7)
+ - `a` (the "a" token is already assigned an ID of 3)
+ - `cat` (8)

- By assigning IDs, text can be represented as a sequence of token IDs. The example sentence would be represented as [1, 2, 3, 4, 5, 6, 7, 3, 8]. The sentence "I heard a cat" would be represented as [1, 2, 3, 8].
+ By assigning IDs, text can be represented as a sequence of token IDs. The example sentence would be represented as [1, 2, 3, 4, 5, 6, 7, 3, 8]. The sentence "`I heard a cat`" would be represented as [1, 2, 3, 8].
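
A minimal sketch of this ID-assignment scheme (toy code, not a production tokenizer) reproduces the sequences above:

```python
def assign_token_ids(tokens):
    """Assign a sequential ID to each token the first time it's seen."""
    ids = {}
    for token in tokens:
        if token not in ids:
            ids[token] = len(ids) + 1  # IDs start at 1, matching the example
    return ids

def encode(text, ids):
    """Represent text as its sequence of token IDs (word-level split)."""
    return [ids[token] for token in text.split()]

token_ids = assign_token_ids("I heard a dog bark loudly at a cat".split())
print(encode("I heard a dog bark loudly at a cat", token_ids))  # [1, 2, 3, 4, 5, 6, 7, 3, 8]
print(encode("I heard a cat", token_ids))                       # [1, 2, 3, 8]
```

Note that the repeated "a" reuses ID 3 rather than receiving a new one, just as described above.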

As training continues, the model adds any new tokens in the training text to its vocabulary and assigns it an ID. For example:

- - meow (9)
- - run (10)
+ - `meow` (9)
+ - `run` (10)

The semantic relationships between the tokens can be analyzed by using these token ID sequences. Multi-valued numeric vectors, known as [embeddings](embeddings.md), are used to represent these relationships. An embedding is assigned to each token based on how commonly it's used together with, or in similar contexts to, the other tokens.
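
As a toy sketch of how such relationships can be measured: cosine similarity between embedding vectors is one common relatedness measure. The three-dimensional vectors below are invented for illustration; real embeddings are learned during training and have hundreds or thousands of dimensions:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; values near 1.0 mean more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-dimensional embeddings, made up for this example.
embeddings = {
    "dog": [0.9, 0.1, 0.2],
    "cat": [0.8, 0.2, 0.3],
    "loudly": [0.1, 0.9, 0.4],
}

# Tokens used in similar contexts ("dog", "cat") get similar embeddings.
print(cosine_similarity(embeddings["dog"], embeddings["cat"]))     # high (about 0.98)
print(cosine_similarity(embeddings["dog"], embeddings["loudly"]))  # lower (about 0.28)
```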

@@ -91,9 +88,9 @@ Output generation is an iterative operation. The model appends the predicted tok

LLMs have limitations regarding the maximum number of tokens that can be used as input or generated as output. This limitation often causes the input and output tokens to be combined into a maximum context window. Taken together, a model's token limit and tokenization method determine the maximum length of text that can be provided as input or generated as output.

- For example, consider a model that has a maximum context window of 100 tokens. The model processes our example sentences as input text:
+ For example, consider a model that has a maximum context window of 100 tokens. The model processes the example sentences as input text:

- > I heard a dog bark loudly at a cat
+ > `I heard a dog bark loudly at a cat`

By using a word-based tokenization method, the input is nine tokens. This leaves 91 **word** tokens available for the output.
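
The token budget arithmetic can be sketched as follows, assuming a naive word-level tokenizer (real models count tokens with their own tokenizer):

```python
def remaining_output_tokens(input_text, context_window=100):
    """Tokens left for output after the input is counted against a shared
    context window, assuming naive word-level tokenization."""
    input_tokens = len(input_text.split())
    return context_window - input_tokens

print(remaining_output_tokens("I heard a dog bark loudly at a cat"))  # 91
```

A subword or character tokenizer would count more input tokens for the same text, leaving less room for output.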

@@ -107,6 +104,6 @@ Generative AI services might also be limited regarding the maximum number of tok

## Related content

- - [How Generative AI and LLMs work](how-genai-and-llms-work.md)
- - [Understanding embeddings](embeddings.md)
- - [Working with vector databases](vector-databases.md)
+ - [How generative AI and LLMs work](how-genai-and-llms-work.md)
+ - [Understand embeddings](embeddings.md)
+ - [Work with vector databases](vector-databases.md)