[Cosmos] changes and sample for vector search control plane update #34882

Closed
wants to merge 18 commits into from
Changes from 13 commits
1 change: 1 addition & 0 deletions sdk/cosmos/azure-cosmos/CHANGELOG.md
@@ -3,6 +3,7 @@
### 4.6.1 (Unreleased)

#### Features Added
* Added vector embedding policy and vector indexing policy support. See [PR 34882](https://github.com/Azure/azure-sdk-for-python/pull/34882).
* Added support for using the start time option for change feed query API. See [PR 35090](https://github.com/Azure/azure-sdk-for-python/pull/35090)

#### Breaking Changes
66 changes: 66 additions & 0 deletions sdk/cosmos/azure-cosmos/README.md
@@ -628,6 +628,72 @@ as well as containing the list of failed responses for the failed request.

For more information on Transactional Batch, see [Azure Cosmos DB Transactional Batch][cosmos_transactional_batch].

### Private Preview - Vector Embeddings and Vector Indexes
We have added support for vector embeddings and vector indexes, which enable vector search through the
Cosmos SDK. Both are container-level configurations, and the feature must be enabled at the account level
before you can use them.

Each vector embedding needs a path to the vector field in your stored items, a supported data type
(float32, int8, uint8), the vector's dimensions (a positive integer no greater than 1536), and the distance function used for that embedding.
A sample vector embedding policy looks like this:
```python
vector_embedding_policy = {
    "vectorEmbeddings": [
        {
            "path": "/vector1",
            "dataType": "float32",
            "dimensions": 1000,
            "distanceFunction": "euclidean"
        },
        {
            "path": "/vector2",
            "dataType": "int8",
            "dimensions": 200,
            "distanceFunction": "dotproduct"
        },
        {
            "path": "/vector3",
            "dataType": "uint8",
            "dimensions": 400,
            "distanceFunction": "cosine"
        }
    ]
}
```
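The constraints listed above can be checked on the client before calling the service. The following is a minimal sketch and not part of the SDK: `validate_vector_embedding_policy` is a hypothetical helper, the set of distance functions is assumed from the values used in the sample policy, and the rule that a path starts with `/` is likewise an assumption based on the sample.

```python
from typing import Any, Dict, List

# Data types and the dimension limit come from the text above.
SUPPORTED_DATA_TYPES = {"float32", "int8", "uint8"}
MAX_DIMENSIONS = 1536
# Assumption: only the distance functions that appear in the sample policy.
ASSUMED_DISTANCE_FUNCTIONS = {"euclidean", "dotproduct", "cosine"}


def validate_vector_embedding_policy(policy: Dict[str, Any]) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    for i, emb in enumerate(policy.get("vectorEmbeddings", [])):
        path = emb.get("path")
        if not isinstance(path, str) or not path.startswith("/"):
            problems.append(f"embedding {i}: path must be a string starting with '/'")
        if emb.get("dataType") not in SUPPORTED_DATA_TYPES:
            problems.append(f"embedding {i}: unsupported dataType {emb.get('dataType')!r}")
        dims = emb.get("dimensions")
        # bool is a subclass of int, so it is excluded explicitly.
        if not isinstance(dims, int) or isinstance(dims, bool) or not 0 < dims <= MAX_DIMENSIONS:
            problems.append(f"embedding {i}: dimensions must be an int in 1..{MAX_DIMENSIONS}")
        if emb.get("distanceFunction") not in ASSUMED_DISTANCE_FUNCTIONS:
            problems.append(f"embedding {i}: unexpected distanceFunction {emb.get('distanceFunction')!r}")
    return problems
```

The service remains the authority on what is valid; a check like this only surfaces obvious mistakes earlier.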

Vector indexes are configured separately, in the existing indexing_policy. Each index requires only two fields:
the path of the vector field to index, and the index type (flat, quantizedFlat, or diskANN).
A sample indexing policy with vector indexes looks like this:
```python
indexing_policy = {
    "automatic": True,
    "indexingMode": "consistent",
    "compositeIndexes": [
        [
            {"path": "/numberField", "order": "ascending"},
            {"path": "/stringField", "order": "descending"}
        ]
    ],
    "spatialIndexes": [
        {"path": "/location/*", "types": [
            "Point",
            "Polygon"]}
    ],
    "vectorIndexes": [
        {"path": "/vector1", "type": "flat"},
        {"path": "/vector2", "type": "quantizedFlat"},
        {"path": "/vector3", "type": "diskANN"}
    ]
}
```
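Because each vector index needs only a path and a type, a small client-side check is easy to write. This is a sketch under stated assumptions rather than SDK behavior: `validate_vector_indexes` is a hypothetical helper, and the allowed types are the three named above.

```python
from typing import Any, Dict, List

# Index types named in the text above.
VECTOR_INDEX_TYPES = {"flat", "quantizedFlat", "diskANN"}


def validate_vector_indexes(indexing_policy: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in the policy's vectorIndexes section."""
    problems = []
    for i, index in enumerate(indexing_policy.get("vectorIndexes", [])):
        missing = {"path", "type"} - set(index)
        if missing:
            problems.append(f"vector index {i}: missing field(s) {sorted(missing)}")
            continue
        if index["type"] not in VECTOR_INDEX_TYPES:
            problems.append(f"vector index {i}: unknown type {index['type']!r}")
    return problems
```

Note that the type names are case-sensitive in this sketch, matching the spelling used in the sample policy.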
Pass both policies to your container creation method so that the container is created with these configurations.
The operation fails if the indexing policy contains vector indexes but no vector embedding policy is passed.
```python
database.create_container(id=container_id, partition_key=PartitionKey(path="/id"),
                          indexing_policy=indexing_policy, vector_embedding_policy=vector_embedding_policy)
```
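Since creation fails when vector indexes are passed without an embedding policy, it can help to compare the two policies before the call. The sketch below is a hypothetical pre-flight check, not an SDK function. It also assumes that every vector index path should appear as an embedding path, which matches the samples here but should be confirmed against the service documentation.

```python
from typing import Any, Dict, List, Optional


def find_unbacked_vector_indexes(
    indexing_policy: Optional[Dict[str, Any]],
    vector_embedding_policy: Optional[Dict[str, Any]] = None,
) -> List[Optional[str]]:
    """Return the vector index paths that have no matching vector embedding."""
    embeddings = (vector_embedding_policy or {}).get("vectorEmbeddings", [])
    embedded_paths = {emb.get("path") for emb in embeddings}
    indexes = (indexing_policy or {}).get("vectorIndexes", [])
    return [idx.get("path") for idx in indexes if idx.get("path") not in embedded_paths]
```

An empty result means every vector index has an embedding with the same path; any returned path points at an index that would likely be rejected.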
***Note: vector embeddings and vector indexes CANNOT be changed through container replace operations. They can only be set when the container is created.***
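Because these settings are fixed at creation, it can be useful to detect when an existing container's vector configuration differs from the one you intend to use. The following is a rough sketch: `vector_config_differs` is a hypothetical helper, and it assumes the container properties expose the `indexingPolicy` and `vectorEmbeddingPolicy` keys in the same shape as the request. The service may add or normalize fields, so treat a reported difference as a hint rather than proof.

```python
from typing import Any, Dict, Optional


def _embedding_key(emb: Dict[str, Any]) -> tuple:
    # Compare only the four fields described in this section.
    return (emb.get("path"), emb.get("dataType"), emb.get("dimensions"), emb.get("distanceFunction"))


def vector_config_differs(
    existing_properties: Dict[str, Any],
    indexing_policy: Optional[Dict[str, Any]],
    vector_embedding_policy: Optional[Dict[str, Any]],
) -> bool:
    """Return True if the wanted vector settings differ from the existing ones."""
    existing_emb = (existing_properties.get("vectorEmbeddingPolicy") or {}).get("vectorEmbeddings", [])
    wanted_emb = (vector_embedding_policy or {}).get("vectorEmbeddings", [])
    existing_idx = (existing_properties.get("indexingPolicy") or {}).get("vectorIndexes", [])
    wanted_idx = (indexing_policy or {}).get("vectorIndexes", [])

    same_embeddings = {_embedding_key(e) for e in existing_emb} == {_embedding_key(e) for e in wanted_emb}
    same_indexes = (
        {(i.get("path"), i.get("type")) for i in existing_idx}
        == {(i.get("path"), i.get("type")) for i in wanted_idx}
    )
    return not (same_embeddings and same_indexes)
```

If the function returns `True`, a replace will not help; the container would have to be created again with the new policies.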

## Troubleshooting

### General
13 changes: 12 additions & 1 deletion sdk/cosmos/azure-cosmos/azure/cosmos/aio/_database.py
@@ -172,6 +172,7 @@ async def create_container(
etag: Optional[str] = None,
match_condition: Optional[MatchConditions] = None,
analytical_storage_ttl: Optional[int] = None,
vector_embedding_policy: Optional[Dict[str, Any]] = None,
**kwargs: Any
) -> ContainerProxy:
"""Create a new container with the given ID (name).
@@ -202,6 +203,9 @@ async def create_container(
:keyword int analytical_storage_ttl: Analytical store time to live (TTL) for items in the container. A value of
None leaves analytical storage off and a value of -1 turns analytical storage on with no TTL. Please
note that analytical storage can only be enabled on Synapse Link enabled accounts.
:keyword Dict[str, Any] vector_embedding_policy: **provisional** The vector embedding policy for the container.
Each vector embedding possesses a predetermined number of dimensions, is associated with an underlying
data type, and is generated for a particular distance function.
:raises ~azure.cosmos.exceptions.CosmosHttpResponseError: The container creation failed.
:returns: A `ContainerProxy` instance representing the new container.
:rtype: ~azure.cosmos.aio.ContainerProxy
@@ -243,8 +247,10 @@ async def create_container(
if analytical_storage_ttl is not None:
definition["analyticalStorageTtl"] = analytical_storage_ttl
computed_properties = kwargs.pop('computed_properties', None)
- if computed_properties:
+ if computed_properties is not None:
definition["computedProperties"] = computed_properties
if vector_embedding_policy is not None:
definition["vectorEmbeddingPolicy"] = vector_embedding_policy

if session_token is not None:
kwargs['session_token'] = session_token
@@ -278,6 +284,7 @@ async def create_container_if_not_exists(
etag: Optional[str] = None,
match_condition: Optional[MatchConditions] = None,
analytical_storage_ttl: Optional[int] = None,
vector_embedding_policy: Optional[Dict[str, Any]] = None,
**kwargs: Any
) -> ContainerProxy:
"""Create a container if it does not exist already.
@@ -310,6 +317,9 @@ async def create_container_if_not_exists(
:keyword int analytical_storage_ttl: Analytical store time to live (TTL) for items in the container. A value of
None leaves analytical storage off and a value of -1 turns analytical storage on with no TTL. Please
note that analytical storage can only be enabled on Synapse Link enabled accounts.
:keyword Dict[str, Any] vector_embedding_policy: **provisional** The vector embedding policy for the container.
Each vector embedding possesses a predetermined number of dimensions, is associated with an underlying
data type, and is generated for a particular distance function.
:raises ~azure.cosmos.exceptions.CosmosHttpResponseError: The container creation failed.
:returns: A `ContainerProxy` instance representing the new container.
:rtype: ~azure.cosmos.aio.ContainerProxy
@@ -338,6 +348,7 @@ async def create_container_if_not_exists(
match_condition=match_condition,
session_token=session_token,
initial_headers=initial_headers,
vector_embedding_policy=vector_embedding_policy,
**kwargs
)
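
The switch from a truthiness test to `is not None` changes what ends up in the container definition when a caller passes an explicitly empty value. A minimal sketch of that assembly logic follows. `build_definition` is a hypothetical stand-in for the relevant part of `create_container`, not the SDK code itself.

```python
from typing import Any, Dict, List, Optional


def build_definition(
    container_id: str,
    computed_properties: Optional[List[Dict[str, str]]] = None,
    vector_embedding_policy: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble a container definition, forwarding any value that is not None."""
    definition: Dict[str, Any] = {"id": container_id}
    # With `if computed_properties:` an empty list would be dropped silently;
    # with `is not None` it is forwarded as given.
    if computed_properties is not None:
        definition["computedProperties"] = computed_properties
    if vector_embedding_policy is not None:
        definition["vectorEmbeddingPolicy"] = vector_embedding_policy
    return definition
```

With this form, omitting an argument and passing an empty value are distinguishable; how the service treats an empty value is not covered by this sketch.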

13 changes: 12 additions & 1 deletion sdk/cosmos/azure-cosmos/azure/cosmos/database.py
@@ -173,6 +173,7 @@ def create_container( # pylint:disable=docstring-missing-param
etag: Optional[str] = None,
match_condition: Optional[MatchConditions] = None,
analytical_storage_ttl: Optional[int] = None,
vector_embedding_policy: Optional[Dict[str, Any]] = None,
**kwargs: Any
) -> ContainerProxy:
"""Create a new container with the given ID (name).
@@ -199,6 +200,9 @@ def create_container( # pylint:disable=docstring-missing-param
:keyword List[Dict[str, str]] computed_properties: **provisional** Sets The computed properties for this
container in the Azure Cosmos DB Service. For more Information on how to use computed properties visit
`here: https://learn.microsoft.com/en-us/azure/cosmos-db/nosql/query/computed-properties?tabs=dotnet`
:keyword Dict[str, Any] vector_embedding_policy: **provisional** The vector embedding policy for the container.
Each vector embedding possesses a predetermined number of dimensions, is associated with an underlying
data type, and is generated for a particular distance function.
:returns: A `ContainerProxy` instance representing the new container.
:raises ~azure.cosmos.exceptions.CosmosHttpResponseError: The container creation failed.
:rtype: ~azure.cosmos.ContainerProxy
@@ -239,8 +243,10 @@ def create_container( # pylint:disable=docstring-missing-param
if analytical_storage_ttl is not None:
definition["analyticalStorageTtl"] = analytical_storage_ttl
computed_properties = kwargs.pop('computed_properties', None)
- if computed_properties:
+ if computed_properties is not None:
definition["computedProperties"] = computed_properties
if vector_embedding_policy is not None:
definition["vectorEmbeddingPolicy"] = vector_embedding_policy

if session_token is not None:
kwargs['session_token'] = session_token
@@ -281,6 +287,7 @@ def create_container_if_not_exists( # pylint:disable=docstring-missing-param
etag: Optional[str] = None,
match_condition: Optional[MatchConditions] = None,
analytical_storage_ttl: Optional[int] = None,
vector_embedding_policy: Optional[Dict[str, Any]] = None,
**kwargs: Any
) -> ContainerProxy:
"""Create a container if it does not exist already.
@@ -309,6 +316,9 @@ def create_container_if_not_exists( # pylint:disable=docstring-missing-param
:keyword List[Dict[str, str]] computed_properties: **provisional** Sets The computed properties for this
container in the Azure Cosmos DB Service. For more Information on how to use computed properties visit
`here: https://learn.microsoft.com/en-us/azure/cosmos-db/nosql/query/computed-properties?tabs=dotnet`
:keyword Dict[str, Any] vector_embedding_policy: **provisional** The vector embedding policy for the container.
Each vector embedding possesses a predetermined number of dimensions, is associated with an underlying
data type, and is generated for a particular distance function.
:returns: A `ContainerProxy` instance representing the container.
:raises ~azure.cosmos.exceptions.CosmosHttpResponseError: The container read or creation failed.
:rtype: ~azure.cosmos.ContainerProxy
@@ -339,6 +349,7 @@ def create_container_if_not_exists( # pylint:disable=docstring-missing-param
match_condition=match_condition,
session_token=session_token,
initial_headers=initial_headers,
vector_embedding_policy=vector_embedding_policy,
**kwargs
)
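
Forwarding `vector_embedding_policy` from `create_container_if_not_exists` only matters on the creation path. The sketch below illustrates that read-then-create flow with stand-ins: `FakeDatabase` and `ResourceNotFound` are hypothetical and exist only to make the example runnable without a Cosmos account.

```python
from typing import Any, Dict, Optional


class ResourceNotFound(Exception):
    """Hypothetical stand-in for a 'container does not exist' error."""


class FakeDatabase:
    """In-memory stand-in for a database client; not the SDK's DatabaseProxy."""

    def __init__(self) -> None:
        self.containers: Dict[str, Dict[str, Any]] = {}

    def read_container(self, id: str) -> Dict[str, Any]:
        if id not in self.containers:
            raise ResourceNotFound(id)
        return self.containers[id]

    def create_container(self, id: str, vector_embedding_policy: Optional[Dict[str, Any]] = None,
                         **kwargs: Any) -> Dict[str, Any]:
        definition: Dict[str, Any] = {"id": id}
        if vector_embedding_policy is not None:
            definition["vectorEmbeddingPolicy"] = vector_embedding_policy
        self.containers[id] = definition
        return definition


def create_container_if_not_exists(db: FakeDatabase, id: str,
                                   vector_embedding_policy: Optional[Dict[str, Any]] = None,
                                   **kwargs: Any) -> Dict[str, Any]:
    """Return the existing container, or create it with the given policy."""
    try:
        return db.read_container(id)
    except ResourceNotFound:
        return db.create_container(id, vector_embedding_policy=vector_embedding_policy, **kwargs)
```

In this sketch a second call with a different policy returns the container unchanged, which mirrors the README note that vector settings are only applied at creation.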

62 changes: 62 additions & 0 deletions sdk/cosmos/azure-cosmos/samples/index_management.py
@@ -633,6 +633,65 @@ def perform_multi_orderby_query(db):
print("Entity doesn't exist")


def use_vector_embedding_policy(db):
    try:
        delete_container_if_exists(db, CONTAINER_ID)

        # Create a container with vector embedding policy and vector indexes
        indexing_policy = {
            "vectorIndexes": [
                {"path": "/vector1", "type": "flat"},
                {"path": "/vector2", "type": "quantizedFlat"},
                {"path": "/vector3", "type": "diskANN"}
            ]
        }
        vector_embedding_policy = {
            "vectorEmbeddings": [
                {
                    "path": "/vector1",
                    "dataType": "float32",
                    "dimensions": 1000,
                    "distanceFunction": "euclidean"
                },
                {
                    "path": "/vector2",
                    "dataType": "int8",
                    "dimensions": 200,
                    "distanceFunction": "dotproduct"
                },
                {
                    "path": "/vector3",
                    "dataType": "uint8",
                    "dimensions": 400,
                    "distanceFunction": "cosine"
                }
            ]
        }

        created_container = db.create_container(
            id=CONTAINER_ID,
            partition_key=PARTITION_KEY,
            indexing_policy=indexing_policy,
            vector_embedding_policy=vector_embedding_policy
        )
        properties = created_container.read()
        print(created_container)

        print("\n" + "-" * 25 + "\n9. Container created with vector embedding policy and vector indexes")
        print_dictionary_items(properties["indexingPolicy"])
        print_dictionary_items(properties["vectorEmbeddingPolicy"])

        # TODO: add rest of sample once query work is done

        # Cleanup
        db.delete_container(created_container)
        print("\n")
    except exceptions.CosmosResourceExistsError:
        print("Entity already exists")
    except exceptions.CosmosResourceNotFoundError:
        print("Entity doesn't exist")


def run_sample():
try:
client = obtain_client()
@@ -663,6 +722,9 @@ def run_sample():
# 8. Perform Multi Orderby queries using composite indexes
perform_multi_orderby_query(created_db)

# 9. Create and use a vector embedding policy
use_vector_embedding_policy(created_db)

except exceptions.AzureError as e:
raise e
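
The sample stops at container creation and leaves the rest as a TODO. As a hedged illustration of what items for such a container might look like, the sketch below builds a document whose vector fields match the embedding policy. `make_sample_item` is a hypothetical helper, it handles only top-level paths such as `/vector1`, and the value ranges are the usual int8 and uint8 ranges rather than anything stated in this PR.

```python
import random
from typing import Any, Dict


def make_sample_item(item_id: str, vector_embedding_policy: Dict[str, Any], seed: int = 0) -> Dict[str, Any]:
    """Build an item with one random vector per embedding in the policy."""
    rng = random.Random(seed)  # seeded so repeated runs give the same item
    item: Dict[str, Any] = {"id": item_id}
    for emb in vector_embedding_policy["vectorEmbeddings"]:
        field = emb["path"].lstrip("/")  # assumes a top-level path like "/vector1"
        count = emb["dimensions"]
        if emb["dataType"] == "int8":
            item[field] = [rng.randint(-128, 127) for _ in range(count)]
        elif emb["dataType"] == "uint8":
            item[field] = [rng.randint(0, 255) for _ in range(count)]
        else:
            # float32 values are represented as plain Python floats here.
            item[field] = [rng.uniform(-1.0, 1.0) for _ in range(count)]
    return item
```

Each generated vector has exactly as many entries as the `dimensions` value of its embedding, which is the property the policy describes.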

63 changes: 61 additions & 2 deletions sdk/cosmos/azure-cosmos/samples/index_management_async.py
@@ -628,6 +628,65 @@ async def perform_multi_orderby_query(db):
print("Entity doesn't exist")


async def use_vector_embedding_policy(db):
    try:
        await delete_container_if_exists(db, CONTAINER_ID)

        # Create a container with vector embedding policy and vector indexes
        indexing_policy = {
            "vectorIndexes": [
                {"path": "/vector1", "type": "flat"},
                {"path": "/vector2", "type": "quantizedFlat"},
                {"path": "/vector3", "type": "diskANN"}
            ]
        }
        vector_embedding_policy = {
            "vectorEmbeddings": [
                {
                    "path": "/vector1",
                    "dataType": "float32",
                    "dimensions": 1000,
                    "distanceFunction": "euclidean"
                },
                {
                    "path": "/vector2",
                    "dataType": "int8",
                    "dimensions": 200,
                    "distanceFunction": "dotproduct"
                },
                {
                    "path": "/vector3",
                    "dataType": "uint8",
                    "dimensions": 400,
                    "distanceFunction": "cosine"
                }
            ]
        }

        created_container = await db.create_container(
            id=CONTAINER_ID,
            partition_key=PARTITION_KEY,
            indexing_policy=indexing_policy,
            vector_embedding_policy=vector_embedding_policy
        )
        properties = await created_container.read()
        print(created_container)

        print("\n" + "-" * 25 + "\n9. Container created with vector embedding policy and vector indexes")
        print_dictionary_items(properties["indexingPolicy"])
        print_dictionary_items(properties["vectorEmbeddingPolicy"])

        # TODO: add rest of sample once query work is done

        # Cleanup
        await db.delete_container(created_container)
        print("\n")
    except exceptions.CosmosResourceExistsError:
        print("Entity already exists")
    except exceptions.CosmosResourceNotFoundError:
        print("Entity doesn't exist")


async def run_sample():
try:
async with obtain_client() as client:
@@ -658,8 +717,8 @@ async def run_sample():
# 8. Perform Multi Orderby queries using composite indexes
await perform_multi_orderby_query(created_db)

- print('Sample done, cleaning up sample-generated data')
- await client.delete_database(DATABASE_ID)
+ # 9. Create and use a vector embedding policy
+ await use_vector_embedding_policy(created_db)

except exceptions.AzureError as e:
raise e
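
In the sample function, the container delete sits at the end of the `try` body, so it is skipped if reading or printing the properties raises. One possible way to make cleanup unconditional is a `try`/`finally` wrapper. This is a sketch only: `with_temporary_container` is a hypothetical helper and `FakeAsyncDatabase` is an in-memory stand-in, not the SDK's async client.

```python
import asyncio
from typing import Any, Awaitable, Callable, Set


class FakeAsyncDatabase:
    """In-memory stand-in for an async database client; not the SDK's DatabaseProxy."""

    def __init__(self) -> None:
        self.containers: Set[str] = set()

    async def create_container(self, id: str, **kwargs: Any) -> str:
        self.containers.add(id)
        return id

    async def delete_container(self, container: str) -> None:
        self.containers.discard(container)


async def with_temporary_container(db: FakeAsyncDatabase, container_id: str,
                                   work: Callable[[str], Awaitable[Any]],
                                   **create_kwargs: Any) -> Any:
    """Create a container, run `work` on it, and always delete it afterwards."""
    container = await db.create_container(id=container_id, **create_kwargs)
    try:
        return await work(container)
    finally:
        # Runs whether `work` returned normally or raised.
        await db.delete_container(container)
```

The exception from `work` still propagates to the caller; only the cleanup is guaranteed.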