
Subpartitioning Python Cosmos DB SDK #31121

Merged: 30 commits merged into Azure:main on Oct 10, 2023


@bambriz (Member) commented Jul 13, 2023

Description

This PR adds subpartitioning to the Python SDK (also referred to as hierarchical partitioning or MultiHash partitioning). It also includes tests and samples.

To enable subpartitioning, define the partition key with kind="MultiHash" and pass a list of paths as path.

Example:

container = database.create_container(
    id=container_name,
    partition_key=PartitionKey(path=["/state", "/city", "/zipcode"], kind="MultiHash")
)
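As a rough illustration of what "define the partition key as MultiHash and pass a list of paths" amounts to, the sketch below validates a list of paths and builds the corresponding partition key definition as a plain dict. `build_multihash_definition` and `MAX_LEVELS` are hypothetical names, not part of azure-cosmos; the three-level cap reflects Cosmos DB's documented limit for hierarchical partition keys, and the `"version": 2` field is an assumption about how the service stores such definitions.

```python
# Hypothetical helper (not part of azure-cosmos): sanity-checks a list of
# hierarchical partition key paths and builds a MultiHash definition dict.
from typing import Dict, List, Union

MAX_LEVELS = 3  # Cosmos DB allows up to three levels of hierarchy


def build_multihash_definition(paths: List[str]) -> Dict[str, Union[List[str], str, int]]:
    if not isinstance(paths, list) or not paths:
        raise ValueError("MultiHash partition keys need a non-empty list of paths")
    if len(paths) > MAX_LEVELS:
        raise ValueError(f"at most {MAX_LEVELS} partition key levels are supported")
    for path in paths:
        if not path.startswith("/"):
            raise ValueError(f"partition key path {path!r} must start with '/'")
    if len(set(paths)) != len(paths):
        raise ValueError("partition key paths must be unique")
    # Assumption: hierarchical partition key definitions use version 2.
    return {"paths": list(paths), "kind": "MultiHash", "version": 2}


definition = build_multihash_definition(["/state", "/city", "/zipcode"])
print(definition)
```

The order of the list matters: it fixes which property is the first, second, and third level of the key.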

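Container operations must supply partition key values that match the list used to define the partition key path. For create and upsert, the values are read from the item body, which is why each document below includes state, city and zipcode.

Example: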

document_definition = {'id': 'document',
                       'key': 'value',
                       'state': 'WA',
                       'city': 'Redmond',
                       'zipcode': '98052'}

created_document = created_collection.create_item(
    body=document_definition
)

or

upserted_document = created_collection.upsert_item(
    body=dict(id='document2', key='value2', state='GA',
              city='Atlanta', zipcode='30363'))
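Because partition key values must line up with the defined paths in order, a small helper can derive them from an item before a point operation. This is a minimal sketch assuming items are plain dicts; `partition_key_values` is a hypothetical helper, not an SDK function, and only the commented-out `read_item` call reflects the SDK, using the list form this PR introduces.

```python
# Hypothetical helper (not part of azure-cosmos): builds the partition key
# value list for an item from the container's MultiHash paths, in path order.
from typing import Any, Dict, List


def partition_key_values(item: Dict[str, Any], paths: List[str]) -> List[Any]:
    values = []
    for path in paths:
        # Walk nested properties, e.g. "/address/city" -> item["address"]["city"].
        node: Any = item
        for segment in path.strip("/").split("/"):
            if not isinstance(node, dict) or segment not in node:
                raise KeyError(f"item is missing partition key path {path!r}")
            node = node[segment]
        values.append(node)
    return values


paths = ["/state", "/city", "/zipcode"]
item = {"id": "document", "key": "value", "state": "WA", "city": "Redmond", "zipcode": "98052"}
pk = partition_key_values(item, paths)
print(pk)  # ['WA', 'Redmond', '98052']
# With a real container, the list is then passed as the partition key, e.g.:
# container.read_item(item="document", partition_key=pk)
```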

This PR also adds support for prefix partition key queries, which let you run a query scoped to an incomplete (prefix) partition key, for example only the first level of a three-level key.
Example:

container = database.create_container(
    id="container",
    partition_key=PartitionKey(path=["/ZipCode", "/City", "/Type"], kind="MultiHash"))
container.create_item(body=dict(id="document1", ZipCode="500026", City="Secunderabad", Type="Residence"))
container.create_item(body=dict(id="document2", ZipCode="15232", City="Pittsburgh", Type="Business"))
container.create_item(body=dict(id="document3", ZipCode="11790", City="Stonybrook", Type="Government"))

q_ret = list(container.query_items(query='SELECT * from c', partition_key=["500026"]))

The example above returns only one item (document1) even though the query is 'SELECT * from c', because the query is scoped to the partition key prefix ["500026"]. As with all subpartitioning operations, the partition key value must be passed as a list.
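The prefix semantics can be mimicked in memory: an item matches when its leading partition key values equal the prefix, level by level. This is only an illustration built on the sample documents above (`matches_prefix` and `PATHS` are hypothetical names); per this PR, the SDK does not filter client-side but maps the prefix to an effective partition key (EPK) range and queries the partitions that hold it.

```python
# In-memory illustration only: a prefix partition key matches every item whose
# leading partition key values equal the prefix, level by level.
from typing import Any, Dict, List

PATHS = ["/ZipCode", "/City", "/Type"]


def matches_prefix(item: Dict[str, Any], paths: List[str], prefix: List[Any]) -> bool:
    if len(prefix) > len(paths):
        raise ValueError("prefix has more values than the partition key has levels")
    # zip() stops at the end of the prefix, so only the leading levels are compared.
    return all(item.get(path.lstrip("/")) == value for path, value in zip(paths, prefix))


items = [
    dict(id="document1", ZipCode="500026", City="Secunderabad", Type="Residence"),
    dict(id="document2", ZipCode="15232", City="Pittsburgh", Type="Business"),
    dict(id="document3", ZipCode="11790", City="Stonybrook", Type="Government"),
]

print([i["id"] for i in items if matches_prefix(i, PATHS, ["500026"])])  # ['document1']
print([i["id"] for i in items if matches_prefix(i, PATHS, ["11790", "Stonybrook"])])  # ['document3']
```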

All SDK Contribution checklist:

  • The pull request does not introduce breaking changes.
  • CHANGELOG is updated for new features, bug fixes or other significant changes.
  • I have read the contribution guidelines.

General Guidelines and Best Practices

  • Title of the pull request is clear and informative.
  • There are a small number of commits, each of which have an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, see this page.

Testing Guidelines

  • Pull request includes test coverage for the included changes.

bambriz added 2 commits July 3, 2023 18:08
adding subpartitioning
Fixes some edge cases. This also adds tests for subpartitioning CRUD operations that match the Java SDK, as well as some Python-specific edge cases, and adds samples for subpartitioning in Python.
@bambriz (Member Author) commented Jul 13, 2023

/azp run python - cosmos - tests

@azure-pipelines

Pull request contains merge conflicts.

@bambriz (Member Author) commented Jul 13, 2023

/azp run python - cosmos - tests

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@azure-sdk (Collaborator)

API change check

APIView has identified API-level changes in this PR and created the following API reviews:

azure-cosmos

bambriz added 3 commits July 13, 2023 09:57
remove line of code that was used for testing
update changelog to include new feature
@bambriz (Member Author) commented Jul 13, 2023

/azp run python - cosmos - tests

@azure-pipelines

No commit pushedDate could be found for PR 31121 in repo Azure/azure-sdk-for-python

fixes for pylint issues
@bambriz (Member Author) commented Jul 13, 2023

/azp run python - cosmos - tests

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

removes left over debugging code on subpartition test
@bambriz (Member Author) commented Jul 13, 2023

/azp run python - cosmos - tests

@azure-pipelines

No commit pushedDate could be found for PR 31121 in repo Azure/azure-sdk-for-python


changing get epk range for prefix partition key to be private
@bambriz (Member Author) commented Oct 3, 2023

/azp run python - cosmos - tests

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@NaluTripician (Contributor) left a comment


LGTM

@ealsur (Member) left a comment


Holding off, since we found gaps in the supported scenarios, mainly a partial PK spanning multiple partitions.

In large databases, a prefix query against a subpartitioned container may span multiple physical partitions. This change lets a prefix query correctly retrieve items from every partition that contains the prefix partition key.
@bambriz (Member Author) commented Oct 6, 2023

/azp run python - cosmos - tests

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

This commit improves support for prefix queries that span multiple physical partitions: the query is sent to each physical partition with the partition key range that applies to that partition. New tests were also added for this functionality.
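The per-partition fan-out described in this commit amounts to finding every physical partition whose EPK range overlaps the prefix's EPK range. The sketch below assumes half-open ranges over uppercase hex EPK strings compared lexicographically, with the full key space written as ["", "FF"); `EpkRange`, `partitions_to_query`, and the three-partition layout are hypothetical, and real boundaries come from the service's partition key ranges.

```python
# Sketch: pick the physical partitions a prefix query must visit. Ranges are
# half-open [min_inclusive, max_exclusive) over uppercase hex EPK strings,
# compared lexicographically.
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EpkRange:
    min_inclusive: str
    max_exclusive: str

    def overlaps(self, other: "EpkRange") -> bool:
        # Two half-open ranges overlap when each starts before the other ends.
        return self.min_inclusive < other.max_exclusive and other.min_inclusive < self.max_exclusive


def partitions_to_query(prefix_range: EpkRange, partitions: List[EpkRange]) -> List[int]:
    # Indexes of every physical partition whose range overlaps the prefix range.
    return [i for i, p in enumerate(partitions) if p.overlaps(prefix_range)]


# Hypothetical layout: the full EPK space ["", "FF") split into three partitions.
partitions = [EpkRange("", "55"), EpkRange("55", "AA"), EpkRange("AA", "FF")]
print(partitions_to_query(EpkRange("40", "60"), partitions))  # [0, 1]
print(partitions_to_query(EpkRange("60", "70"), partitions))  # [1]
```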
@bambriz (Member Author) commented Oct 9, 2023

/azp run python - cosmos - tests

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@ealsur (Member) left a comment


LGTM, just a few nits and questions. Please make sure to validate this against the account we have for subpartitioning testing.

Added a comment explaining the fourth case for what the EPK sub-range can equal: when the range lies within a physical partition without spanning the entire partition, the EPK sub-range equals the feed range EPK.
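The four cases can be sketched as a range intersection: the sub-range sent to one physical partition is where the prefix's EPK range (the feed range) and that partition's range overlap. Only the fourth case is named in the commit above; the ordering and labels of the other three, and the `epk_sub_range` function itself, are assumptions for illustration, using the same half-open hex-string ranges as a simplification.

```python
# Sketch of the four ways a prefix query's EPK range (the feed range) can
# relate to one overlapping physical partition; the sub-range is their
# intersection.
from typing import Tuple

Range = Tuple[str, str]  # (min_inclusive, max_exclusive), uppercase hex EPK strings


def epk_sub_range(feed_range: Range, partition: Range) -> Tuple[str, Range]:
    f_min, f_max = feed_range
    p_min, p_max = partition
    if f_max <= p_min or p_max <= f_min:
        raise ValueError("ranges do not overlap")
    if f_min <= p_min and p_max <= f_max:
        return "spans whole partition", partition         # case 1
    if f_min < p_min:
        return "starts before partition", (p_min, f_max)  # case 2
    if p_max < f_max:
        return "ends after partition", (f_min, p_max)     # case 3
    # Case 4: inside the partition without spanning all of it, so the
    # sub-range equals the feed range EPK itself.
    return "within partition", feed_range


print(epk_sub_range(("", "FF"), ("55", "AA")))    # ('spans whole partition', ('55', 'AA'))
print(epk_sub_range(("40", "60"), ("55", "AA")))  # ('starts before partition', ('55', '60'))
print(epk_sub_range(("40", "60"), ("", "55")))    # ('ends after partition', ('40', '55'))
print(epk_sub_range(("60", "70"), ("55", "AA")))  # ('within partition', ('60', '70'))
```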
@simorenoh (Member)

/azp run python - cosmos - tests

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@simorenoh (Member) left a comment


LGTM, thanks a lot Bryan!

@bambriz merged commit 37b322f into Azure:main on Oct 10, 2023
@Th3OnlyN00b

Any idea when this is going into a non-prerelease? Trying to set up pipelines for the changeover.


8 participants