[EventHub] Fix keyerror issue in BlobCheckpointStore #15752

Merged

@yunhaoling yunhaoling commented Dec 11, 2020

Addresses issue #13060.
Related Stack Overflow question: https://stackoverflow.com/questions/63354884/azure-eventhubs-python-checkpointing-with-blob-storage-keyerror-issue-in-ev

Also added a test resource (storage blob v2 with Data Lake enabled) to verify the fix.


--- echoing the context here ---

The root cause of the KeyError is that list_blobs, when called on a v2 storage blob with Data Lake enabled (hierarchical namespace), returns not only the per-partition checkpoint/ownership blobs but also the parent blob node, which contains no metadata.

To illustrate this, say we have the following blob structure:

- fullqualifiednamespace (directory)
  - eventhubname (directory)
    - $default (directory)
      - ownership (directory)
        - 0 (blob)
        - 1 (blob)
        ...

In v2 storage with Data Lake enabled (hierarchical namespace), when the code used the prefix
<fully_qualified_namespace>/<eventhub_name>/<consumer_group>/ownership to search for blobs, the <fully_qualified_namespace>/<eventhub_name>/<consumer_group>/ownership directory itself was also returned. It contains no metadata, which led to the KeyError when the code tried to extract information from it.

What we want is the per-partition blobs, so the fix is easy: add a / at the end of the prefix search string so that list_blobs won't return the parent node:
<fully_qualified_namespace>/<eventhub_name>/<consumer_group>/ownership/

(Checkpoint would encounter the same problem)
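
To make the prefix behaviour concrete, here is a minimal sketch that simulates the listing in plain Python. It does not call the Azure SDK: `FakeBlob`, `list_blobs_sim`, `owners`, the namespace and hub names, and the `ownerid` metadata key are all hypothetical stand-ins chosen for illustration, and the directory node is modelled as a blob with empty metadata, as described above.

```python
from typing import Dict, List, NamedTuple

# Hypothetical stand-in for a listed blob; not an Azure SDK type.
class FakeBlob(NamedTuple):
    name: str
    metadata: Dict[str, str]

# Illustrative values only.
BASE = "myns.servicebus.windows.net/myhub/$default"

# With hierarchical namespace enabled, the "ownership" directory is itself
# a node in the listing; it is modelled here with empty metadata.
BLOBS = [
    FakeBlob(f"{BASE}/ownership", {}),
    FakeBlob(f"{BASE}/ownership/0", {"ownerid": "consumer-a"}),
    FakeBlob(f"{BASE}/ownership/1", {"ownerid": "consumer-b"}),
]

def list_blobs_sim(prefix: str) -> List[FakeBlob]:
    """Simulate a prefix search: return every node whose name starts with prefix."""
    return [blob for blob in BLOBS if blob.name.startswith(prefix)]

def owners(prefix: str) -> Dict[str, str]:
    """Map partition id -> owner id for every node found under prefix."""
    result = {}
    for blob in list_blobs_sim(prefix):
        partition_id = blob.name.rsplit("/", 1)[-1]
        # Raises KeyError for the directory node, which has no "ownerid".
        result[partition_id] = blob.metadata["ownerid"]
    return result

# Without the trailing slash the directory node matches the prefix too.
try:
    owners(f"{BASE}/ownership")
except KeyError as exc:
    print("KeyError on missing metadata key:", exc)

# With the trailing slash only the per-partition blobs match.
print(owners(f"{BASE}/ownership/"))
```

The trailing slash works because the directory node's name ends at `ownership`, so it cannot start with `ownership/`, while every per-partition blob name does.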

@yunhaoling

/azp run python - eventhub - tests

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@yunhaoling yunhaoling merged commit b9eb382 into Azure:master Dec 22, 2020
@yunhaoling yunhaoling mentioned this pull request Jan 7, 2021
rakshith91 pushed a commit to rakshith91/azure-sdk-for-python that referenced this pull request Jan 8, 2021
@yunhaoling yunhaoling deleted the yuling/eh/storage-checkpoint-none branch January 13, 2021 01:17