[Checkpoint Store] Exception KeyError('ownerid') #13060

Closed
YijunXieMS opened this issue Aug 12, 2020 · 4 comments
Labels
bug This issue requires a change to an existing behavior in the product in order to be resolved. Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. Event Hubs needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team

Comments

@YijunXieMS
Contributor

YijunXieMS commented Aug 12, 2020

  • Package Name: azure-eventhub-checkpointstoreblob-aio
  • Package Version: 1.1.0
  • Operating System: N/A
  • Python Version: 3.x

Describe the bug
A user reported an error on Stack Overflow.
The following error message is printed when the checkpoint store is used.

EventProcessor instance 'xxxxxxxxxxx' of eventhub <name of my eventhub> consumer group <name of my consumer group>. An error occurred while load-balancing and claiming ownership. The exception is KeyError('ownerid'). Retrying after xxxx seconds

According to the user, all blobs under the container have the metadata key "ownerid". This error should only happen when a retrieved blob doesn't have "ownerid" in its metadata, which isn't supposed to happen. It's currently not clear what caused "ownerid" to be missing from the blob metadata.

To Reproduce
Not known yet.

Expected behavior
No error occurs.

@YijunXieMS YijunXieMS added bug This issue requires a change to an existing behavior in the product in order to be resolved. Event Hubs Client This issue points to a problem in the data-plane of the library. labels Aug 12, 2020
@YijunXieMS YijunXieMS added this to the [2020] September milestone Aug 12, 2020
@YijunXieMS YijunXieMS self-assigned this Aug 12, 2020
@YijunXieMS
Contributor Author

This problem might be related to the storage service. The user said on Stack Overflow:

I might have found what's causing the issue; I tested in a fresh venv with Python 3.7.7 64-bit installed. I created two new storage accounts from scratch (Azure general storage v1 and v2) and made new containers in each. The issue only occurred (and kept occurring) when I used Azure storage v2 for checkpointing; everything ran fine when I instead connected to the Azure storage v1 account. This might be worth looking into.

I then tested both Azure general-purpose v1 and v2 storage accounts, but I didn't see the problem with either version.
I asked the user for more information about their storage account settings and region so that I can try again.

@YijunXieMS YijunXieMS modified the milestones: [2020] September, Backlog Aug 25, 2020
@YijunXieMS YijunXieMS added the customer-reported Issues that are reported by GitHub users external to the Azure organization. label Aug 25, 2020
@lmazuel lmazuel added the needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team label Oct 16, 2020
damnedOperator added a commit to damnedOperator/azure-sdk-for-python that referenced this issue Nov 30, 2020
Fix the cause of the error message:
```
EventProcessor instance 'xxxxxxxxxxx' of eventhub <name of my eventhub> consumer group <name of my consumer group>. An error occurred while load-balancing and claiming ownership. The exception is KeyError('ownerid'). Retrying after xxxx seconds
```

Mentioned in Azure#13060
@yunhaoling
Contributor

yunhaoling commented Dec 11, 2020

Conclusion:

The root cause is that list_blobs in the storage SDK, when called against a v2 storage account with Data Lake enabled (hierarchical namespace), returns not only the per-partition checkpoint/ownership blobs but also the parent directory node, which has no metadata.

To illustrate this better, let's say we have the following blob structure:

- fullqualifiednamespace (directory)
  - eventhubname (directory)
    - $default (directory)
      - ownership (directory)
        - 0 (blob)
        - 1 (blob)
        ...

In a v2 storage account with Data Lake enabled (hierarchical namespace), when the code used the prefix
<fully_qualified_namespace>/<eventhub_name>/<consumer_group>/ownership to search for blobs, the <fully_qualified_namespace>/<eventhub_name>/<consumer_group>/ownership directory itself was also returned, with no metadata, which led to the KeyError when we tried to extract the ownership information.
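The failure mode can be reproduced without any storage account. The following is only a simulation, not the Azure Storage SDK: the listing is a plain Python list, prefix search is modelled with `str.startswith`, the directory node's metadata is modelled as an empty dict, and the namespace, hub name, owner ids, and helper names are all made up for illustration.

```python
# Simulated listing; not the Azure Storage SDK. All names are illustrative.
PREFIX = "ns.servicebus.windows.net/myhub/$default/ownership"

# Assumption: with hierarchical namespace enabled, the directory itself appears
# as an entry with empty metadata next to the per-partition blobs.
listed = [
    {"name": PREFIX, "metadata": {}},
    {"name": PREFIX + "/0", "metadata": {"ownerid": "processor-a"}},
    {"name": PREFIX + "/1", "metadata": {"ownerid": "processor-b"}},
]


def list_by_prefix(entries, prefix):
    """Model a prefix search: keep entries whose name starts with the prefix."""
    return [e for e in entries if e["name"].startswith(prefix)]


def read_owners(entries):
    """Map partition id -> owner id, assuming every entry carries 'ownerid'."""
    return {e["name"].rsplit("/", 1)[-1]: e["metadata"]["ownerid"] for e in entries}


try:
    read_owners(list_by_prefix(listed, PREFIX))
    error = None
except KeyError as exc:
    # The directory entry matches the prefix but has no 'ownerid' key.
    error = exc

print(repr(error))  # KeyError('ownerid')
```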

What we want are the per-partition blobs, so the fix is easy: we add a / at the end of the prefix search string so that list_blobs won't return the parent node:
<fully_qualified_namespace>/<eventhub_name>/<consumer_group>/ownership/

(Checkpoint listing would encounter the same problem.)
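A minimal sketch of the trailing-slash fix, again as a pure-Python simulation rather than the actual checkpoint store code. The helper names `ownership_prefix` and `read_owners`, the sample names, and the extra check that skips entries without "ownerid" are assumptions added for illustration; the skip is a defensive precaution and not part of the fix described above.

```python
def ownership_prefix(fully_qualified_namespace, eventhub_name, consumer_group):
    """Build the search prefix with a trailing '/' so the directory node is excluded."""
    return f"{fully_qualified_namespace}/{eventhub_name}/{consumer_group}/ownership/"


def read_owners(entries, prefix):
    """Map partition id -> owner id for entries under the prefix."""
    owners = {}
    for entry in entries:
        if not entry["name"].startswith(prefix):
            continue
        metadata = entry.get("metadata") or {}
        if "ownerid" not in metadata:
            continue  # defensive skip (assumption, not part of the described fix)
        owners[entry["name"][len(prefix):]] = metadata["ownerid"]
    return owners


# Simulated listing from an account with hierarchical namespace enabled.
base = "ns.servicebus.windows.net/myhub/$default/ownership"
listed = [
    {"name": base, "metadata": {}},  # directory node, no metadata
    {"name": base + "/0", "metadata": {"ownerid": "processor-a"}},
    {"name": base + "/1", "metadata": {"ownerid": "processor-b"}},
]

prefix = ownership_prefix("ns.servicebus.windows.net", "myhub", "$default")

# The trailing slash alone is enough to exclude the directory node.
matching = [e["name"] for e in listed if e["name"].startswith(prefix)]

owners = read_owners(listed, prefix)
print(owners)  # {'0': 'processor-a', '1': 'processor-b'}
```

With the real storage SDK, a prefix like this would presumably be passed as `name_starts_with` to `ContainerClient.list_blobs` with metadata included in the listing; that wiring is not shown here.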


@yunhaoling yunhaoling modified the milestones: Backlog, [2021] January Mar 1, 2021
@GuyCarmy

GuyCarmy commented Sep 7, 2022

Hi @yunhaoling and @YijunXieMS
I'm getting exactly the same error on version 1.1.4.
It started when I tried upgrading the storage account to premium (since the client doesn't really store anything but does access the files' metadata a lot, using premium will cut costs and improve performance!).
I copied an existing set of blobs for checkpoints and ownerships, and I validated that the metadata was copied.

I guess something similar to this issue happens because the response from the premium storage account is a bit different?

I'd appreciate any help :)

@github-actions github-actions bot locked and limited conversation to collaborators Apr 12, 2023