Support for identity based access to Azure using DefaultAzureCredential #18931
Comments
I'm 99% sure that you'll need to inquire with Object Store about this since that's what polars uses for cloud connectivity. As an aside, have you tried this? |
@deanm0000 many thanks for the link to the Azure page, I didn't know that a user delegation SAS was possible! Let me first explore other scenarios and, if needed, get in touch with the Object Store devs. I will come back to this. Or, if you think it's not worth it, you can close this ticket. |
I think it'd be good to put a note in the docs about it but I don't have any say on which PRs get accepted so you'll just have to try and see what happens. |
How-to summary
I'm leaving some notes for whoever ends up here with the same problem as mine
Hope it will be helpful for someone |
Hi, my two cents here. If this is being implemented: https://docs.pola.rs/api/python/dev/reference/api/polars.CredentialProvider.html
The easier way would be to retrieve the token via azure identity and then pass it as the auth method in the storage options. Something like:

    from azure.identity import (
        AzureCliCredential,
        ChainedTokenCredential,
        ManagedIdentityCredential,
    )

    def get_chained_credentials():
        """Create and return a chained token credential for Azure authentication.

        Returns
        -------
        chained_creds : ChainedTokenCredential
            A chained token credential that tries `AzureCliCredential` first and
            then `ManagedIdentityCredential`, so the same code can authenticate
            locally via the Azure CLI or in Azure via a managed identity.
        """
        # CLIENT_ID is assumed to be defined elsewhere (the managed identity's client ID).
        return ChainedTokenCredential(
            AzureCliCredential(),
            ManagedIdentityCredential(client_id=CLIENT_ID),
        )

With this we get a chained token credential that tries different auth methods, while always using token auth in polars. |
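A minimal sketch of how such a credential could be handed to Polars, assuming the callable shape discussed later in this thread (a function returning a dict with a `bearer_token` key plus an expiry time) and the storage scope `https://storage.azure.com/.default` quoted there. The helper name `make_credential_provider` is hypothetical; it only relies on the credential exposing `get_token(scope)` and returning an object with `.token` and `.expires_on`, which is how azure-identity credentials behave.

```python
from typing import Any, Callable, Dict, Optional, Tuple

# Scope taken from later in this thread; adjust if your setup differs.
STORAGE_SCOPE = "https://storage.azure.com/.default"


def make_credential_provider(
    credential: Any, scope: str = STORAGE_SCOPE
) -> Callable[[], Tuple[Dict[str, str], Optional[int]]]:
    """Wrap a token credential in a zero-argument provider function.

    `credential` is any object with a `get_token(scope)` method, for example
    a ChainedTokenCredential or DefaultAzureCredential from azure-identity.
    """

    def provider() -> Tuple[Dict[str, str], Optional[int]]:
        # Request a fresh token each time the provider is called.
        token = credential.get_token(scope)
        return {"bearer_token": token.token}, token.expires_on

    return provider


# Usage sketch (requires azure-identity and polars; names are placeholders):
#   provider = make_credential_provider(get_chained_credentials())
#   pl.scan_parquet(
#       "az://container/table.parquet",
#       storage_options={"account_name": "myaccount"},
#       credential_provider=provider,
#   )
```

Building the credential once and reusing it inside the closure avoids re-running the credential chain lookup on every token request.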
I have kind of the opposite problem. On my local machine, I've tried adding extending the |
@daviewales could you ensure that the |
I will close this issue as the latest versions of polars now automatically use
Closed as completed via #20384 |
Thanks @nameexhaustion, I believe that the

    from azure.identity import DefaultAzureCredential
    credential = DefaultAzureCredential(exclude_managed_identity_credential=True)

However, it's not clear to me how to tell Polars to use this. I've tried passing |
@daviewales The credential provider given must also return an expiry time. It should work if you create a custom function to wrap it:

    def credential_provider():
        credential = DefaultAzureCredential(exclude_managed_identity_credential=True)
        token = credential.get_token()
        return {
            "bearer_token": token.token,
        }, token.expires_on

    q = pl.scan_parquet(..., credential_provider=credential_provider)

For reference, this is what we have internally: polars/py-polars/polars/io/cloud/credential_provider.py, lines 237 to 241 in 96a2d01
|
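Since the provider has to return both the credentials and an expiry time, a small check can catch a wrongly shaped return value before it is passed to Polars. This is only a sketch: the helper name `seconds_until_expiry` is hypothetical, and it assumes the expiry is a Unix timestamp in seconds (as `expires_on` is in azure-identity) and that `None` means "no expiry".

```python
import time
from typing import Callable, Dict, Optional, Tuple

ProviderResult = Tuple[Dict[str, str], Optional[int]]


def seconds_until_expiry(
    provider: Callable[[], ProviderResult], now: Optional[float] = None
) -> float:
    """Call `provider` once, validate its return shape, and report the
    remaining token lifetime in seconds."""
    result = provider()
    if not (isinstance(result, tuple) and len(result) == 2):
        raise TypeError("provider must return a (credentials, expiry) tuple")
    credentials, expiry = result
    if not isinstance(credentials, dict):
        raise TypeError("first element must be a dict of credentials")
    if expiry is None:
        # Assumption: None is treated as a token that never expires.
        return float("inf")
    current = time.time() if now is None else now
    return expiry - current
```

A negative result means the provider handed back a token that has already expired.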
Thanks @nameexhaustion, that got me most of the way there. The missing piece of the puzzle is that
So, the minimal working definition of a

    def credential_provider():
        credential = DefaultAzureCredential(exclude_managed_identity_credential=True)
        token = credential.get_token("https://storage.azure.com/.default")
        return {"bearer_token": token.token}, token.expires_on

You can then use it as expected:

    pl.scan_parquet(
        'az://container/table.parquet',
        storage_options={'account_name': 'myaccount'},
        credential_provider=credential_provider
    )
|
Description
As pointed out in #11520, it is not possible to use anon=False in storage_options when reading data from the cloud.
I work with Azure, and as far as I know the only way to access the data using the DefaultAzureCredential is the one described here: https://stackoverflow.com/questions/74136425/connecting-to-azure-storage-account-to-read-parquet-file-via-managed-identity-us
I could not find any information in the documentation about alternative ways, and AzureConfigKey does not seem to support this feature.
As identity-based access is the gold standard in security, I think it would be very important to support this feature.