object_store: Support Azure Fabric OAuth Provider #6382

Open
wants to merge 5 commits into
base: master

Contributor

@RobinLin666 RobinLin666 commented Sep 11, 2024

Which issue does this PR close?

Closes #.

Rationale for this change

In Azure Fabric, a token service issues the user access token. This PR adds a provider that uses that service so that long-running read and write operations keep working and the access token is refreshed automatically.

What changes are included in this PR?

This pull request introduces significant enhancements to the Azure integration within the object_store module, including the implementation of a new FabricTokenOAuthProvider. These changes aim to improve the authentication mechanism and add support for fabric token services.

Azure Builder Enhancements:

  • Added new fields to MicrosoftAzureBuilder to support fabric token services: fabric_token_service_url, fabric_workload_host, fabric_session_token, and fabric_cluster_identifier. (object_store/src/azure/builder.rs, R175-R182)
  • Updated AzureConfigKey to include new configuration keys related to fabric token services. (object_store/src/azure/builder.rs, R347-R374)
  • Modified impl AsRef<str> and impl FromStr for AzureConfigKey to handle the new fabric token service keys. (object_store/src/azure/builder.rs [1] [2])
  • Enhanced MicrosoftAzureBuilder to set and get the new fabric token service-related fields. (object_store/src/azure/builder.rs [1] [2])
  • Added logic to MicrosoftAzureBuilder to create a FabricTokenOAuthProvider if fabric token service fields are provided. (object_store/src/azure/builder.rs, R919-R942)
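
For illustration only, the last bullet (create the provider when the fabric fields are provided) could be sketched in Python roughly as below. This is not the Rust code in the PR: the helper name `fabric_settings` is hypothetical, and treating all four fields as required is an assumption, since the description does not say which of them are mandatory.

```python
from typing import Mapping, Optional

# The four configuration keys named in the PR description.
FABRIC_KEYS = (
    "azure_fabric_token_service_url",
    "azure_fabric_workload_host",
    "azure_fabric_session_token",
    "azure_fabric_cluster_identifier",
)


def fabric_settings(env: Mapping[str, str]) -> Optional[dict]:
    """Return the fabric settings if every key is set and non-empty, else None.

    Assumption: all four fields are required before the fabric provider
    is selected. The PR description does not state this explicitly.
    """
    # Environment variables are looked up under the upper-cased key name,
    # matching the notebook examples in this PR description.
    found = {key: env.get(key.upper(), "") for key in FABRIC_KEYS}
    return found if all(found.values()) else None
```

Calling `fabric_settings(os.environ)` would then return a dict when the notebook has exported all four variables, and `None` otherwise, in which case a builder would presumably fall back to its other credential sources.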

Credential Enhancements:

These changes collectively enhance the Azure integration by supporting more complex authentication mechanisms, particularly for environments utilizing fabric token services.
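
As a rough picture of what "auto refresh" means here, the following Python sketch caches a token and fetches a new one shortly before it expires. It is a minimal illustration, not the PR's Rust implementation: the class name `RefreshingToken`, the `fetch` callback, and the 300-second margin are all assumptions made for the example.

```python
import time
from typing import Callable, Optional, Tuple


class RefreshingToken:
    """Cache a token and refetch it when it is close to expiry (hypothetical helper)."""

    def __init__(
        self,
        fetch: Callable[[], Tuple[str, float]],
        min_ttl: float = 300.0,  # assumed safety margin in seconds
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch      # returns (token, expiry as a Unix timestamp)
        self._min_ttl = min_ttl
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        # Refetch if nothing is cached yet or the remaining lifetime is too short.
        remaining = self._expires_at - self._clock()
        if self._token is None or remaining < self._min_ttl:
            self._token, self._expires_at = self._fetch()
        return self._token
```

In this sketch `fetch` stands in for the call to the token service; a long-running read or write would call `get()` before each request and so never send a token that is about to expire.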

Are there any user-facing changes?

With this change, setting a few environment variables is enough to have the access token refreshed automatically in a Fabric notebook.

# For Fabric Spark Notebook
import os
import urllib.parse
workload_endpoint = urllib.parse.urlparse(f"{spark.conf.get('trident.lakehouse.tokenservice.endpoint')}/access")
os.environ['azure_fabric_token_service_url'.upper()] = f"https://{spark.conf.get('spark.tokenServiceEndpoint')}/api/v1/proxy{workload_endpoint.path}"
os.environ['azure_fabric_workload_host'.upper()] = f"{workload_endpoint.scheme}://{workload_endpoint.hostname}"
os.environ['azure_fabric_session_token'.upper()] = spark.conf.get("trident.session.token")
os.environ['azure_fabric_cluster_identifier'.upper()] = spark.conf.get("spark.synapse.clusteridentifier")
os.environ['azure_storage_token'.upper()] = notebookutils.credentials.getToken("storage")

# For Fabric Python Notebook
import os
from notebookutils.common import configs
import urllib.parse
workload_endpoint = urllib.parse.urlparse(f"{configs.workload_endpoint()}/access")
os.environ['azure_fabric_token_service_url'.upper()] = f"{configs.ts_endpoint()}/api/v1/proxy{workload_endpoint.path}"
os.environ['azure_fabric_workload_host'.upper()] = f"{workload_endpoint.scheme}://{workload_endpoint.hostname}"
os.environ['azure_fabric_session_token'.upper()] = configs.session_token()
os.environ['azure_fabric_cluster_identifier'.upper()] = configs.cluster_identifier()
os.environ['azure_storage_token'.upper()] = notebookutils.credentials.getToken("storage")

Users can then read and write Delta tables without passing storage_options.

from deltalake import DeltaTable
dt = DeltaTable('abfss://xxxx@onelake.dfs.fabric.microsoft.com/LH.Lakehouse/Tables/dbo/test')
df = dt.to_pyarrow_dataset().head(10).to_pandas()

@github-actions github-actions bot added the object-store Object Store Interface label Sep 11, 2024
@tustvold
Contributor

The JWT logic seems a little odd to me, are we just using it to decode the expiry? If so could we avoid the additional dependency?

@RobinLin666
Contributor Author

> The JWT logic seems a little odd to me, are we just using it to decode the expiry? If so could we avoid the additional dependency?

Hi @tustvold, thank you for your review. Yes, the Token Service only returns a JWT, so I need to decode the expiry from it. Any advice on doing that without the dependency?

@tustvold
Contributor

https://jwt.io/introduction you should be able to simply split the string, base64 decode the middle chunk and parse the JSON
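
A minimal sketch of that suggestion, written in Python with only the standard library (the PR itself is Rust, so this only illustrates the steps). The function name `jwt_expiry` is hypothetical. Note that this reads the `exp` claim without verifying the signature, which is only reasonable when the token comes straight from a trusted service and the expiry is used purely for refresh scheduling.

```python
import base64
import json


def jwt_expiry(token: str) -> int:
    """Return the `exp` claim (Unix seconds) of a JWT without verifying it."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a JWT of the form header.payload.signature")
    payload_b64 = parts[1]
    # JWT segments are base64url-encoded without padding; restore the padding
    # so that the standard decoder accepts the input.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    return int(claims["exp"])
```

The same three steps (split on `.`, base64url-decode the middle segment, parse the JSON) carry over to Rust, where the base64 and JSON handling would come from crates the project already depends on, if any.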

@alamb
Contributor

alamb commented Sep 18, 2024

I am depressed about the large review backlog in this crate. We are looking for more help from the community reviewing PRs -- see #6418 for more

Contributor

@tustvold tustvold left a comment

I don't have a way to test this, but it looks plausible to me. Thank you.

Perhaps @roeap you might be able to give this one a once over as well?

@RobinLin666
Contributor Author

Thanks all. Please help merge the PR if there are no further questions.

@alamb
Contributor

alamb commented Sep 19, 2024

Let's wait a day or two before merging to see if @roeap has some time to review. This is getting very close.

Thanks for your patience @RobinLin666 and the help @tustvold

Labels
object-store Object Store Interface
3 participants