Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Support for Microsoft Fabric / OneLake #4573

Merged
merged 9 commits into from
Aug 11, 2023

Conversation

vmuddassir-msft
Copy link
Contributor

No description provided.

@github-actions github-actions bot added the object-store Object Store Interface label Jul 26, 2023
@vmuddassir-msft vmuddassir-msft changed the title Changes required for onelake-fix issue#1418 fix: add support for microsoft onelake Jul 27, 2023
@vmuddassir-msft
Copy link
Contributor Author

added additional onelake urls to object_store/src/azure/mod.rs for both https and abfss

this is required to fix Issue 1418

@vmuddassir-msft vmuddassir-msft marked this pull request as ready for review August 4, 2023 17:30
Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR appears to add support for a mix of different URLs:

Could we possibly get tests for all of these, and ideally a link to some docs that explains why there are 4 different URI schemes and what the differences between them are?

@vmuddassir-msft
Copy link
Contributor Author

vmuddassir-msft commented Aug 8, 2023

@tustvold Unit Tests added to validate the URL schemes that were introduced to enable support for OneLake Fabric URLs.

https://account.blob.fabric.microsoft.com/
https://account.dfs.fabric.microsoft.com/
abfs://account.dsf.fabric.microsoft.com/

OneLake supports both Blob and DFS endpoints, as certain tools opt for utilizing Blob APIs by executing a string replacement, substituting '.dfs' with '.blob'.Also as OneLake supports the ABFS Driver, it also provides support for endpoints using the ABFS format.

Link to Documentation

Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm honestly struggling to follow how this is supposed to work, and I don't have a Microsoft fabric, let alone a onelake endpoint setup to test this. Reading the linked doc it would suggest a fundamentally different URI scheme from what this crate expects, gone is the notion of a container and account name, instead replaced with some workspace notion?

Is there some way to interact with onelake using the normal blob storage APIs? Is this documented somewhere? Ideally we would be able to do something similar to how we support ADLS Gen 2, i.e. just treat it as normal blob storage as that is all this crate and its users are concerned with.

Ultimately it would be my dream if Azure just provided a standard blob storage API that worked across all its various products, without strange esoterica like ADLS Gen 2 returning directories (?!!) with no way to turn this off that is compatible with other blob endpoints. The closer we can get to such an API the happier I will be 😄

object_store/src/azure/mod.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok thank you for sticking with this, I'm starting to follow how this is supposed to work. The entire onelake workspace is effectively mapped as a "container", with the top-level layers of the hierarchy not being real parts of blob storage, and only responding to HEAD requests. This does feel rather perverse though, so perhaps I am misunderstanding something...

I've left some comments assuming the above is the correct interpretation, but feel free to correct me if I am mistaken. I have limited bandwidth to review stuff so I apologise if I'm asking stupid questions.

object_store/src/azure/mod.rs Outdated Show resolved Hide resolved
@@ -0,0 +1,114 @@
use async_trait::async_trait;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should probably remove this before merge, but thank you for creating it to show how you are expecting to interact with this system

.build()
.unwrap());

let path = Path::from("17d3977c-d46e-4bae-8fed-ff467e674aed/Files/SampleCustomerList.csv");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the bit that I found rather surprising, I would have naively thought that the corollary for a container would be the files within an item, i.e. the user would just specify SampleCustomerList.csv in the path, with everything else provided at construction time of the builder.

This does appear to be what the docs suggest, so 🤷 To confirm if I were to perform a List of 17d3977c-d46e-4bae-8fed-ff467e674aed/Files/ it would return the full path, i.e. 17d3977c-d46e-4bae-8fed-ff467e674aed/Files/SampleCustomerList.csv and not SampleCustomerList.csv?

Copy link
Contributor Author

@vmuddassir-msft vmuddassir-msft Aug 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct ,when Performing a List on 17d3977c-d46e-4bae-8fed-ff467e674aed/Files/, ObjectMeta returned will contain the full path
ObjectMeta { location: Path { raw: "17d3977c-d46e-4bae-8fed-ff467e674aed/Files/SampleCustomerList.csv" }, last_modified: 2023-08-10T18:31:52Z, size: 11, e_tag: Some("0x8DB99D01129EA9D") }

image

object_store/src/azure/mod.rs Outdated Show resolved Hide resolved
@tustvold
Copy link
Contributor

I took the liberty of pushing some minor changes to prepare this for merge

@tustvold tustvold changed the title fix: add support for microsoft onelake Add Support for Microsoft Fabric / OneLake Aug 11, 2023
@tustvold tustvold merged commit 230612e into apache:master Aug 11, 2023
13 checks passed
@vmuddassir-msft
Copy link
Contributor Author

@tustvold - When will these changes to the crate be published? I noticed that the last build was released on June 6th https://crates.io/crates/object_store/versions

@tustvold
Copy link
Contributor

I intend to cut a RC today or tomorrow, with a view to publishing a release later this week

@vmuddassir-msft
Copy link
Contributor Author

I intend to cut a RC today or tomorrow, with a view to publishing a release later this week

Thanks - I have another PR on delta-rs that needs to be verified with the latest changes to object store crate dependency referenced in delta-rs, This is required to fix delta-io/delta-rs#1418.

@vmuddassir-msft
Copy link
Contributor Author

@tustvold - Any update on the release for changes to object_store crate

@tustvold
Copy link
Contributor

The RC was cut on Tuesday, Apache mandates at least a 3 day voting period, and so the release should be published today

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
object-store Object Store Interface
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants