add support for Fabric OneLake storage #1809

Open
djouallah opened this issue Sep 23, 2023 · 8 comments

@djouallah

djouallah commented Sep 23, 2023

I'm trying this code:

import glaredb
import pandas as pd
df = pd.DataFrame(
    {
        "A": [1, 2, 3, 4, 5],
        "fruits": ["banana", "banana", "apple", "apple", "banana"],
    }
)

con = glaredb.connect("/lakehouse/default/Files")
con.sql("CREATE OR REPLACE TABLE xxx AS SELECT * FROM df")
con.close()

I think you need the latest version of arrow-rs to make it work:
apache/arrow-rs#4573

@djouallah djouallah added the feat 🎇 New feature or request label Sep 23, 2023
@scsmithr scsmithr self-assigned this Sep 23, 2023
@scsmithr
Member

Did some digging on this. It's likely we'll support abfs://... paths before the lakehouse file API (/lakehouse/...). There are some challenges around unimplemented file system operations with blobfuse.


Notes for impl:

  • We'll need to update object_store to explicitly close (drop) the file before calls to std::fs::rename, otherwise the metadata is not flushed in time for the rename. I believe this is actually a bug in blobfuse since the metadata should be flushed on file create, but isn't.
  • Blobfuse doesn't support hard linking, so copy_if_not_exists just fails. Not sure what to do here yet.
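
The two notes above can be illustrated with a minimal Python sketch. It is not the object_store implementation (that code is Rust); `write_then_rename` and `copy_if_not_exists` here are hypothetical stand-ins. The first function closes the staging file before renaming it. The second tries a hard link and, if the filesystem rejects it, falls back to an exclusive create. That fallback is only one possible option, it is not atomic with respect to file contents, and whether blobfuse honors exclusive create is an assumption that would need testing.

```python
import os
import shutil
import tempfile


def write_then_rename(dest: str, data: bytes) -> None:
    """Stage data in a temp file, close it, then rename it into place."""
    directory = os.path.dirname(dest) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        # Leaving the `with` block closes the file, so the rename below
        # never runs against a still-open handle.
        os.replace(tmp_path, dest)
    except BaseException:
        # Best-effort cleanup of the staging file on any failure.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def copy_if_not_exists(src: str, dest: str) -> None:
    """Copy src to dest, raising FileExistsError if dest already exists."""
    try:
        # A hard link fails atomically when dest is already present.
        os.link(src, dest)
        return
    except FileExistsError:
        raise
    except OSError:
        # Assumption: the filesystem does not support hard links.
        pass
    # Fallback: mode "xb" is an exclusive create, so it also raises
    # FileExistsError when dest exists. Readers may observe a partial file
    # while the copy is in progress.
    with open(src, "rb") as fin, open(dest, "xb") as fout:
        shutil.copyfileobj(fin, fout)
```

Calling `copy_if_not_exists` twice with the same destination raises `FileExistsError` on the second call, whichever branch was taken.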

@greyscaled greyscaled added the support 🤝 User-driven support label Sep 26, 2023
@jordandakota

As a vote of confidence, a OneLake destination in GlareDB would make me choose this over Fabric any day. Power BI is great, and the concept of OneLake to empower Power BI is great. Fabric, not so much.

@djouallah
Author

That's fine, you don't need to like the other Fabric engines. OneLake is neutral and works with any engine as long as it understands Delta tables.

@jordandakota

Exactly. I'm currently working with Databricks and have Unity Catalog in OneLake. The only remaining issue is how Unity writes a table name vs. how OneLake prefers to see it.

@djouallah
Author

Any update on this? I presume it should be easy now, as it is supported by delta-rs.

@scsmithr
Member

Any update on this? I presume it should be easy now, as it is supported by delta-rs.

We've made some changes to how we plumb stuff through to delta-rs, but I have not tested if this all works yet with Fabric (either via abfs://... or through the filesystem api). We'll be checking on this over the next couple of days, and I'll follow up with an update.

@jordandakota

Sounds great. Looking forward to it.

@djouallah
Author

Any update? I see that you are now using the latest version of arrow-rs.
Basically, we need something like this:

write_deltalake("abfss://Delta_Table@onelake.dfs.fabric.microsoft.com/Delta_Table.Lakehouse/Tables/fruit",
                df, storage_options={"bearer_token": aadToken, "use_fabric_endpoint": "true"})
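
For anyone adapting the call above, here is a small sketch of how its two arguments could be assembled. The helper names `onelake_table_uri` and `onelake_storage_options` are hypothetical and not part of deltalake or GlareDB. Reading the first `Delta_Table` in the URL as the workspace name and the second as the lakehouse name is an assumption based on the URL shape. The option keys are taken verbatim from the snippet above.

```python
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def onelake_table_uri(workspace: str, lakehouse: str, table: str) -> str:
    """Build an abfss:// URI for a table under a lakehouse's Tables folder."""
    # Assumption: workspace goes before the "@", lakehouse after the host.
    return (
        f"abfss://{workspace}@{ONELAKE_HOST}/"
        f"{lakehouse}.Lakehouse/Tables/{table}"
    )


def onelake_storage_options(bearer_token: str) -> dict[str, str]:
    """Return the storage options used in the write_deltalake call above."""
    if not bearer_token:
        raise ValueError("bearer_token must be a non-empty string")
    return {"bearer_token": bearer_token, "use_fabric_endpoint": "true"}


if __name__ == "__main__":
    uri = onelake_table_uri("Delta_Table", "Delta_Table", "fruit")
    print(uri)
```

The resulting URI and options dict would then be passed as the first argument and `storage_options` of `write_deltalake`. Obtaining the token itself is outside the scope of this sketch.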
