Accessing S3 Express one zone bucket from pyiceberg #928

Open
munip opened this issue Jul 14, 2024 · 5 comments

munip commented Jul 14, 2024

Question

I have been able to access an S3 bucket with PyIceberg using SqlCatalog successfully with:
from pyiceberg.catalog.sql import SqlCatalog

catalog = SqlCatalog(
    "default",
    **{
        "uri": f"sqlite:///{warehouse_path}/pyiceberg_catalog.db",
        "warehouse": "s3://myicebergbkt/test",
        "s3.access-key-id": "myid",
        "s3.secret-access-key": "mykey",
        "s3.session-token": "my-token",
        "s3.region": "us-east-1",
    },
)
But when I try the same thing with an S3 Express One Zone bucket, I am stuck on the syntax. I have tried every option I could think of, with no luck:
catalog = SqlCatalog(
    "default",
    **{
        "uri": f"sqlite:///{warehouse_path}/pyiceberg_catalog.db",
        # I have also tried 730335207565:bucket/pyicebkt--use1-az4--x-s3
        # and just pyicebkt--use1-az4--x-s3, with no luck.
        "warehouse": "s3://us-east-1:730335207565:bucket/pyicebkt--use1-az4--x-s3/test",
        "s3.access-key-id": "myid",
        "s3.secret-access-key": "mykey",
        "s3.session-token": "my-token",
        "s3.region": "us-east-1",
    },
)

I get the error: "Expected an S3 object path of the form 'bucket/key...', got a URI: "
Is S3 Express One Zone supported? If so, what is the syntax for the warehouse property?
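
To make the difference between the warehouse values above concrete, here is a minimal sketch that splits each one with Python's standard urllib.parse. It only illustrates the shape of the URIs; it is not PyIceberg's or PyArrow's actual parsing code, and the helper names split_warehouse and looks_like_directory_bucket are hypothetical. The suffix check relies on the S3 Express One Zone naming convention, where directory bucket names end in "--x-s3".

```python
from urllib.parse import urlparse


def split_warehouse(uri: str) -> tuple[str, str, str]:
    """Return (scheme, authority, key) for a warehouse URI."""
    parsed = urlparse(uri)
    return parsed.scheme, parsed.netloc, parsed.path.lstrip("/")


def looks_like_directory_bucket(bucket: str) -> bool:
    """Directory bucket names take the form '<base>--<az-id>--x-s3'."""
    return bucket.endswith("--x-s3")


for uri in (
    "s3://myicebergbkt/test",
    "s3://us-east-1:730335207565:bucket/pyicebkt--use1-az4--x-s3/test",
    "s3://pyicebkt--use1-az4--x-s3/test",
):
    scheme, bucket, key = split_warehouse(uri)
    print(scheme, bucket, key, looks_like_directory_bucket(bucket))
```

In the ARN-style value, the part that lands in the bucket position is "us-east-1:730335207565:bucket" and the real bucket name ends up inside the key, so only the plain "s3://pyicebkt--use1-az4--x-s3/test" form has the usual bucket/key layout.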

@kevinjqliu
Contributor

I think the error might be coming from the underlying pyarrow.fs.S3FileSystem class, which PyIceberg uses to interact with S3:
https://arrow.apache.org/docs/python/generated/pyarrow.fs.S3FileSystem.html
I'm not sure whether it currently supports S3 Express One Zone.

According to this thread, PyArrow does not currently support it
lancedb/lancedb#1206
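
One way to check whether the failure comes from PyArrow rather than PyIceberg is to call pyarrow.fs.S3FileSystem directly, bypassing the catalog. The sketch below assumes pyarrow is installed and that valid credentials are passed in. The helper names to_arrow_path and probe_bucket are hypothetical, and the behaviour against a directory bucket is not verified here.

```python
def to_arrow_path(uri: str) -> str:
    """Strip the 's3://' scheme, leaving the 'bucket/key' form."""
    prefix = "s3://"
    return uri[len(prefix):] if uri.startswith(prefix) else uri


def probe_bucket(uri: str, region: str, **credentials):
    """Ask PyArrow for file info on `uri`; raises if PyArrow cannot handle it.

    `credentials` are forwarded to S3FileSystem, e.g. access_key,
    secret_key and session_token.
    """
    # Imported lazily so to_arrow_path works without pyarrow installed.
    from pyarrow import fs

    s3 = fs.S3FileSystem(region=region, **credentials)
    return s3.get_file_info(to_arrow_path(uri))
```

For example, probe_bucket("s3://pyicebkt--use1-az4--x-s3/test", "us-east-1", access_key="myid", secret_key="mykey", session_token="my-token") exercises the same filesystem class without going through SqlCatalog. If it fails in the same way, the limitation sits in PyArrow.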

muniatl commented Jul 15, 2024

Thanks kevinjqliu. From the other thread, it doesn't look like PyArrow supports S3 Express One Zone. Does anyone know the timeline for Express One Zone support?

Contributor

Fokko commented Jul 15, 2024

@muniatl The best place to reach out would be the Arrow mailing list: https://lists.apache.org/list.html?dev@arrow.apache.org

@kevinjqliu
Contributor

The Arrow mailing list would be a good place to start.

PyIceberg depends on PyArrow for S3 Express One Zone support. I've found apache/arrow-rs#5140, which adds support to the Arrow Rust library.
It would be great to open an issue with PyArrow to track support for S3 Express One Zone.

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

@github-actions github-actions bot added the stale label Jan 12, 2025