Delta scan error within docker container #120

Open
steviedas opened this issue Nov 13, 2024 · 2 comments

Comments

@steviedas

What am I trying to do?

We want to serialise a Delta table to a JSON-like format and return it from a FastAPI endpoint in Python.

What happens?

I'm getting the following error, with specifics redacted:

duckdb.duckdb.IOException: IO Error: AzureBlobStorageFileSystem could not open file: 'az://{container_name}/{table_name}/part-00000-b20a9780-e199-425c-ad19-75fe2d2b1b0c-c000.snappy.parquet', unknown error occurred, this could mean the credentials used were wrong. Original error message: 'Fail to get a new connection for: https://{storage_account_name}.blob.core.windows.net./ Problem with the SSL CA cert (path? access rights?)'

The connection to the storage account appears to succeed, since DuckDB is able to locate one of the underlying parquet files within the delta directory. The same code works outside of a Docker environment: copying the function and simply calling it returns the desired result.

To reproduce

  1. Clone this repo, which contains a FastAPI app that runs in Docker: https://github.com/steviedas/simple-fast-api.git
  2. Replace the placeholders for blob_connection_string, container_name and delta_table_name in the main.py (these can be acquired from your Azure instance).
  3. Run docker compose up --build
  4. Navigate to http://localhost:8000/docs
  5. Expand the "/events" endpoint, click on "Try it out" and hit "Execute"
  6. This will return a 500 error code and the trace will be in your terminal

Details:

  • Name: Steven Das
  • OS: Windows 11 x64 11th Gen Intel(R) Core(TM) i7-11800H
  • DuckDB Version: 1.1.3
  • Affiliation: Personal
  • I have also included a very small delta table within the repo and the relevant spark code to create it if you so wish.
  • Did you include all the code required to reproduce the issue? I think so, but if not, please contact me.
@steviedas
Author

++ We have tested running the container on an Apple MacBook Pro (M2) and the endpoint works as expected. I'm not too sure what this could possibly be at this point. Any help would be greatly appreciated.
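
Since the error mentions "Problem with the SSL CA cert (path? access rights?)", one thing that could be checked from inside the container is whether a CA bundle is present at all. The following is only a diagnostic sketch: the helper names are hypothetical, the two bundle paths are the conventional locations on Debian-style and RHEL-style images, and nothing in this thread confirms that a missing bundle is the actual cause. The paths Python's OpenSSL reports may also differ from the ones the Azure transport looks at.

```python
import os
import ssl

# Conventional CA bundle locations (assumed candidates, not an exhaustive list).
COMMON_CA_BUNDLES = (
    "/etc/ssl/certs/ca-certificates.crt",  # Debian, Ubuntu, Alpine
    "/etc/pki/tls/certs/ca-bundle.crt",    # RHEL, Fedora, CentOS
)


def existing_ca_bundles(candidates=COMMON_CA_BUNDLES):
    """Return the candidate bundle paths that exist as regular files."""
    return [path for path in candidates if os.path.isfile(path)]


def ca_report():
    """Collect what Python's OpenSSL build and the filesystem say about CA certs."""
    paths = ssl.get_default_verify_paths()
    return {
        "python_openssl_cafile": paths.cafile,
        "python_openssl_capath": paths.capath,
        "bundles_found": existing_ca_bundles(),
    }


if __name__ == "__main__":
    # Run inside the container, for example via `docker compose exec`.
    print(ca_report())
```

If "bundles_found" comes back empty inside the container but not on the host, that would at least be consistent with the error text.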

@samansmink
Collaborator

Try running the statement SET azure_transport_option_type = 'curl'; it can solve certificate issues on Azure (see https://duckdb.org/docs/extensions/azure.html#configuration).
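
From Python, the suggestion above might look roughly like the sketch below. The helper names are hypothetical and not part of DuckDB; the only DuckDB-specific pieces are the azure extension and the azure_transport_option_type setting named in the comment. The assumption here is that the setting is applied on the same connection, before the first read from an az:// path.

```python
def azure_setup_statements(transport="curl"):
    """Return the SQL statements to run before the first az:// read."""
    return [
        "INSTALL azure",
        "LOAD azure",
        f"SET azure_transport_option_type = '{transport}'",
    ]


def apply_azure_setup(con, transport="curl"):
    """Run the setup statements on any object that exposes execute(sql)."""
    for statement in azure_setup_statements(transport):
        con.execute(statement)
    return con


# Possible usage with a real connection (requires the duckdb package):
#   import duckdb
#   con = apply_azure_setup(duckdb.connect())
#   # ...then create the Azure secret and query the Delta table as before.
```

Keeping the statements in a plain list makes it easy to log exactly what was sent to DuckDB when debugging inside the container.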
