
Iceberg Database does not respect s3.endpoint setting of catalog config #74558

Closed
Gerrit-K opened this issue Jan 14, 2025 · 5 comments · Fixed by #75375
Gerrit-K commented Jan 14, 2025

Describe the unexpected behaviour
When using an Iceberg database/table with vended_credentials, the config setting s3.endpoint from the REST catalog is not used. Instead, the request is sent to AWS (unless a storage_endpoint is explicitly set in the database engine settings).

How to reproduce

  • Set up an Iceberg REST catalog implementation and a non-AWS S3-compatible storage backend (e.g. Polaris with this enhancement)
  • Create a catalog using these settings:
    {
      "type": "INTERNAL",
      "name": "test_s3_catalog",
      "properties": {
        "default-base-location": "s3://${BUCKET_NAME}"
      },
      "storageConfigInfo": {
        "storageType": "S3_COMPATIBLE",
        "s3.endpoint": "http://${S3_ENDPOINT}",  // <-- this is the important bit, it should point to e.g. minio
        "s3.credentials.catalog.accessKeyId": "ACCESS_KEY_ID",
        "s3.credentials.catalog.secretAccessKey": "SECRET_ACCESS_KEY",
        "s3.pathStyleAccess": true,
        "skipCredentialSubscopingIndirection": true,
        "s3.credentials.client.accessKeyId": "",
        "s3.credentials.client.secretAccessKey": "",
        "s3.region": "irrelevant",
        "s3.roleArn": null,
        "allowedLocations": [
          "s3://${BUCKET_NAME}"
        ]
      }
    }
  • Create a principal and set all necessary roles and permissions for that principal on the catalog
  • Upload data into a new Iceberg table in that catalog (e.g. using pyspark, see here)
  • Start clickhouse/clickhouse-server:head with docker
  • Create an Iceberg database:
    CREATE DATABASE test
    ENGINE = Iceberg('http://<CATALOG_HOSTNAME>/api/catalog')
    SETTINGS catalog_type = 'rest', warehouse = 'test_s3_catalog', catalog_credential = '<principal_client_id>:<principal_client_secret>'
  • Try to query the uploaded table, e.g. via SELECT count() FROM test.<table.name>
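To confirm what the catalog vends, the loadTable request that ClickHouse issues internally can be reproduced by hand. A minimal sketch in Python; the host, warehouse, namespace, table, and token values are placeholders, and the helper name is hypothetical:

```python
# Hypothetical helper: builds the Iceberg REST loadTable request.
# All concrete names below are placeholders for illustration.
def load_table_request(catalog_host, warehouse, namespace, table, token):
    url = (f"http://{catalog_host}/api/catalog/v1/{warehouse}"
           f"/namespaces/{namespace}/tables/{table}")
    headers = {
        "Authorization": f"Bearer {token}",
        # This header asks the catalog to vend storage credentials
        # (and s3.endpoint) inside the response's "config" object.
        "X-Iceberg-Access-Delegation": "vended-credentials",
    }
    return url, headers

url, headers = load_table_request(
    "CATALOG_HOSTNAME", "test_s3_catalog", "my_namespace", "my_table", "TOKEN")
print(url)
```

Sending that request (e.g. with curl or requests) should return a `config` object containing `s3.endpoint`, as described below.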

Expected behavior
The underlying S3 request should be sent to the s3.endpoint configured in the catalog and successfully return data.

Error message and/or stacktrace
This error is returned:

Code: 499. DB::Exception: Received from localhost:9000. DB::Exception: The AWS Access Key Id you provided does not exist in our records. (Code: 23, S3 exception: 'InvalidAccessKeyId'). (S3_ERROR)

This indicates that the request was sent to the default AWS S3 endpoint (resulting in an authorization error) instead of the configured one.

Additional context
I would expect the logic to be somewhere around here, where the response from the api/catalog/v1/test_s3_catalog/namespaces/<namespace>/tables/<table> endpoint is processed. This response does include the config object (if the header X-Iceberg-Access-Delegation: vended-credentials is set), and that object includes the s3.endpoint setting, but it is not used by ClickHouse.
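The missing step amounts to preferring the vended endpoint over the AWS default when building the S3 client. A rough Python illustration of that selection logic (not ClickHouse code; the sample response values are made up):

```python
# Sample loadTable response fragment; key names follow the Iceberg REST
# catalog convention, values are invented for illustration.
load_table_response = {
    "config": {
        "s3.path-style-access": "true",
        "s3.access-key-id": "foo",
        "s3.secret-access-key": "bar",
        "s3.endpoint": "http://minio:9000",
        "client.region": "irrelevant",
    }
}

config = load_table_response["config"]
s3_settings = {
    # Fall back to the default AWS endpoint only when none is vended.
    "endpoint": config.get("s3.endpoint", "https://s3.amazonaws.com"),
    "access_key_id": config.get("s3.access-key-id"),
    "secret_access_key": config.get("s3.secret-access-key"),
    "path_style_access": config.get("s3.path-style-access") == "true",
}
print(s3_settings["endpoint"])  # http://minio:9000
```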

When adding storage_endpoint = 'http://<S3_ENDPOINT>/<BUCKET_NAME>' to the database engine settings, the request works, but this would contradict one of the key advantages/features of a REST catalog: abstracting away the storage implementation.

@kssenii kssenii self-assigned this Jan 14, 2025

kssenii commented Jan 30, 2025

@Gerrit-K

This response does include the config object (if header X-Iceberg-Access-Delegation:vended-credentials is set) and that object includes the s3.endpoint setting, but it's not used by ClickHouse.

With that unmerged implementation, do you also get "null" for the credentials but a non-null endpoint parameter? E.g. I got

"config":{"s3.access-key-id":null,"s3.secret-access-key":null,"s3.endpoint":"http://<host>:9000"}


Gerrit-K commented Jan 31, 2025

@kssenii Hm strange, no, I get the credentials. They need to be set as environment variables, though. E.g. the catalog (from /api/management/v1/catalogs/test_s3_catalog) looks like this:

{
    "name": "test_s3_catalog",
    // ...
    "storageConfigInfo": {
        "storageType": "S3_COMPATIBLE",
        "s3.endpoint": "https://my.s3.endpoint",
        "s3.credentials.catalog.accessKeyId": "ACCESS_KEY_ID", // this needs to match the name(!) of the environment variable holding the actual value
        "s3.credentials.catalog.secretAccessKey": "SECRET_ACCESS_KEY",
        // ...
    }
}

and the polaris pod looks like this (I actually have secret refs there, but it's simplified for illustration):

spec:
  containers:
  - name: polaris
    env:
    - name: AWS_REGION
      value: irrelevant
    - name: ACCESS_KEY_ID
      value: foo
    - name: SECRET_ACCESS_KEY
      value: bar

So when I query the catalog API (from /api/catalog/v1/test_s3_catalog/namespaces/<namespace>/tables/<table>) I get:

{
    // ...
    "config": {
        "s3.path-style-access": "true",
        "s3.access-key-id": "foo",
        "s3.secret-access-key": "bar",
        "s3.endpoint": "https://my.s3.endpoint",
        "client.region": "irrelevant"
    }
}
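In other words, Polaris treats the configured credential value as the name of an environment variable and substitutes its value when vending credentials. A rough Python illustration of that indirection (not Polaris code; names and values mirror the simplified pod spec above):

```python
import os

# Simulate the pod environment from the spec above.
os.environ["ACCESS_KEY_ID"] = "foo"
os.environ["SECRET_ACCESS_KEY"] = "bar"

storage_config = {
    "s3.credentials.catalog.accessKeyId": "ACCESS_KEY_ID",
    "s3.credentials.catalog.secretAccessKey": "SECRET_ACCESS_KEY",
}

# The configured value names the env var; the vended credential is its value.
access_key = os.environ[storage_config["s3.credentials.catalog.accessKeyId"]]
secret_key = os.environ[storage_config["s3.credentials.catalog.secretAccessKey"]]
print(access_key, secret_key)  # foo bar
```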


kssenii commented Jan 31, 2025

The culprit was "skipCredentialSubscopingIndirection": true (though I took it from this issue description). Without it, the credentials are shown.

Gerrit-K (Author) commented:

Interesting 🤔 For me it works with that setting, as I need it because the ceph cluster I connect to doesn't have STS configured. Skipping the indirection makes Polaris return the static credentials instead of generating an STS token on demand. Tbh, I would not have expected this to have an impact on the config value of the endpoint. But glad you were able to reproduce it!


kssenii commented Jan 31, 2025

Support for s3.endpoint is implemented in #75375. Will backport once merged.
