Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add support for Azure secrets #478

Merged
merged 6 commits into from
Dec 5, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
20 changes: 17 additions & 3 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,7 @@ See our [official documentation][docs] for further details.
- `SELECT` queries executed by the DuckDB engine can directly read Postgres tables. (If you only query Postgres tables you need to run `SET duckdb.force_execution TO true`, see the **IMPORTANT** section above for details)
- Able to read [data types](https://www.postgresql.org/docs/current/datatype.html) that exist in both Postgres and DuckDB. The following data types are supported: numeric, character, binary, date/time, boolean, uuid, json, and arrays.
- If DuckDB cannot support the query for any reason, execution falls back to Postgres.
- Read and Write support for object storage (AWS S3, Cloudflare R2, or Google GCS):
- Read and Write support for object storage (AWS S3, Azure, Cloudflare R2, or Google GCS):
- Read parquet, CSV and JSON files:
- `SELECT n FROM read_parquet('s3://bucket/file.parquet') AS (n int)`
- `SELECT n FROM read_csv('s3://bucket/file.csv') AS (n int)`
Expand Down Expand Up @@ -124,9 +124,9 @@ CREATE EXTENSION pg_duckdb;

See our [official documentation][docs] for more usage information.

pg_duckdb relies on DuckDB's vectorized execution engine to read and write data to object storage bucket (AWS S3, Cloudflare R2, or Google GCS) and/or MotherDuck. The follow two sections describe how to get started with these destinations.
pg_duckdb relies on DuckDB's vectorized execution engine to read and write data to object storage bucket (AWS S3, Azure, Cloudflare R2, or Google GCS) and/or MotherDuck. The follow two sections describe how to get started with these destinations.

### Object storage bucket (AWS S3, Cloudflare R2, or Google GCS)
### Object storage bucket (AWS S3, Azure, Cloudflare R2, or Google GCS)

Querying data stored in Parquet, CSV, JSON, Iceberg and Delta format can be done with `read_parquet`, `read_csv`, `read_json`, `iceberg_scan` and `delta_scan` respectively.

Expand Down Expand Up @@ -157,6 +157,20 @@ Querying data stored in Parquet, CSV, JSON, Iceberg and Delta format can be done
LIMIT 100;
```

Note, for Azure, you will need to first install the Azure extension:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should call out somewhere that writes to Azure are not yet supported, linking to this issue: duckdb/duckdb-azure#44

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a note below in the readme, let me know what you think.

```sql
SELECT duckdb.install_extension('azure');
```

You may then store a secret using the `connection_string` parameter as such:
```sql
INSERT INTO duckdb.secrets
(type, connection_string)
VALUES ('Azure', '<your connection string>');
```

Note: writes to Azure are not yet supported, [here][duckdb/duckdb_azure#44] is the current discussion for more information.

### Connect with MotherDuck

pg_duckdb also integrates with [MotherDuck][md]. To enable this support you first need to [generate an access token][md-access-token] and then add the following line to your `postgresql.conf` file:
Expand Down
30 changes: 18 additions & 12 deletions include/pgduckdb/pgduckdb_options.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -6,21 +6,24 @@
namespace pgduckdb {

/* constants for duckdb.secrets */
#define Natts_duckdb_secret 10
#define Anum_duckdb_secret_name 1
#define Anum_duckdb_secret_type 2
#define Anum_duckdb_secret_key_id 3
#define Anum_duckdb_secret_secret 4
#define Anum_duckdb_secret_region 5
#define Anum_duckdb_secret_session_token 6
#define Anum_duckdb_secret_endpoint 7
#define Anum_duckdb_secret_r2_account_id 8
#define Anum_duckdb_secret_use_ssl 9
#define Anum_duckdb_secret_scope 10
#define Natts_duckdb_secret 11
#define Anum_duckdb_secret_name 1
#define Anum_duckdb_secret_type 2
#define Anum_duckdb_secret_key_id 3
#define Anum_duckdb_secret_secret 4
#define Anum_duckdb_secret_region 5
#define Anum_duckdb_secret_session_token 6
#define Anum_duckdb_secret_endpoint 7
#define Anum_duckdb_secret_r2_account_id 8
#define Anum_duckdb_secret_use_ssl 9
#define Anum_duckdb_secret_scope 10
#define Anum_duckdb_secret_connection_string 11

enum SecretType { S3, R2, GCS, AZURE };

typedef struct DuckdbSecret {
std::string name;
std::string type;
SecretType type;
std::string key_id;
std::string secret;
std::string region;
Expand All @@ -29,8 +32,11 @@ typedef struct DuckdbSecret {
std::string r2_account_id;
bool use_ssl;
std::string scope;
std::string connection_string; // Used for Azure
} DuckdbSecret;

std::string SecretTypeToString(SecretType type);

extern std::vector<DuckdbSecret> ReadDuckdbSecrets();

/* constants for duckdb.extensions */
Expand Down
11 changes: 11 additions & 0 deletions sql/pg_duckdb--0.1.0--0.2.0.sql
Original file line number Diff line number Diff line change
Expand Up @@ -77,3 +77,14 @@ ALTER TABLE duckdb.secrets ADD COLUMN scope TEXT;

ALTER TABLE duckdb.tables ADD COLUMN default_database TEXT NOT NULL DEFAULT 'my_db';
ALTER TABLE duckdb.tables ALTER COLUMN default_database DROP DEFAULT;

-- Alter duckdb.secrets to allow column "key_id" & "secret" to be NULL
ALTER TABLE duckdb.secrets ALTER COLUMN key_id DROP NOT NULL;
ALTER TABLE duckdb.secrets ALTER COLUMN secret DROP NOT NULL;

-- Update "type_constraint" CHECK on "type" to allow "Azure"
ALTER TABLE duckdb.secrets DROP CONSTRAINT type_constraint;
ALTER TABLE duckdb.secrets ADD CONSTRAINT type_constraint CHECK (upper(type) IN ('S3', 'GCS', 'R2', 'AZURE'));

-- Add "azure_connection_string" column
ALTER TABLE duckdb.secrets ADD COLUMN connection_string TEXT;
62 changes: 37 additions & 25 deletions src/pgduckdb_duckdb.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -224,34 +224,46 @@ GetSeqLastValue(const char *seq_name) {
return PostgresFunctionGuard(DirectFunctionCall1Coll, pg_sequence_last_value, InvalidOid, table_seq_oid);
}

void
WriteSecretQueryForS3R2OrGCP(const DuckdbSecret &secret, std::ostringstream &query) {
bool is_r2_cloud_secret = secret.type == SecretType::R2;
query << "KEY_ID '" << secret.key_id << "', SECRET '" << secret.secret << "'";
if (secret.region.length() && !is_r2_cloud_secret) {
query << ", REGION '" << secret.region << "'";
}
if (secret.session_token.length() && !is_r2_cloud_secret) {
query << ", SESSION_TOKEN '" << secret.session_token << "'";
}
if (secret.endpoint.length() && !is_r2_cloud_secret) {
query << ", ENDPOINT '" << secret.endpoint << "'";
}
if (is_r2_cloud_secret) {
query << ", ACCOUNT_ID '" << secret.endpoint << "'";
}
if (!secret.use_ssl) {
query << ", USE_SSL 'FALSE'";
}
if (secret.scope.length()) {
query << ", SCOPE '" << secret.scope << "'";
}
}

void
DuckDBManager::LoadSecrets(duckdb::ClientContext &context) {
auto duckdb_secrets = ReadDuckdbSecrets();

int secret_id = 0;
for (auto &secret : duckdb_secrets) {
std::ostringstream query;
bool is_r2_cloud_secret = (secret.type.rfind("R2", 0) == 0);
query << "CREATE SECRET pgduckb_secret_" << secret_id << " ";
query << "(TYPE " << secret.type << ", KEY_ID '" << secret.key_id << "', SECRET '" << secret.secret << "'";
if (secret.region.length() && !is_r2_cloud_secret) {
query << ", REGION '" << secret.region << "'";
}
if (secret.session_token.length() && !is_r2_cloud_secret) {
query << ", SESSION_TOKEN '" << secret.session_token << "'";
}
if (secret.endpoint.length() && !is_r2_cloud_secret) {
query << ", ENDPOINT '" << secret.endpoint << "'";
}
if (is_r2_cloud_secret) {
query << ", ACCOUNT_ID '" << secret.endpoint << "'";
}
if (!secret.use_ssl) {
query << ", USE_SSL 'FALSE'";
}
if (secret.scope.length()) {
query << ", SCOPE '" << secret.scope << "'";
query << "(TYPE " << SecretTypeToString(secret.type) << ", ";

if (secret.type == SecretType::AZURE) {
query << "CONNECTION_STRING '" << secret.connection_string << "'";
} else {
WriteSecretQueryForS3R2OrGCP(secret, query);
}

query << ");";

DuckDBQueryOrThrow(context, query.str());
Expand Down Expand Up @@ -298,19 +310,19 @@ DuckDBManager::LoadExtensions(duckdb::ClientContext &context) {

void
DuckDBManager::RefreshConnectionState(duckdb::ClientContext &context) {
const auto extensions_table_last_seq = GetSeqLastValue("extensions_table_seq");
if (IsExtensionsSeqLessThan(extensions_table_last_seq)) {
LoadExtensions(context);
UpdateExtensionsSeq(extensions_table_last_seq);
}
Y-- marked this conversation as resolved.
Show resolved Hide resolved

const auto secret_table_last_seq = GetSeqLastValue("secrets_table_seq");
if (IsSecretSeqLessThan(secret_table_last_seq)) {
DropSecrets(context);
LoadSecrets(context);
UpdateSecretSeq(secret_table_last_seq);
}

const auto extensions_table_last_seq = GetSeqLastValue("extensions_table_seq");
if (IsExtensionsSeqLessThan(extensions_table_last_seq)) {
LoadExtensions(context);
UpdateExtensionsSeq(extensions_table_last_seq);
}

auto http_file_cache_set_dir_query =
duckdb::StringUtil::Format("SET http_file_cache_dir TO '%s';", CreateOrGetDirectoryPath("duckdb_cache"));
DuckDBQueryOrThrow(context, http_file_cache_set_dir_query);
Expand Down
58 changes: 55 additions & 3 deletions src/pgduckdb_options.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -55,6 +55,43 @@ DatumToString(Datum datum) {
return column_value;
}

bool
DoesSecretRequiresKeyIdOrSecret(const SecretType type) {
return type == SecretType::S3 || type == SecretType::GCS || type == SecretType::R2;
}

SecretType
StringToSecretType(const std::string &type) {
auto uc_type = duckdb::StringUtil::Upper(type);
if (uc_type == "S3") {
return SecretType::S3;
} else if (uc_type == "R2") {
return SecretType::R2;
} else if (uc_type == "GCS") {
return SecretType::GCS;
} else if (uc_type == "AZURE") {
return SecretType::AZURE;
} else {
throw std::runtime_error("Invalid secret type: '" + type + "'");
}
}

std::string
SecretTypeToString(SecretType type) {
switch (type) {
case SecretType::S3:
return "S3";
case SecretType::R2:
return "R2";
case SecretType::GCS:
return "GCS";
case SecretType::AZURE:
return "AZURE";
default:
throw std::runtime_error("Invalid secret type: '" + std::to_string(type) + "'");
}
}

std::vector<DuckdbSecret>
ReadDuckdbSecrets() {
HeapTuple tuple = NULL;
Expand All @@ -70,9 +107,21 @@ ReadDuckdbSecrets() {
heap_deform_tuple(tuple, RelationGetDescr(duckdb_secret_relation), datum_array, is_null_array);
DuckdbSecret secret;

secret.type = DatumToString(datum_array[Anum_duckdb_secret_type - 1]);
secret.key_id = DatumToString(datum_array[Anum_duckdb_secret_key_id - 1]);
secret.secret = DatumToString(datum_array[Anum_duckdb_secret_secret - 1]);
auto type_str = DatumToString(datum_array[Anum_duckdb_secret_type - 1]);
secret.type = StringToSecretType(type_str);
if (!is_null_array[Anum_duckdb_secret_key_id - 1])
secret.key_id = DatumToString(datum_array[Anum_duckdb_secret_key_id - 1]);
else if (DoesSecretRequiresKeyIdOrSecret(secret.type)) {
elog(WARNING, "Invalid '%s' secret: key id is required.", type_str.c_str());
continue;
}

if (!is_null_array[Anum_duckdb_secret_secret - 1])
secret.secret = DatumToString(datum_array[Anum_duckdb_secret_secret - 1]);
else if (DoesSecretRequiresKeyIdOrSecret(secret.type)) {
elog(WARNING, "Invalid '%s' secret: secret is required.", type_str.c_str());
continue;
}

if (!is_null_array[Anum_duckdb_secret_region - 1])
secret.region = DatumToString(datum_array[Anum_duckdb_secret_region - 1]);
Expand All @@ -94,6 +143,9 @@ ReadDuckdbSecrets() {
if (!is_null_array[Anum_duckdb_secret_scope - 1])
secret.scope = DatumToString(datum_array[Anum_duckdb_secret_scope - 1]);

if (!is_null_array[Anum_duckdb_secret_connection_string - 1])
secret.connection_string = DatumToString(datum_array[Anum_duckdb_secret_connection_string - 1]);

duckdb_secrets.push_back(secret);
}

Expand Down