serverless-duckdb

An example of how to run DuckDB on AWS Lambda & API Gateway. This will deploy two Lambda functions:

A Lambda function behind an API Gateway endpoint, to which DuckDB queries can be issued via POST requests that are authenticated by an API Key
A Function URL Lambda that streams the query results as an Apache Arrow IPC stream and uses NO authentication by default (you can add AWS_IAM auth manually if you wish)

Requirements

You'll need a current v3 installation of the Serverless Framework on the machine from which you plan to deploy the application.

You'll also have to set up your AWS credentials according to the Serverless docs.

Configuration

DuckDB is automatically configured to use the HTTPFS extension, and uses the AWS credentials that are given to your Lambda function by its execution role. This means you can potentially query data that is available via HTTP(S) or in AWS S3 buckets.

If you also want to query data (e.g. Parquet files) that resides in one or more S3 buckets, you'll have to adjust the iamRoleStatements part of the function configuration in the serverless.yml file. Just replace the YOUR-S3-BUCKET-NAME placeholder with your actual S3 bucket name.
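For orientation, here is a minimal sketch of the shape a read-only S3 statement usually takes, written as a Python dict that mirrors the IAM policy structure. The actions listed and the bucket name my-data-bucket are assumptions for illustration; the serverless.yml in this repository remains the source of truth and may grant a different set of permissions.

```python
import json


def s3_read_statement(bucket_name: str) -> dict:
    """Return an IAM statement granting read-only access to one bucket.

    Assumption: listing the bucket and reading its objects is sufficient.
    The statement in the repository's serverless.yml may differ.
    """
    return {
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": [
            # s3:ListBucket applies to the bucket itself ...
            f"arn:aws:s3:::{bucket_name}",
            # ... while s3:GetObject applies to the objects inside it.
            f"arn:aws:s3:::{bucket_name}/*",
        ],
    }


if __name__ == "__main__":
    # "my-data-bucket" is a hypothetical bucket name.
    print(json.dumps(s3_read_statement("my-data-bucket"), indent=2))
```

Both the bucket ARN and the object ARN (with the trailing /*) are listed because the two actions apply to different resource types.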

Deployment

After you've cloned this repository to your local machine and changed into its directory, the application can be deployed like this (don't forget to run npm i first to install the dependencies!):

$ sls deploy

This will deploy the stack to the default AWS region us-east-1. If you want to deploy the stack to a different region, you can specify a --region argument:

$ sls deploy --region eu-central-1

The deployment should take 2-3 minutes. Once it has finished, the console output should show the API Key, the API Gateway endpoint URL and the Function URL of the streaming Lambda:

api keys:
  DuckDBKey: REDACTED
endpoints:
  POST - https://REDACTED.execute-api.us-east-1.amazonaws.com/prd/v1/query
  streamingQuery: https://REDACTED.lambda-url.us-east-1.on.aws/

Usage

API Gateway endpoint

You can now query your DuckDB endpoint via HTTP requests (don't forget to replace REDACTED with your real URL and API Key), e.g.

curl -L -XPOST 'https://REDACTED.execute-api.us-east-1.amazonaws.com/prd/v1/query' \
  --header 'x-api-key: REDACTED' \
  --header 'Content-Type: application/json' \
  --data-raw '{
      "query": "SELECT avg(c_acctbal) FROM '\''https://shell.duckdb.org/data/tpch/0_01/parquet/customer.parquet'\'';"
  }'
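The same request can be sketched in Python using only the standard library. This is a minimal sketch: the function names build_query_request and run_query are hypothetical, and because the response format is not described here, the body is returned as raw text rather than parsed.

```python
import json
import urllib.request


def build_query_request(endpoint_url: str, api_key: str, sql: str) -> urllib.request.Request:
    """Build the POST request that the curl command above sends."""
    # The endpoint expects a JSON object with a single "query" field.
    body = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        method="POST",
        headers={
            "x-api-key": api_key,
            "Content-Type": "application/json",
        },
    )


def run_query(endpoint_url: str, api_key: str, sql: str, timeout: float = 30.0) -> str:
    """Send the query and return the raw response body as text."""
    request = build_query_request(endpoint_url, api_key, sql)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Letting json.dumps build the payload avoids the shell quoting that the curl version needs for the single quotes around the Parquet URL. Call run_query with the endpoint URL and API Key from your own deployment output.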

Function URL Lambda

You can query the streaming Lambda by issuing the following command (don't forget to specify an --output path; this is where the Apache Arrow file will be stored):

curl -L -XPOST 'https://REDACTED.lambda-url.us-east-1.on.aws/' \
  --header 'Content-Type: application/json' \
  --data-raw 'SELECT 1' \
  --output /tmp/result.arrow
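A Python counterpart, again a minimal sketch using only the standard library, could copy the streamed response to disk in chunks instead of buffering it in memory. The function names are hypothetical, and the request mirrors the curl command above: the body is the raw SQL text and no authentication header is sent.

```python
import shutil
import urllib.request
from typing import BinaryIO


def build_stream_request(function_url: str, sql: str) -> urllib.request.Request:
    """Build the POST request that the curl command above sends."""
    return urllib.request.Request(
        function_url,
        data=sql.encode("utf-8"),  # raw SQL text, as in the curl example
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def save_stream(response: BinaryIO, output_path: str, chunk_size: int = 64 * 1024) -> int:
    """Copy a binary stream to output_path in chunks; return the bytes written."""
    with open(output_path, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)
        return out.tell()


def stream_query_to_file(function_url: str, sql: str, output_path: str,
                         timeout: float = 60.0) -> int:
    """Send the query and store the Arrow IPC stream at output_path."""
    request = build_stream_request(function_url, sql)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return save_stream(response, output_path)
```

Assuming pyarrow is installed, the stored file can then be opened in binary mode and passed to pyarrow.ipc.open_stream, whose read_all() method returns the result as a table.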
