diff --git a/docs/domain/timeseries/generate/node.rst b/docs/domain/timeseries/generate/node.rst
index dab0eba7..28a43737 100644
--- a/docs/domain/timeseries/generate/node.rst
+++ b/docs/domain/timeseries/generate/node.rst
@@ -346,7 +346,7 @@ will open up a map view showing the current position of the ISS:
 .. _detailed guide: https://developer.mozilla.org/en-US/docs/Learn/JavaScript/Asynchronous/Promises
 .. _ground point: https://en.wikipedia.org/wiki/Ground_track
 .. _input values: https://node-postgres.com/features/queries#Parameterized%20query
-.. _interactive REPL mode: https://www.oreilly.com/library/view/learning-node-2nd/9781491943113/ch04.html
+.. _interactive REPL mode: https://web.archive.org/web/20240910181004/https://www.oreilly.com/library/view/learning-node-2nd/9781491943113/ch04.html
 .. _International Space Station: https://www.nasa.gov/mission_pages/station/main/index.html
 .. _node-postgres: https://www.npmjs.com/package/pg
 .. _Node.js: https://nodejs.org/en/
diff --git a/docs/integrate/dbt/index.md b/docs/integrate/dbt/index.md
index 0761da74..fe6df6c9 100644
--- a/docs/integrate/dbt/index.md
+++ b/docs/integrate/dbt/index.md
@@ -1,14 +1,18 @@
 (dbt)=
-
 # dbt
 
+:::{include} /_include/links.md
+:::
+
+## About
 ```{div}
 :style: "float: right"
 [![](https://www.getdbt.com/ui/img/logos/dbt-logo.svg){w=180px}](https://www.getdbt.com/)
 ```
 
-[dbt] is an open source tool for transforming data in data warehouses using Python and
-SQL. It is an SQL-first transformation workflow platform that lets teams quickly and
+[dbt] is a tool for transforming data in data warehouses using Python and SQL.
+
+It is an SQL-first transformation workflow platform that lets teams quickly and
 collaboratively deploy analytics code following software engineering best practices like
 modularity, portability, CI/CD, and documentation.
 
@@ -56,69 +60,101 @@ scale.
 :::
 
 
-## Install
+### dbt's Features
+The data abstraction layer provided by [dbt-core] allows the decoupling of
+the models on which reports and dashboards rely from the source data. When
+business rules or source systems change, you can still maintain the same models
+as a stable interface.
+
+Some of the things that dbt can do include:
+
+* Import reference data from CSV files.
+* Track changes in source data with different strategies so that downstream
+  models do not need to be built every time from scratch.
+* Run tests on data, to confirm assumptions remain valid, and to validate
+  any changes made to the models' logic.
+
+### CrateDB's Benefits
+Due to its unique capabilities, CrateDB is an excellent warehouse choice for
+data transformation projects. It offers automatic indexing, fast aggregations,
+easy partitioning, and the ability to scale horizontally.
+
+
+## Setup
 Install the most recent version of the [dbt-cratedb2] Python package.
 ```shell
 pip install --upgrade 'dbt-cratedb2'
 ```
 
-## Connect
-**dbt Profile Configuration:** CrateDB targets should be set up using the
-following configuration in your `profiles.yml` file.
+## Configure
+Because CrateDB is compatible with PostgreSQL, the same connectivity
+options apply as outlined on the [dbt Postgres Setup] documentation
+page.
+
+The dbt connection profile settings for CrateDB stored in [`profiles.yml`]
+are identical to those for PostgreSQL.
 
 ```yaml
-company-name:
+cratedb_analytics:
   target: dev
   outputs:
     dev:
       type: cratedb
-      host: [hostname]
+      host: [clustername].aks1.westeurope.azure.cratedb.net
+      port: 5432
       user: [username]
-      password: [password]
-      port: [port] # Default is 5432.
-      dbname: crate # Fixed. Do not change.
-      schema: doc # `doc` is the default schema.
+      pass: [password]
+      dbname: crate # CrateDB's only catalog is `crate`.
+      schema: doc # Define schema. `doc` is the default.
+      search_path: doc # Use the same value as `schema` by default.
 ```
 
-dbt-cratedb2 is based on dbt-postgres, which uses [psycopg2] to connect to
-the database server.
-Because CrateDB is compatible with PostgreSQL, the same connectivity
-options apply like outlined on the [dbt Postgres Setup] documentation
-page.
 
-## Usage
+## Learn
 
-### Custom Schemas
-By default, dbt writes the models into the schema you configured in your
-profile, but in some dbt projects you may need to write data into different
-target schemas. You can adjust the target schema using [custom schemas with
-dbt].
+Learn how to use CrateDB with dbt by exploring concise examples.
 
-If your dbt project has a custom macro called `generate_schema_name`, dbt
-will use it instead of the default macro. This allows you to customize
-the name generation according to your needs.
+:::{rubric} Tutorials
+:::
 
-```jinja
-{% macro generate_schema_name(custom_schema_name, node) -%}
-    {%- set default_schema = target.schema -%}
-    {%- if custom_schema_name is none -%}
-        {{ default_schema }}
-    {%- else -%}
-        {{ custom_schema_name | trim }}
-    {%- endif -%}
-{%- endmacro %}
+::::{grid} 2
+:gutter: 5
+
+:::{grid-item-card}
+:link: dbt-usage
+:link-type: ref
+:link-alt: dbt usage guidelines
+:padding: 3
+:class-card: sd-text-center sd-pt-4
+:class-header: sd-fs-4
+{material-outlined}`integration_instructions;2.5em`
+Usage Guidelines
+^^^
+```{toctree}
+:maxdepth: 2
+:hidden:
+
+usage
 ```
-
-
-## Learn
-
-:::{rubric} Tutorials
++++
+Usage guidelines, notes, and advanced configuration options.
 :::
-- [Using dbt with CrateDB]
 
-:::{rubric} Development
+:::{grid-item-card}
+:link: https://github.com/crate/cratedb-examples/tree/main/framework/dbt/
+:link-type: url
+:link-alt: dbt CrateDB Examples
+:padding: 3
+:class-card: sd-text-center sd-pt-4
+:class-header: sd-fs-4
+{material-outlined}`apps;2.5em`
+Example Projects
+^^^
++++
+Explore a few dbt example projects using CrateDB.
 :::
-- [dbt CrateDB examples]
+
+::::
 
 
 :::{rubric} Webinars
@@ -150,12 +186,9 @@ and then publish your project to a GitHub repository.
 
 ::::
 
-
-[custom schemas with dbt]: https://docs.getdbt.com/docs/build/custom-schemas
 [dbt]: https://www.getdbt.com/
+[dbt-core]: https://github.com/dbt-labs/dbt-core
 [dbt-cratedb2]: https://pypi.org/project/dbt-cratedb2/
 [dbt Cloud]: https://www.getdbt.com/product/dbt-cloud/
 [dbt Postgres Setup]: https://docs.getdbt.com/docs/core/connect-data-platform/postgres-setup
-[Using dbt with CrateDB]: https://community.cratedb.com/t/using-dbt-with-cratedb/1566
-[dbt CrateDB examples]: https://github.com/crate/cratedb-examples/tree/main/framework/dbt/
-[psycopg2]: https://pypi.org/project/psycopg2/
+[`profiles.yml`]: https://docs.getdbt.com/docs/core/connect-data-platform/profiles.yml
diff --git a/docs/integrate/dbt/usage.md b/docs/integrate/dbt/usage.md
new file mode 100644
index 00000000..274554f2
--- /dev/null
+++ b/docs/integrate/dbt/usage.md
@@ -0,0 +1,172 @@
+(dbt-usage)=
+# Using dbt with CrateDB
+
+:::{include} /_include/links.md
+:::
+
+_Setup instructions and guidelines for transforming data using dbt and CrateDB._
+
+:::{div}
+To run the following steps, you will need connectivity to a CrateDB
+cluster, and a Python installation on your workstation. You can use
+[CrateDB Self-Managed] or [CrateDB Cloud].
+:::
+
+## Setup
+
+To start a CrateDB instance for evaluation purposes, use Docker or Podman.
+```shell
+docker run --rm \
+  --publish=4200:4200 --publish=5432:5432 \
+  --env=CRATE_HEAP_SIZE=2g crate:latest
+```
+
+Install the most recent version of the [dbt-cratedb2] Python package.
+```shell
+pip install --upgrade 'dbt-cratedb2'
+```
+:::{note}
+dbt-cratedb2 is based on dbt-postgres, which uses [psycopg2] to connect to
+the database server.
+:::
+
+## Configure
+A minimal set of **dbt profile configuration** options, for example within a
+[`profiles.yml`] file at `~/.dbt/profiles.yml`.
+```bash
+cd ~
+mkdir -p .dbt
+cat << EOF > .dbt/profiles.yml
+cratedb_analytics:
+  target: dev
+  outputs:
+    dev:
+      type: cratedb
+      host: localhost
+      port: 5432
+      user: crate
+      pass: crate
+      dbname: crate
+      schema: doc
+      search_path: doc
+EOF
+```
+Please note the values for `dbname`, `schema`, and `search_path` in this example.
+
+## Project
+When working with dbt, you work within the context of a dbt project.
+A dbt project has a [specific structure][dbt-project-structure], and contains a
+combination of SQL, Jinja, YAML, and Markdown files.
+In your project folder, alongside the `models` folder that most projects have,
+a folder called `macros` can include macro override files.
+
+At [cratedb-examples » framework/dbt], you can explore a few ready-to-run dbt
+projects that demonstrate usage with CrateDB.
+
+## Appendix
+
+A few notes about advanced configuration options and general usage
+information.
+
+### Search Path
+The `search_path` config controls the CrateDB "search path" that dbt configures
+when opening new connections to the database. By default, the CrateDB search
+path is `"doc"`, meaning that unqualified names will be
+searched for in the `doc` schema.
+
+### Custom Schemas
+By default, dbt writes the models into the schema you configured in your
+profile, but in some dbt projects you may need to write data into different
+target schemas. You can adjust the target schema using [custom schemas with
+dbt].
+
+If your dbt project has a custom macro called `generate_schema_name`, dbt
+will use it instead of the default macro. This allows you to customize
+the name generation according to your needs.
+
+```jinja
+{% macro generate_schema_name(custom_schema_name, node) -%}
+    {%- set default_schema = target.schema -%}
+    {%- if custom_schema_name is none -%}
+        {{ default_schema }}
+    {%- else -%}
+        {{ custom_schema_name | trim }}
+    {%- endif -%}
+{%- endmacro %}
+```
+
+### Full Connection Options
+CrateDB targets should be set up using the following **dbt profile configuration** in
+your [`profiles.yml`] file, which is identical to the [setup options of dbt-postgres].
+```yaml
+cratedb_analytics:
+  target: dev
+  outputs:
+    dev:
+      type: cratedb
+      host: [clustername].aks1.westeurope.azure.cratedb.net
+      user: [username]
+      password: [password]
+      port: 5432
+      dbname: crate # CrateDB's only catalog is `crate`.
+      schema: doc # You can define any schema. `doc` is the default.
+      threads: [optional, 1 or more]
+      [keepalives_idle]: 0 # default 0, indicating the system default.
+      connect_timeout: 10 # default 10 seconds
+      [retries]: 1 # default 1 retry on error/timeout when opening connections
+      [search_path]: # optional, override the default postgres `search_path`
+      [role]: # optional, set the role dbt assumes when executing queries
+      [sslmode]: # optional, set the `sslmode` used to connect to the database
+      [sslcert]: # optional, set the `sslcert` to control the certificate file location
+      [sslkey]: # optional, set the `sslkey` to control the location of the private key
+      [sslrootcert]: # optional, set the `sslrootcert` config value to a new file path
+      # in order to customize the file location that contains root certificates
+```
+
+
+## Notes
+
+### CrateDB's Differences
+- CrateDB’s fixed catalog name is `crate`; the default schema name is `doc`.
+- CrateDB does not implement the notion of a database; however, tables can be created in different [schemas](https://cratedb.com/docs/crate/reference/en/latest/general/ddl/create-table.html#ddl-create-table-schemas).
+- When asked for a database name, specifying any schema name or the fixed catalog name `crate` may be applicable.
+- If a database/schema name is omitted while connecting, the PostgreSQL drivers may default to the “username”.
+- The predefined [superuser](https://cratedb.com/docs/crate/reference/en/latest/admin/user-management.html#administration-user-management) on an unconfigured CrateDB cluster is called `crate`, defined without a password.
+- To authenticate properly, please learn about the available [authentication](https://cratedb.com/docs/crate/reference/en/latest/admin/auth/index.html#admin-auth) options.
+
+### Feature Coverage
+The following dbt features have been tested successfully with CrateDB.
+
+* [Model materializations](https://docs.getdbt.com/docs/build/materializations):
+  table, view, incremental, ephemeral
+* [Incremental models](https://docs.getdbt.com/docs/build/incremental-models-overview)
+* [Source data freshness](https://docs.getdbt.com/docs/build/sources#source-data-freshness)
+* [CSV seeds](https://docs.getdbt.com/docs/build/seeds)
+* [Data tests](https://docs.getdbt.com/docs/build/tests)
+
+### Caveats
+- Model materializations using the "materialized view" strategy are
+  not supported yet.
+- Incremental materializations with CrateDB currently only support the
+  `delete+insert` strategy.
+- Incremental materializations do not support columns using the
+  {ref}`OBJECT ` data type yet.
+
+
+:::{note}
+CrateDB is continuously adding new features and we will endeavor to come
+back and update this article if there are any updates or improvements.
+We are tracking interoperability issues per [Tool: dbt], and appreciate
+any contributions and reports.
+:::
+
+
+[cratedb-examples » framework/dbt]: https://github.com/crate/cratedb-examples/tree/main/framework/dbt/
+[custom schemas with dbt]: https://docs.getdbt.com/docs/build/custom-schemas
+[dbt]: https://www.getdbt.com/
+[dbt-cratedb2]: https://pypi.org/project/dbt-cratedb2/
+[dbt-project-structure]: https://docs.getdbt.com/guides/best-practices/how-we-structure/1-guide-overview
+[`profiles.yml`]: https://docs.getdbt.com/docs/core/connect-data-platform/profiles.yml
+[psycopg2]: https://pypi.org/project/psycopg2/
+[setup options of dbt-postgres]: https://docs.getdbt.com/docs/core/connect-data-platform/postgres-setup
+[Tool: dbt]: https://github.com/crate/crate/labels/tool%3A%20dbt
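
Note on the `generate_schema_name` override added under "Custom Schemas" in `usage.md`: the macro's branching can be mirrored in plain Python to make its behaviour easy to check. This is only an illustrative sketch, not dbt's implementation or API. The Python function and its `target_schema` parameter are hypothetical stand-ins for the Jinja macro and `target.schema`, and `str.strip()` is assumed to match Jinja's `trim` filter.

```python
from typing import Optional


def generate_schema_name(custom_schema_name: Optional[str], target_schema: str) -> str:
    """Hypothetical Python mirror of the Jinja macro's branching.

    - No custom schema configured: fall back to the profile's schema.
    - Custom schema configured: use it verbatim, minus surrounding whitespace.
    """
    if custom_schema_name is None:
        return target_schema
    return custom_schema_name.strip()


if __name__ == "__main__":
    # A model without a custom schema lands in the profile's schema.
    print(generate_schema_name(None, "doc"))
    # A model with a custom schema lands in exactly that schema.
    print(generate_schema_name("  staging ", "doc"))
```

With this override, a model configured with a custom schema is written to that schema name alone, rather than to a name combined with the profile's schema.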