dbt: Improve entry point page. Absorb community tutorial.

crate · Nov 29, 2024 · c35cfb8 · c35cfb8
1 parent 7aac0f4
commit c35cfb8
Showing 2 changed files with 198 additions and 46 deletions.
diff --git a/docs/integrate/dbt/index.md b/docs/integrate/dbt/index.md
@@ -7,8 +7,9 @@
 [![](https://www.getdbt.com/ui/img/logos/dbt-logo.svg){w=180px}](https://www.getdbt.com/)
 ```
 
-[dbt] is an open source tool for transforming data in data warehouses using Python and
-SQL. It is an SQL-first transformation workflow platform that lets teams quickly and
+[dbt] is a tool for transforming data in data warehouses using Python and SQL.
+
+It is an SQL-first transformation workflow platform that lets teams quickly and
 collaboratively deploy analytics code following software engineering best practices
 like modularity, portability, CI/CD, and documentation.
 
@@ -56,69 +57,60 @@ scale.
 :::
 
 
-## Install
+## Setup
 Install the most recent version of the [dbt-cratedb2] Python package.
 ```shell
 pip install --upgrade 'dbt-cratedb2'
 ```
+dbt-cratedb2 is based on dbt-postgres, which uses [psycopg2] to connect to
+the database server.
 
 
-## Connect
-**dbt Profile Configuration:** CrateDB targets should be set up using the
-following configuration in your `profiles.yml` file.
+## Configure
+Because CrateDB is compatible with PostgreSQL, the same connectivity
+options apply, as outlined on the [dbt Postgres Setup] documentation
+page.
+
+The dbt connection profile settings for CrateDB, stored in [`profiles.yml`],
+are identical to those for PostgreSQL.
 ```yaml
-company-name:
+cratedb_analytics:
   target: dev
   outputs:
     dev:
       type: cratedb
-      host: [hostname]
+      host: [clustername].aks1.westeurope.azure.cratedb.net
+      port: 5432
       user: [username]
-      password: [password]
-      port: [port]   # Default is 5432.
-      dbname: crate  # Fixed. Do not change.
-      schema: doc    # `doc` is the default schema.
-```
-dbt-cratedb2 is based on dbt-postgres, which uses [psycopg2] to connect to
-the database server.
-Because CrateDB is compatible with PostgreSQL, the same connectivity
-options apply like outlined on the [dbt Postgres Setup] documentation
-page.
-
-
-## Usage
-
-### Custom Schemas
-By default, dbt writes the models into the schema you configured in your
-profile, but in some dbt projects you may need to write data into different
-target schemas. You can adjust the target schema using [custom schemas with
-dbt].
-
-If your dbt project has a custom macro called `generate_schema_name`, dbt
-will use it instead of the default macro. This allows you to customize
-the name generation according to your needs.
-
-```jinja
-{% macro generate_schema_name(custom_schema_name, node) -%}
-  {%- set default_schema = target.schema -%}
-  {%- if custom_schema_name is none -%}
-    {{ default_schema }}
-  {%- else -%}
-    {{ custom_schema_name | trim }}
-  {%- endif -%}
-{%- endmacro %}
+      pass: [password]
+      dbname: crate     # CrateDB's only catalog is `crate`.
+      schema: doc       # Define schema. `doc` is the default.
+      search_path: doc  # Use the same value as `schema` by default.
 ```
 
 
 ## Learn
 
 :::{rubric} Tutorials
 :::
-- [Using dbt with CrateDB]
 
-:::{rubric} Development
-:::
-- [dbt CrateDB examples]
+:::::{grid}
+::::{grid-item-card}
+:link: dbt-usage
+:link-type: ref
+Advanced configuration options and other usage guidelines.
+```{toctree}
+:maxdepth: 2
+
+usage
+```
+::::
+::::{grid-item-card}
+:link: https://github.com/crate/cratedb-examples/tree/main/framework/dbt/
+:link-type: url
+A few dbt example projects using CrateDB.
+::::
+:::::
 
 
 :::{rubric} Webinars
@@ -157,5 +149,5 @@ and then publish your project to a GitHub repository.
 [dbt Cloud]: https://www.getdbt.com/product/dbt-cloud/
 [dbt Postgres Setup]: https://docs.getdbt.com/docs/core/connect-data-platform/postgres-setup
 [Using dbt with CrateDB]: https://community.cratedb.com/t/using-dbt-with-cratedb/1566
-[dbt CrateDB examples]: https://github.com/crate/cratedb-examples/tree/main/framework/dbt/
 [psycopg2]: https://pypi.org/project/psycopg2/
+[`profiles.yml`]: https://docs.getdbt.com/docs/core/connect-data-platform/profiles.yml
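
The profile conventions documented in the `index.md` change above (`type: cratedb`, the fixed `dbname: crate`, and `search_path` matching `schema` by default) can be sketched as a small check. This is an illustrative sketch only: `check_cratedb_target` and the `dev` dictionary are hypothetical names, not part of dbt or dbt-cratedb2, and the dictionary simply mirrors one `outputs` entry from `profiles.yml` with assumed example values.

```python
# Hypothetical helper for illustration; not part of dbt or dbt-cratedb2.
def check_cratedb_target(target):
    """Return notes where a target deviates from the documented conventions."""
    notes = []
    if target.get("type") != "cratedb":
        notes.append("type should be 'cratedb'")
    if target.get("dbname") != "crate":
        notes.append("dbname should be 'crate', CrateDB's only catalog")
    # `doc` is the default schema; search_path uses the same value by default.
    schema = target.get("schema", "doc")
    if target.get("search_path", schema) != schema:
        notes.append("search_path differs from schema")
    return notes


# Mirrors one `outputs` entry of a profiles.yml file (assumed example values).
dev = {
    "type": "cratedb",
    "host": "localhost",
    "port": 5432,
    "user": "crate",
    "dbname": "crate",
    "schema": "doc",
    "search_path": "doc",
}
print(check_cratedb_target(dev))
```

A differing `search_path` is reported as a note rather than an error, because the documentation only describes matching values as the default.
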
diff --git a/docs/integrate/dbt/usage.md b/docs/integrate/dbt/usage.md
@@ -0,0 +1,160 @@
+(dbt-usage)=
+
+# Using dbt with CrateDB
+
+_Guidelines for transforming data using dbt and CrateDB._
+
+## Introduction
+
+### dbt's Features
+The data abstraction layer provided by [dbt][dbt-core] allows the decoupling of
+the models on which reports and dashboards rely from the source data. When
+business rules or source systems change, you can still maintain the same models
+as a stable interface.
+
+Some of the things that dbt can do include:
+
+* Import reference data from CSV files
+* Track changes in source data with different strategies so that downstream
+  models do not need to be built every time from scratch.
+* Run tests on data, to confirm assumptions remain valid, and to validate
+  any changes made to the models' logic.
+
+### CrateDB's Benefits
+Due to its unique capabilities, CrateDB is an excellent warehouse choice for
+data transformation projects. It offers automatic indexing, fast aggregations,
+easy partitioning, and the ability to scale horizontally.
+
+
+## Setup
+
+To run the following steps, you need connectivity to a CrateDB
+cluster and a Python installation on your workstation. The starting point
+is a fresh installation of `dbt-cratedb2`.
+
+```bash
+pip install --upgrade 'dbt-cratedb2'
+```
+
+To start a CrateDB instance for evaluation purposes, use Docker or Podman.
+```shell
+docker run --rm \
+  --publish=4200:4200 --publish=5432:5432 \
+  --env=CRATE_HEAP_SIZE=2g crate:latest
+```
+
+**dbt Profile Configuration:** CrateDB targets are set up in your dbt
+connection profile, a [`profiles.yml`] file that is typically stored at
+`~/.dbt/profiles.yml`.
+
+Create this file with your connection details. The following commands
+write a profile for a CrateDB instance running on `localhost`.
+```bash
+cd ~
+mkdir -p .dbt
+cat << EOF > .dbt/profiles.yml
+cratedb_analytics:
+  target: dev
+  outputs:
+    dev:
+      type: cratedb
+      host: localhost
+      port: 5432
+      user: crate
+      pass: crate
+      dbname: crate
+      schema: doc
+      search_path: doc
+EOF
+```
+Note the values for `dbname`, `schema`, and `search_path` in this example.
+
+A dbt project has a [specific structure][dbt-project-structure], and contains a combination of SQL, Jinja, YAML, and Markdown files.
+In your project folder, alongside the `models` folder that most projects have,
+a folder called `macros` can include macro override files.
+
+
+The following dbt features have been tested successfully:
+
+* models with [view, table, and ephemeral materializations](https://docs.getdbt.com/docs/build/materializations)
+* [dbt source freshness](https://docs.getdbt.com/docs/deploy/source-freshness)
+* [dbt test](https://docs.getdbt.com/docs/build/tests)
+* [dbt seed](https://docs.getdbt.com/docs/build/seeds)
+* [Incremental materializations](https://docs.getdbt.com/docs/build/incremental-models) (with `incremental_strategy='delete+insert'` and without involving [OBJECT](https://crate.io/docs/crate/reference/en/5.4/general/ddl/data-types.html#objects) columns)
+
+We hope you find this useful. CrateDB continuously adds new features, and we will update this article when any of these overrides require changes or become obsolete.
+
+
+## Appendix
+
+A few notes about advanced configuration options and general usage
+information.
+
+### CrateDB's Differences
+- CrateDB’s fixed catalog name is `crate`, the default schema name is `doc`.
+- CrateDB does not implement the notion of a database; however, tables can be created in different [schemas](https://cratedb.com/docs/crate/reference/en/latest/general/ddl/create-table.html#ddl-create-table-schemas).
+- When asked for a database name, either any schema name or the fixed catalog name `crate` may be applicable.
+- If the database or schema name is omitted while connecting, PostgreSQL drivers may default to the username.
+- The predefined [superuser](https://cratedb.com/docs/crate/reference/en/latest/admin/user-management.html#administration-user-management) on an unconfigured CrateDB cluster is called `crate`, defined without a password.
+- For authenticating properly, please learn about the available [authentication](https://cratedb.com/docs/crate/reference/en/latest/admin/auth/index.html#admin-auth) options.
+
+-- https://cratedb.com/docs/crate/clients-tools/en/latest/connect/#configure
+
+### Connection Options
+**dbt Profile Configuration:** CrateDB targets should be set up using the
+following configuration in your [`profiles.yml`] file.
+```yaml
+company-name:
+  target: dev
+  outputs:
+    dev:
+      type: cratedb
+      host: [clustername].aks1.westeurope.azure.cratedb.net
+      user: [username]
+      password: [password]
+      port: 5432
+      dbname: crate  # CrateDB's only catalog is `crate`.
+      schema: doc    # You can define any schema. `doc` is the default.
+      threads: [optional, 1 or more]
+      keepalives_idle: 0  # default 0, indicating the system default
+      connect_timeout: 10  # default 10 seconds
+      retries: 1  # default 1 retry on error/timeout when opening connections
+      search_path: [optional, override the default postgres search_path]
+      role: [optional, set the role dbt assumes when executing queries]
+      sslmode: [optional, set the sslmode used to connect to the database]
+      sslcert: [optional, set the sslcert to control the certificate file location]
+      sslkey: [optional, set the sslkey to control the location of the private key]
+      sslrootcert: [optional, set the sslrootcert config value to a new file path in order to customize the file location that contains root certificates]
+```
+
+### Search Path
+The `search_path` config controls the CrateDB "search path" that dbt configures
+when opening new connections to the database. By default, the CrateDB search
+path is `"doc"`, meaning that unqualified table names will be
+searched for in the `doc` schema.
+
+### Custom Schemas
+By default, dbt writes the models into the schema you configured in your
+profile, but in some dbt projects you may need to write data into different
+target schemas. You can adjust the target schema using [custom schemas with
+dbt].
+
+If your dbt project has a custom macro called `generate_schema_name`, dbt
+will use it instead of the default macro. This allows you to customize
+the name generation according to your needs.
+
+```jinja
+{% macro generate_schema_name(custom_schema_name, node) -%}
+  {%- set default_schema = target.schema -%}
+  {%- if custom_schema_name is none -%}
+    {{ default_schema }}
+  {%- else -%}
+    {{ custom_schema_name | trim }}
+  {%- endif -%}
+{%- endmacro %}
+```
+
+
+[dbt]: https://www.getdbt.com/
+[dbt-core]: https://github.com/dbt-labs/dbt-core
+[dbt-project-structure]: https://docs.getdbt.com/guides/best-practices/how-we-structure/1-guide-overview
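
The branching in the `generate_schema_name` macro added to `usage.md` can be illustrated with a plain Python rendering. This is a minimal sketch for reading the macro, not dbt's implementation: the Python function takes the default schema as an explicit `default_schema` parameter (an assumption made for illustration), whereas the macro reads it from `target.schema` and also receives a `node` argument.

```python
from typing import Optional


def generate_schema_name(custom_schema_name: Optional[str], default_schema: str) -> str:
    """Python rendering of the macro's branching, for illustration only."""
    # No custom schema configured: fall back to the schema from the profile.
    if custom_schema_name is None:
        return default_schema
    # Custom schema configured: use it as-is, with surrounding whitespace
    # removed (the counterpart of Jinja's `trim` filter).
    return custom_schema_name.strip()


print(generate_schema_name(None, "doc"))
print(generate_schema_name(" staging ", "doc"))
```

With this override, a model configured with a custom schema is written to exactly that schema, and all other models go to the schema from the profile.
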