Merge branch 'current' into warn-error-options
MichelleArk authored Jan 12, 2023
2 parents 87c6f85 + 4c08b92 commit 124a150
Showing 8 changed files with 330 additions and 10 deletions.
8 changes: 0 additions & 8 deletions website/docs/docs/collaborate/git/pr-template.md
@@ -44,14 +44,6 @@ https://github.com/dbt-labs/jaffle_shop/compare/master..my-branch
</TabItem>
</Tabs>

## Configure custom branches

By default in Development Environments, dbt Cloud attempts to reference the `main` branch in connected repositories. If you want to use a different default branch name, you can configure dbt Cloud with a custom branch setting.

For example, you can use the `develop` branch of a connected repository. Edit an environment, then in "General settings" select **Only run on a custom branch** , and in "Custom branch" type **develop** or the name of your custom branch.

<Lightbox src="/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/dev-environment-custom-branch.png" title="Configuring a custom base repository branch"/>

## Example templates

Some common URL templates are provided below, but please note that the exact
26 changes: 26 additions & 0 deletions website/docs/faqs/Environments/custom-branch-settings.md
@@ -0,0 +1,26 @@
---
title: How do I use the `Custom Branch` settings in a dbt Cloud Environment?
description: "Use custom code from your repository"
sidebar_label: 'Custom Branch settings'
id: custom-branch-settings
---

In dbt Cloud environments, you can change your git settings to use a branch in your dbt project repositories other than the default branch. When a custom branch is specified, dbt Cloud executes models using that branch for the environment. Custom branch settings behave slightly differently in development and deployment environments.

To specify a custom branch:
1. Edit an existing environment or create a new one
2. Select **Only run on a custom branch** under General Settings
3. Specify the **branch name or tag**


## Development

In a development environment, the default branch (commonly the `main` branch) is a read-only branch in the IDE's connected repositories, which you can use to create development branches. Specifying a custom branch overrides this default behavior: your custom branch becomes the read-only branch used to create development branches instead, and you will no longer be able to make commits to the custom branch from within the dbt Cloud IDE.

For example, to use the `develop` branch of a connected repository, edit an environment, select **Only run on a custom branch** in **General settings**, and enter **develop** as the name of your custom branch.

<Lightbox src="/img/docs/dbt-cloud/cloud-configuring-dbt-cloud/dev-environment-custom-branch.png" title="Configuring a custom base repository branch"/>

## Deployment

When running jobs in a deployment environment, dbt will clone your project from your connected repository before executing your models. By default, dbt uses the default branch of your repository (commonly the `main` branch). To specify a different version of your project for dbt to execute during job runs in a particular environment, you can edit the Custom Branch setting as shown in the previous steps.
90 changes: 90 additions & 0 deletions website/docs/sql-reference/clauses/sql-having.md
@@ -0,0 +1,90 @@
---
id: having
title: SQL HAVING
description: The HAVING clause filters query results based on aggregate values, something the WHERE clause cannot do.
slug: /sql-reference/having
---

<head>
<title>Working with the HAVING clause in SQL</title>
</head>

SQL HAVING is just one of those little things that are going to make your ad hoc data work a little easier.

A not-so-fun fact about the [WHERE clause](/sql-reference/where) is that you can’t filter on aggregates with it…that’s where HAVING comes in. With HAVING, you can not only define an aggregate in a [select](/sql-reference/select) statement, but also filter on that newly created aggregate within the HAVING clause.

This page will walk through how to use HAVING, when you should use it, and discuss data warehouse support for it.


## How to use the HAVING clause in SQL

The HAVING clause essentially requires one thing: an aggregate field to evaluate. Since the HAVING condition evaluates to a boolean, it returns only the rows for which the condition evaluates to true, similar to the WHERE clause.

The HAVING condition follows a [GROUP BY statement](/sql-reference/group-by) and optionally precedes an ORDER BY statement:

```sql
select
-- query
from <table>
group by <field(s)>
having condition
[optional order by]
```

That example syntax looks a little like gibberish without some real fields, so let’s dive into a practical example using HAVING.

### SQL HAVING example

<Tabs
defaultValue="having"
values={[
{ label: 'HAVING example', value: 'having', },
{label: 'CTE example', value: 'cte', },
]
}>
<TabItem value="having">

```sql
select
customer_id,
count(order_id) as num_orders
from {{ ref('orders') }}
group by 1
having num_orders > 1 --if you replace this with `where`, this query would not successfully run
```
</TabItem>
<TabItem value="cte">

```sql
with counts as (
select
customer_id,
count(order_id) as num_orders
from {{ ref('orders') }}
group by 1
)
select
customer_id,
num_orders
from counts
where num_orders > 1
```

</TabItem>
</Tabs>

This simple query using the sample dataset [Jaffle Shop’s](https://github.com/dbt-labs/jaffle_shop) `orders` table will return customers who have had more than one order:

| customer_id | num_orders |
|:---:|:---:|
| 1 | 2 |
| 3 | 3 |
| 94 | 2 |
| 64 | 2 |
| 54 | 4 |

The query above using the <Term id="cte" /> uses more lines than the simpler HAVING query, but produces the same result.

## SQL HAVING clause syntax in Snowflake, Databricks, BigQuery, and Redshift

[Snowflake](https://docs.snowflake.com/en/sql-reference/constructs/having.html), [Databricks](https://docs.databricks.com/sql/language-manual/sql-ref-syntax-qry-select-having.html), [BigQuery](https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#having_clause), and [Redshift](https://docs.aws.amazon.com/redshift/latest/dg/r_HAVING_clause.html) all support the HAVING clause and the syntax for using HAVING is the same across each of those data warehouses.
70 changes: 70 additions & 0 deletions website/docs/sql-reference/joins/sql-inner-join.md
@@ -0,0 +1,70 @@
---
id: inner-join
title: SQL INNER JOINS
description: An inner join between two database objects returns all rows with matching join keys; non-matching rows are omitted from the query result.
slug: /sql-reference/inner-join
---

<head>
<title>Working with inner joins in SQL</title>
</head>

The cleanest and easiest of SQL joins: the humble inner join. Just as its name suggests, an inner join between two database objects returns all rows that have matching join keys; any keys that don’t match are omitted from the query result.

## How to create an inner join

Like all joins, an inner join requires some database objects (i.e. <Term id="table">tables</Term>/<Term id="view">views</Term>), keys to join on, and a [select statement](/sql-reference/select):

```sql
select
<fields>
from <table_1> as t1
inner join <table_2> as t2
on t1.id = t2.id
```

In the example above, there’s only one field from each table being used to join the two together; if you’re joining two database objects that require multiple fields, you can leverage AND/OR operators, or, more preferably, <Term id="surrogate-key">surrogate keys</Term>. You may additionally add [WHERE](/sql-reference/where), [GROUP BY](/sql-reference/group-by), [ORDER BY](/sql-reference/order-by), [HAVING](/sql-reference/having), and other clauses after your joins to filter, group, order, and aggregate results.
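
For instance, a multi-key inner join might look like the following sketch, where the `shipments` table, its `shipped_at` column, and the shared `warehouse_id` key are hypothetical illustrations rather than part of the Jaffle Shop dataset:

```sql
select
    orders.order_id,
    shipments.shipped_at
from {{ ref('orders') }} as orders
inner join {{ ref('shipments') }} as shipments
    on orders.order_id = shipments.order_id
    and orders.warehouse_id = shipments.warehouse_id --both keys must match for a row to be returned
```

A single surrogate key built from `order_id` and `warehouse_id` would let you collapse this compound join condition into one equality check.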

As with any query, you can perform as many joins as you want in a single query. A general word of advice: try to keep data models <Term id="dry">modular</Term> by performing regular <Term id="dag" /> audits. If you join certain tables further upstream, are those individual tables needed again further downstream? If your query involves multiple joins and complex logic and is exposed to end business users, ensure that you leverage table or [incremental materializations](https://docs.getdbt.com/docs/build/incremental-models).

### SQL inner join example

Table A `car_type`

| user_id | car_type |
|:---:|:---:|
| 1 | van |
| 2 | sedan |
| 3 | truck |

Table B `car_color`

| user_id | car_color |
|:---:|:---:|
| 1 | red |
| 3 | green |
| 4 | yellow |

```sql
select
car_type.user_id as user_id,
car_type.car_type as type,
car_color.car_color as color
from {{ ref('car_type') }} as car_type
inner join {{ ref('car_color') }} as car_color
on car_type.user_id = car_color.user_id
```

This simple query will return all rows that have the same `user_id` in both Table A and Table B:

| user_id | type | color |
|:---:|:---:|:---:|
| 1 | van | red |
| 3 | truck | green |

Because there’s no `user_id` = 4 in Table A and no `user_id` = 2 in Table B, rows with ids 2 and 4 (from either table) are omitted from the inner join query results.

## SQL inner join use cases

There are countless scenarios where you’d want to inner join multiple tables together. Perhaps you have some really nicely structured tables with the exact same <Term id="primary-key">primary keys</Term> that should really just be one larger, wider table, or you’re joining two tables together and don’t want any of the null or missing column values you’d get from a left or right join; it’s all pretty dependent on your source data and end use cases. Where you will not (and should not) see inner joins is in [staging models](https://docs.getdbt.com/guides/best-practices/how-we-structure/2-staging) that are used to clean and prep raw source data for analytics uses. Any joins in your dbt projects should happen further downstream in [intermediate](https://docs.getdbt.com/guides/best-practices/how-we-structure/3-intermediate) and [mart models](https://docs.getdbt.com/guides/best-practices/how-we-structure/4-marts) to improve modularity and DAG cleanliness.

59 changes: 59 additions & 0 deletions website/docs/sql-reference/operators/sql-any-all.md
@@ -0,0 +1,59 @@
---
id: any-all
title: SQL ANY and ALL
description: The ANY operator will return true if any of the conditions passed into it evaluate to true, while ALL will only return true if all conditions passed into it are true.
slug: /sql-reference/any-all
---

<head>
<title>Working with the SQL ANY and ALL operators</title>
</head>

The SQL ANY and ALL operators are useful for evaluating conditions to limit query results; they are often used with the [LIKE](/sql-reference/like) and [ILIKE](/sql-reference/ilike) operators. The ANY operator will return true if any of the conditions passed into it evaluate to true, while ALL will only return true if *all* conditions passed into it are true.

Use this page to better understand how to use ANY and ALL operators, use cases for these operators, and which data warehouses support them.

## How to use the SQL ANY and ALL operators

The ANY and ALL operators have very simple syntax and are often paired with the LIKE/ILIKE operator or a <Term id="subquery" />:

`where <field_name> like/ilike any/all (array_of_options)`

`where <field_name> = any/all (subquery)`

Some notes on this operator’s syntax and functionality:
- You may pass a subquery into the ANY or ALL operator instead of an array of options
- Use the ILIKE operator with ANY or ALL to avoid case sensitivity

Let’s dive into a practical example using the ANY operator now.

### SQL ANY example

```sql
select
order_id,
status
from {{ ref('orders') }}
where status like any ('return%', 'ship%')
```

This simple query using the [Jaffle Shop’s](https://github.com/dbt-labs/jaffle_shop) `orders` table will return orders whose status starts with `return` or `ship`:

| order_id | status |
|:---:|:---:|
| 18 | returned |
| 23 | return_pending |
| 74 | shipped |

Because LIKE is case-sensitive, it would not return results in this query for orders whose status were say `RETURNED` or `SHIPPED`. If you have a mix of uppercase and lowercase strings in your data, consider standardizing casing for strings using the [UPPER](/sql-reference/upper) and [LOWER](/sql-reference/lower) functions or use the more flexible ILIKE operator.
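
ALL works the same way but flips the logic: every pattern in the list must match. A minimal sketch (assuming a warehouse such as Snowflake or Databricks that supports LIKE ALL) might look like:

```sql
select
    order_id,
    status
from {{ ref('orders') }}
where status like all ('return%', '%pending') --every pattern must match the status
```

Here, only statuses that both start with `return` and end with `pending` (such as `return_pending`) would be returned.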

## ANY and ALL syntax in Snowflake, Databricks, BigQuery, and Redshift

Snowflake and Databricks support the ability to use ANY in a LIKE operator. Amazon Redshift and Google BigQuery, however, do not support the use of ANY in a LIKE or ILIKE operator. Use the table below to read more on the documentation for the ANY operator in your data warehouse.

| **Data warehouse** | **ANY support?** | **ALL support?** |
|:---:|:---:|:---:|
| [Snowflake](https://docs.snowflake.com/en/sql-reference/functions/like_any.html) | ✅ | ✅ |
| [Databricks](https://docs.databricks.com/sql/language-manual/functions/like.html) | ✅ | ✅ |
| Amazon Redshift | ❌ Not supported; consider utilizing multiple OR clauses or [IN operators](/sql-reference/in). | ❌ Not supported; consider utilizing multiple [AND clauses](/sql-reference/and). |
| Google BigQuery | ❌ Not supported; consider utilizing [multiple OR clauses](https://stackoverflow.com/questions/54645666/how-to-implement-like-any-in-bigquery-standard-sql) or IN operators. | ❌ Not supported; consider utilizing multiple AND clauses. |
73 changes: 73 additions & 0 deletions website/docs/sql-reference/statements/sql-group-by.md
@@ -0,0 +1,73 @@
---
id: group-by
title: SQL GROUP BY
description: The GROUP BY statement allows you to group query results by specified columns and is used in tandem with aggregate functions such as AVG and SUM to calculate those values across specific rows.
slug: /sql-reference/group-by
---

<head>
<title>Working with the SQL GROUP BY statement</title>
</head>

GROUP BY…it’s a little hard to explicitly define in a way *that actually makes sense*, but it will inevitably show up countless times in analytics work and you’ll need it frequently.

To put it in the simplest terms, the GROUP BY statement allows you to group query results by specified columns and is used in tandem with aggregate functions such as [AVG](/sql-reference/avg) and [SUM](/sql-reference/sum) to calculate those values across specific rows.

## How to use the SQL GROUP BY statement

The GROUP BY statement appears at the end of a query, after any joins and [WHERE](/sql-reference/where) filters have been applied:

```sql
select
my_first_field,
count(id) as cnt --or any other aggregate function (sum, avg, etc.)
from my_table
where my_first_field is not null
group by 1 --grouped by my_first_field
order by 1 desc
```

A few things to note about the GROUP BY implementation:
- It’s usually one of the last lines in a query, after any joins or where statements; typically you’ll only see [HAVING](/sql-reference/having), [ORDER BY](/sql-reference/order-by), or [LIMIT](/sql-reference/limit) statements following it in a query
- You can group by multiple fields (ex. `group by 1,2,3`) if you need to; in general, we recommend performing aggregations and joins in separate <Term id="cte">CTEs</Term> to avoid having to group by too many fields in one query or CTE
- You may also group by explicit column name (ex. `group by my_first_field`) or even a manipulated column name that is in the query (ex. `group by date_trunc('month', order_date)`)
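
As a quick sketch of that last point, grouping by a manipulated column (assuming a warehouse with Snowflake-style `date_trunc`) might look like:

```sql
select
    date_trunc('month', order_date) as order_month,
    count(order_id) as num_orders
from {{ ref('orders') }}
group by date_trunc('month', order_date) --equivalently, `group by 1`
```

Either grouping style returns one row per month with its order count.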

:::note Readability over DRYness?
Grouping by explicit column name (versus column number in query) cuts both ways: on one hand, it’s potentially more readable for end business users; on the other hand, if a grouped column name changes, that name change needs to be reflected in the group by statement. Use a grouping convention that works for you and your data, but try to stick to one standard style.
:::

### SQL GROUP BY example

```sql
select
customer_id,
count(order_id) as num_orders
from {{ ref('orders') }}
group by 1
order by 1
limit 5
```

This simple query using the sample dataset [Jaffle Shop’s](https://github.com/dbt-labs/jaffle_shop) `orders` table will return customers and the count of orders they’ve placed:

| customer_id | num_orders |
|:---:|:---:|
| 1 | 2 |
| 2 | 1 |
| 3 | 3 |
| 6 | 1 |
| 7 | 1 |

Note that the `order by` and `limit` statements are after the `group by` in the query.

## SQL GROUP BY syntax in Snowflake, Databricks, BigQuery, and Redshift

Snowflake, Databricks, BigQuery, and Redshift all support the ability to group by columns and follow the same syntax.

## GROUP BY use cases

Aggregates, aggregates, and did we mention, aggregates? GROUP BY statements are needed when you’re calculating aggregates (averages, sum, counts, etc.) by specific columns; your query will not run successfully without them if you’re attempting to use aggregate functions in your query. You may also see GROUP BY statements used to deduplicate rows or join aggregates onto other tables with <Term id="cte">CTEs</Term>; [this article provides a great writeup](https://www.getdbt.com/blog/write-better-sql-a-defense-of-group-by-1/) on specific areas you might see GROUP BYs used in your dbt projects and data modeling work.

:::tip 👋 Bye-bye finicky group bys
In some sticky data modeling scenarios, you may find yourself needing to group by many columns to collapse a table down into fewer rows or deduplicate rows. In that scenario, you may find yourself writing `group by 1, 2, 3,.....,n`, which can become tedious, confusing, and difficult to troubleshoot. Instead, you can leverage a [dbt macro](https://github.com/dbt-labs/dbt-utils#group_by-source) that saves you from writing `group by 1,2,....,46` and lets you write a simple `{{ dbt_utils.group_by(46) }}` instead...you’ll thank us later 😉
:::
2 changes: 1 addition & 1 deletion website/docs/sql-reference/string-functions/sql-lower.md
@@ -10,7 +10,7 @@ slug: /sql-reference/lower
</head>

We’ve all been there:
- In a user signup form, user A typed in their name as `Kira Furuich`i, user B typed it in as` john blust`, and user C wrote `DAvid KrevitT` (what’s up with that, David??)
- In a user signup form, user A typed in their name as `Kira Furuichi`, user B typed it in as `john blust`, and user C wrote `DAvid KrevitT` (what’s up with that, David??)
- Your backend application engineers are adamant customer emails are in all caps
- All of your event tracking names are lowercase

12 changes: 11 additions & 1 deletion website/sidebars.js
@@ -950,6 +950,7 @@ guides: [
items: [
"sql-reference/statements/select",
"sql-reference/statements/from",
"sql-reference/statements/group-by",
],
},
{
@@ -969,9 +970,10 @@ guides: [
type: "category",
label: "Clauses",
items: [
"sql-reference/clauses/where",
"sql-reference/clauses/having",
"sql-reference/clauses/limit",
"sql-reference/clauses/order-by",
"sql-reference/clauses/where",
],
},
{
@@ -1013,6 +1015,14 @@ guides: [
"sql-reference/operators/like",
"sql-reference/operators/and",
"sql-reference/operators/not",
"sql-reference/operators/any-all",
],
},
{
type: "category",
label: "Joins",
items: [
"sql-reference/joins/inner-join",
],
},
{
