Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DBT relationship test does not correctly support column quoting #2468

Closed
4 tasks
MarcelvanVliet opened this issue May 19, 2020 · 14 comments
Closed
4 tasks

DBT relationship test does not correctly support column quoting #2468

MarcelvanVliet opened this issue May 19, 2020 · 14 comments
Labels
bug Something isn't working dbt tests Issues related to built-in dbt testing functionality good_first_issue Straightforward + self-contained changes, good for new contributors! stale Issues that have gone stale

Comments

@MarcelvanVliet
Copy link

Describe the bug

When defining a relationship test (In this case between Person and Invitation) the generated DBT test does not link the relationship back to the destination table column definition. Hence, it does not recognise that a column in a relationship test may require quoting.

Steps To Reproduce

With a Source as follows:
image

The following test is generated, incorrectly leaving the quotes off the column named Id in the generated sql test.
image

The following error occurs:
image

Expected behavior

I would expect DBT to link the column with the definition and respect the quoting setting chosen.
For convenience, it may be useful to also add column quoting configuration at the model and source level.

Screenshots and log output

As above

System information

Which database are you using dbt with?

  • [ x] postgres
  • redshift
  • bigquery
  • snowflake
  • other (specify: ____________)

The output of dbt --version:

installed version: 0.16.1
   latest version: 0.16.1

The operating system you're using:
Mac
The output of python --version:
Python 3.7.3

Additional context

A valid work-around (via Drew) is to quote the Id column manually in the relationships test. You’ll need to use two sets of quotes — one to denote that the value is a string (for yaml) and the next to be passed into the SQL query.
image

@MarcelvanVliet MarcelvanVliet added bug Something isn't working triage labels May 19, 2020
@MarcelvanVliet
Copy link
Author

I followed the thread to here before losing where the relationship test may be held:
https://github.com/fishtown-analytics/dbt/blob/dev/0.16.1/core/dbt/parser/schemas.py#L236-L273

@jtcohen6
Copy link
Contributor

Thanks for the detailed write-up @MarcelvanVliet!

I would expect DBT to link the column with the definition and respect the quoting setting chosen.
For convenience, it may be useful to also add column quoting configuration at the model and source level.

Currently, dbt supports project-level quoting configurations for relation namespaces: database, schema, and identifier. (There are specially scoped overrides for sources and seeds.)

There isn't a broad-purpose quoting configuration for columns; we tend to accomplish this in utility macros via adapter.quote, or by accessing the quoted attribute of get_columns_in_relation. We could add one; it would be a nontrivial amount of work to synchronize behavior across those macros, and quoting is always trickier than expected and especially liable to database idiosyncrasies.

We'd need to establish what exactly such a lift would get us, since (as you mention) there are workarounds available, albeit a bit funny looking.

@jtcohen6 jtcohen6 removed the triage label May 19, 2020
@drewbanin
Copy link
Contributor

@jtcohen6 I caught up with @MarcelvanVliet about this on Slack. I think it would be really great if we could leverage the quoting configs from these .yml files directly in the relationships test. I bet it would be a pretty big change two save exactly 4 keystrokes, but dbt should have all the info it needs to do the right thing here automatically!

So, this probably isn't something we'd hit in the immediate future, but I'd like to keep this issue open to track the change :)

@jtcohen6
Copy link
Contributor

Ah ok! That makes sense. I missed it in my first read-through.

@jtcohen6 jtcohen6 added the good_first_issue Straightforward + self-contained changes, good for new contributors! label Sep 9, 2020
@jtcohen6
Copy link
Contributor

jtcohen6 commented Sep 9, 2020

I just tagged this a good first issue, as I think it'd be a fairly straightforward change. Hint: there's some prior work in #2733 that may be helpful :)

@jordan8037310
Copy link

Noting that this doesn't just fail with relationships requiring quoting, but fails in general with any sources in Postgres.

@jtcohen6
Copy link
Contributor

@jordan8037310 Could you share the specific code snippet + error you're seeing? Not sure I follow fully

@jordan8037310
Copy link

When running unique or not_null tests, the tests fail if there are column names e.g. "ipAddress".

Noting that the workaround does work, '"ipAddress"', but I think the scope of the testing for this issue should be expanded to include unique, not_null, and all of the other tests (not just relationship test as indicated in the issue title).

Error

Database Error in test source_unique_public_source_ip_whois_ipAddress (models/sources.schema.yml)
  column "ipaddress" does not exist
  LINE 10:         ipAddress
                   ^
  HINT:  Perhaps you meant to reference the column "source_ip_whois.ipAddress".

Copy of generated sql for the test:

select count(*) as validation_errors
from (

    select
        ipAddress

    from "facetlgp"."public"."source_ip_whois"
    where ipAddress is not null
    group by ipAddress
    having count(*) > 1

) validation_errors

@jtcohen6
Copy link
Contributor

@jordan8037310 Did you specify quote: true for ipAddress in your YAML file like:

sources:
  - name: public
    table: source_ip_whois
    columns:
      - name: ipAddress
        quote: true
        tests:
          - unique

?

@jordan8037310
Copy link

jordan8037310 commented Sep 15, 2020

@jtcohen6 - no, althought I just tested that and it works! Thanks.

Here are some things I tried. I mainly assumed that if I did applied the settings at the dbt_project.yml level it persisted down.

I tried putting it in at the dbt_project.yml for quoting with the following and it quoted everything in my project... except the sources.

quoting:
      schema: true
      identifier: true

After happening upon this thread, I saw that sources were handled differently so I tried:

# sources.schema.yml

version: 2

sources:
  - name: public
    schema: public
    loaded_at_field: '"updatedAt"'
    loader: serverless_framework
    quoting:
      schema: true
      identifier: true

To be honest I never once saw the quote parameter specified once in documentation when searching google "dbt sources quoting":

So perhaps this is more an issue of a hidden setting for me and documentation about quoting should be updated?

EDIT: Added another link where quote is not mentioned as opposed to quoting

@jtcohen6
Copy link
Contributor

Really fair point. I guess I could've hoped that you would've magically stumbled upon https://docs.getdbt.com/reference/resource-properties/quote/.

Why do we call it quoting for database/schema/identifier names, but then quote for column names? Adding this to my list of nomenclature we might make more consistent in advance of dbt v1.0.

On page 1 of google this links to a 404 and should probably be redirected: https://dbt.readme.io/v0.10/docs/configuring-quoting

Oof, this is old.

@jordan8037310
Copy link

@jtcohen6 +1 normalization of the property names would have made this easier to search and find.

@beckjake
Copy link
Contributor

There's also quote_columns, which is a seed-only config item that lives on its own level but surely belongs inside the quoting config item (or will it be quote?!) as columns 😬

@jtcohen6 jtcohen6 added this to the Oh-Twenty [TBD] milestone Dec 31, 2020
@jtcohen6 jtcohen6 added the dbt tests Issues related to built-in dbt testing functionality label Dec 31, 2020
@jtcohen6 jtcohen6 removed this from the Margaret Mead milestone Apr 13, 2021
@github-actions
Copy link
Contributor

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

@github-actions github-actions bot added the stale Issues that have gone stale label Nov 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working dbt tests Issues related to built-in dbt testing functionality good_first_issue Straightforward + self-contained changes, good for new contributors! stale Issues that have gone stale
Projects
None yet
Development

No branches or pull requests

5 participants