Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-1375] [Bug] on snowflake "quoting" is not respected in python models #6103

Closed
2 tasks done
pquadri opened this issue Oct 19, 2022 · 2 comments · Fixed by #6620
Closed
2 tasks done

[CT-1375] [Bug] on snowflake "quoting" is not respected in python models #6103

pquadri opened this issue Oct 19, 2022 · 2 comments · Fixed by #6620
Labels
bug Something isn't working python_models

Comments

@pquadri
Copy link
Contributor

pquadri commented Oct 19, 2022

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

Using a quoting: true parameter results in the quoting not being respected.

def model(dbt, session):
    print(type(dbt))
    dbt.config(materialized="incremental", unique_key="id")

    leads_created = dbt.source("products_eventstore", "events").filter(
        "\"type\" = 'LeadCreated'"
    )

    leads_df = leads_created["aggregateId"]

    return leads_df

yields this query being executed:

SELECT  *  FROM ( SELECT  *  FROM (PRODUCTS.EVENTSTORE.events)) WHERE "type" = 'LeadCreated'

when the source specifies quoting the identifier.

Expected Behavior

The identifier is correctly quoted

Steps To Reproduce

As above.

Relevant log output

No response

Environment

- OS: OSX 12.5.1
- Python: 3.10.6
- dbt: 1.3.0

Which database adapter are you using with dbt?

snowflake

Additional Context

I think the issue might be the fact that identifier = false is passed explicitly here: https://github.com/dbt-labs/dbt-core/blob/main/core/dbt/include/global_project/macros/python_model/python.sql, removing those defaults solved the problem on my machine.

@pquadri pquadri added bug Something isn't working triage labels Oct 19, 2022
@github-actions github-actions bot changed the title [Bug] on snowflake "quoting" is not respected in python models [CT-1375] [Bug] on snowflake "quoting" is not respected in python models Oct 19, 2022
@jtcohen6 jtcohen6 added python_models Team:Adapters Issues designated for the adapter area of the code labels Oct 20, 2022
@lostmygithubaccount lostmygithubaccount self-assigned this Oct 23, 2022
@pquadri
Copy link
Contributor Author

pquadri commented Nov 3, 2022

can i take this issue if nobody has time to fix it? :D

@jtcohen6
Copy link
Contributor

jtcohen6 commented Nov 17, 2022

Hey @pquadri - sorry for the delay getting back to you!

cc @ChenyuLInx: Can you remember the reason why you opted for .quote(False) in the initial implementation?

First question: How to address objects with quoted components in Snowpark Python, given that object names already need to be passed into session.table() as quoted strings? Following your example:

create or replace table analytics.jerco_src_schema."$$VerySpecialTable" as (
    select 1 as aggregateId, 'LeadCreated' as "type"
);
version: 2

sources:
  - name: jerco_src_schema
    tables:
      - name: very_special_table
        identifier: "Very%Special&Table"
    quoting:
      database: false
      schema: false
      identifier: true

Does this work in Snowpark Python if we escape the " as \"?

# https://docs.snowflake.com/en/sql-reference/stored-procedures-python.html

import snowflake.snowpark as snowpark

def main(session: snowpark.Session):
    df = session.table("analytics.jerco_src_schema.\"Very%Special&Table\"")
    df.filter(
        "\"type\" = 'LeadCreated'"
    )
    return df.collect()

I tried it out in Snowsight, and it does!

Second question: Does this just work if we remove the .quote(False, False, False) modifications from ref and from source?

Here's a dbt model, similar to yours:

def model(dbt, session):
    # switched from 'incremental' to 'table' for simplicity
    dbt.config(materialized="table")

    leads_created = dbt.source("jerco_src_schema", "very_special_table").filter(
        "\"type\" = 'LeadCreated'"
    )

    # switched this to .select() so as to return a DataFrame, rather than a single Column
    leads_df = leads_created.select("aggregateId")

    return leads_df

When I do dbt run, after making those changes to the core macros, I see it succeed! This is what's included in the compiled model code:

def source(*args, dbt_load_df_function):
    sources = {"jerco_src_schema.very_special_table": "analytics.jerco_src_schema.\"Very%Special\u0026Table\""}
    key = ".".join(args)
    return dbt_load_df_function(sources[key])

Third question: Given that the change here would be to dbt-core, this will affect other adapters as well. Does this same approach to escape-quoting work in other databases? Namely Spark/Databricks + BigQuery/Dataproc, where quoting is on by default, and quotes are backticks instead of escaped quotes?

For example, this works in SparkSQL:

select * from `dbt_jcohen`.`orders`

Does this work in PySpark?

df = spark.table("`dbt_jcohen`.`orders`")
df.describe()

It looks like it does!

I'm going to verify that a simple Python model runs successfully on both data platforms, with the quoting enabled for source + ref. If it looks good, then I think the straightforward change here is indeed the right one!

@jtcohen6 jtcohen6 added Team:Execution and removed triage Team:Adapters Issues designated for the adapter area of the code labels Nov 17, 2022
@jtcohen6 jtcohen6 removed their assignment Nov 18, 2022
jtcohen6 added a commit that referenced this issue Jan 19, 2023
* Respect quoting for 'this' in dbt-py models #6619

* Respect quoting for ref/source in dbt-py models #6103

* Add changelog entries
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working python_models
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants