[CT-1769] [Spike] Parsing sans adapter #6549
Notes from talking to @gshank: Right now the adapter is "all one big ball of wax" :)

Longer-term: how could we clearly lock down the set of adapter-specific information that's available at parse time? Is it desirable to move toward a world where parsing is completely adapter-agnostic, if it means losing things too (e.g. the parse-time availability of …)?

As a step on the way there, could we imagine splitting the adapter up into a "parse-time" adapter and a "run-time" adapter? The "parse-time" adapter should be installable without third-party dependencies. It would contain just:
The goal of this spike is to estimate the level of effort of actually doing this work, and to identify potential gotchas.
IMHO it is okay to make that step on the way the final destination: rather than importing those methods from the actual third-party dependency, we can just create a fake one for parse time.
Big idea: It should be possible to run `dbt parse` and produce a Manifest without requiring a specific database adapter to be installed. During parsing, dbt isn't actually connecting to any databases (or the Internet at all), so it feels silly that the adapter plugin is needed.

In this way, parsing is different from compilation (`dbt compile`), wherein dbt actually does need to run some introspective queries in order to properly template model code, especially if that model's Jinja-SQL is dynamically templated based on the result of a query (e.g. `dbt_utils.get_column_values`).

This is a spike because I'm actually not convinced that it's possible! If it is, though, it would unlock two big things for us:

Potential gotchas
I can think of at least ~~four~~ five types of information that are used at parse time and require the adapter today. There might be more. I've started with the ones that are most concerning:

- `adapter.dispatch`
- `adapter.custom_method()`
- `QuotePolicy` + `quoting` config
- `Profile`
1. Calls to `adapter.dispatch`

This requires a full registry of all the macros defined in all installed adapters.

Two use cases:

- `depends_on.macros` → support custom generic tests
- `state:modified.macros`

Example:

```sql
-- macros/some_macro.sql
{% macro postgres__some_macro() %}
    {{ return(true) }}
{% endmacro %}

{% macro default__some_macro() %}
    {{ return(false) }}
{% endmacro %}
```
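To make the dispatch requirement concrete, here is a simplified sketch of the resolution order `adapter.dispatch` performs: prefer an `<adapter>__`-prefixed macro, fall back to `default__`. This is illustrative only, not dbt's actual implementation (which also walks parent-adapter hierarchies), and the function and registry names are invented for this example.

```python
def dispatch(macro_name, adapter_type, macro_registry):
    """Sketch of adapter.dispatch: look for '<adapter_type>__<name>' first,
    then fall back to 'default__<name>'. Illustrative, not dbt's real code."""
    for prefix in (adapter_type, "default"):
        candidate = f"{prefix}__{macro_name}"
        if candidate in macro_registry:
            return macro_registry[candidate]
    raise KeyError(f"No implementation of {macro_name!r} for adapter {adapter_type!r}")

# Mirroring the some_macro example above: two registered implementations.
registry = {
    "postgres__some_macro": lambda: True,
    "default__some_macro": lambda: False,
}

print(dispatch("some_macro", "postgres", registry)())  # True
print(dispatch("some_macro", "bigquery", registry)())  # False (falls back to default__)
```

The point of the gotcha is the `macro_registry` argument: building it requires every installed adapter's macros, which is exactly the parse-time dependency this spike wants to remove.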
2. Parsing user-space Jinja code containing calls to `adapter.custom_method()`

Because we actually render complex Jinja at parse time, in order to capture calls to `ref`/`source`/`config` (+ macro dependencies!), we'd need a way to bypass those `adapter.*` calls. (A "dummy" adapter class that has any method attribute, accepting any arguments?) So long as those methods aren't actually used to resolve `ref`/`source`/`config`, we should be in okay shape.

I still think we could land ourselves in type hell, though. What if `custom_method` returns a specific type, which is then important for subsequent logic? The way we've solved for this to date is via the `@available.parse` decorator on adapter methods, which stubs out an empty return value with the correct type. But how could/would we get those types, if not by having access to the adapter / those method definitions at parse time?
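The "dummy adapter" idea above could be sketched as a class whose `__getattr__` hands back a permissive no-op callable, so any `adapter.custom_method(...)` call parses without error. This is a hypothetical sketch, not anything in dbt today, and it deliberately exhibits the "type hell" caveat: every call returns `None` regardless of what type the caller expects.

```python
class ParseTimeAdapter:
    """Hypothetical stand-in adapter for parse time: accepts any method name
    with any arguments, and always returns None."""

    def __getattr__(self, name):
        def _noop(*args, **kwargs):
            # Caveat from the discussion above: some callers may expect a
            # typed value here (str, list, Relation, ...), not None.
            return None
        return _noop

adapter = ParseTimeAdapter()
adapter.custom_method(1, keyword="anything")  # no AttributeError; returns None
```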
3. `quoting` config

https://docs.getdbt.com/reference/project-configs/quoting
To figure out whether the `database.schema.identifier` for a given resource should be quoted, and how to do it if so, we look to a combination of user-supplied configuration (in `dbt_project.yml`) and the `quoting_policy` + `quote_character` defined in the adapter plugin's `Relation` class. (For example, `dbt-snowflake` has quoting disabled by default, and `dbt-bigquery` uses a backtick instead of `"` as its quoting character.)

We could move the resolution of quoting later, after parsing and during compilation, when we'd need to have the adapter plugin available. E.g. the `relation_name` property had been resolved at compile time in the past, although we just moved it up to parse time in #6427.

4. Validation of `target` info

We need to load the relevant profile, and build the `target` context variable, in order to resolve logic like this at parse time, which may change the shape of the DAG:

However, the dataclass that properly validates the credentials lives in the adapter plugin. It also includes default values for `target` attributes which the user may not have specified in `profiles.yml`, and which are exposed to the `{{ target }}` context var via explicit inclusion in the `_connection_keys` method.

For example, take the `client_session_keep_alive` boolean config in `dbt-snowflake`, which is `False` by default. If the user doesn't specify it in `profiles.yml`, how can we successfully parse this without access to the adapter? (This is a silly example: there's no reason for my model to be enabled/disabled depending on that config in particular, but I include it to prove a point about what's technically possible.)
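The original snippet referenced above isn't reproduced here, but the kind of parse-time, DAG-shaping logic in question might look like this hypothetical model header (illustrative only):

```sql
-- models/my_model.sql (hypothetical example)
-- Whether this model exists in the DAG depends on a target attribute
-- whose default value is defined only in the adapter plugin.
{{ config(enabled=target.client_session_keep_alive) }}

select 1 as id
```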
5. Validation of adapter-specific configs
There are "adapter-specific" model configs that, in theory, use a dataclass defined in the adapter plugin for their validation (+ default values if not specified). I'm pretty sure these don't work at all today. From #5236:
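To make gotcha 5 concrete, the kind of adapter-defined config dataclass in question might be sketched like this. All names here are hypothetical (loosely modeled on `dbt-snowflake` configs), not dbt's actual classes, and the validation shown is a minimal stand-in for what a plugin would provide.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AdapterSpecificConfig:
    # Hypothetical adapter-specific model configs with defaults; a real
    # adapter plugin would define something like this for validation.
    transient: bool = True
    cluster_by: Optional[str] = None

def validate(raw: dict) -> AdapterSpecificConfig:
    """Reject unknown keys and apply defaults for unspecified ones."""
    allowed = set(AdapterSpecificConfig.__dataclass_fields__)
    unknown = set(raw) - allowed
    if unknown:
        raise ValueError(f"Unknown config keys: {sorted(unknown)}")
    return AdapterSpecificConfig(**raw)

cfg = validate({"transient": False})
print(cfg.transient, cfg.cluster_by)  # False None
```

Without the adapter installed at parse time, neither the set of allowed keys nor the default values would be available, which is exactly why this validation reportedly doesn't work today.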