Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to adjust models? #43

Closed
nbenezri opened this issue Sep 13, 2022 · 18 comments
Closed

How to adjust models? #43

nbenezri opened this issue Sep 13, 2022 · 18 comments

Comments

@nbenezri
Copy link

After changing the metadata variables I saw not all columns are showing in the destination transformation schema. So I went and edit them myself in dbt_packages/stripe/models. however it did not do anything, not even with --full-refresh. I guess this is somewhat related to dbt_packages being in .gitignore.
Am I allowed to change it? if I change it - will it work?

I don't want to break stuff.

The changes I've done to the dbt_project.yml are:

  • stripe__charge_metadata: ['extras', 'product', 'userid']
  • stripe__invoice_metadata: ['extras', 'product', 'userid']
  • stripe__subscription_metadata: ['extras', 'userid']
  • stripe__customer_metadata: ['userid']
@fivetran-joemarkiewicz
Copy link
Contributor

Hi @nbenezri would you be able to share a snippet of your variables in your root dbt_project.

Defining the variables the way you have shown above should work. I would be curious if maybe there is an error with how the variables are defined within your project.

@nbenezri
Copy link
Author

nbenezri commented Sep 13, 2022

In stripe__customer_overview I see the additional column. But for the rest, I do not see.

[nir@dba-dev fivetran_dbt]$ cat dbt_project.yml 

# Name your project! Project names should contain only lowercase characters
# and underscores. A good package name should reflect your organization's
# name or the intended use of these models
name: 'fivetran_dbt'
version: '1.0.0'
config-version: 2

# This setting configures which "profile" dbt uses for this project.
profile: 'fivetran_dbt'

# These configurations specify where dbt should look for different types of files.
# The `model-paths` config, for example, states that models in this project can be
# found in the "models/" directory. You probably won't need to change these!
model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

target-path: "target"  # directory which will store compiled SQL files
clean-targets:         # directories to be removed by `dbt clean`
  - "target"
  - "dbt_packages"


# Configuring models
# Full documentation: https://docs.getdbt.com/docs/configuring-models

# In this example config, we tell dbt to build all models in the example/ directory
# as tables. These settings can be overridden in the individual model files
# using the `{{ config(...) }}` macro.
models:
  fivetran_dbt:
  zendesk:
    intermediate:
    sla_policy:
    ticket_history:
  zendesk_source:
  jira:  
  jira_source:
  stripe:
  stripe_source:
  
vars:
  using_schedules: False
  jira_database: dkall
  jira_schema: jira
  #jira:
  #  issue_field_history_columns: ['the', 'list', 'of', 'field', 'names']
  stripe_source:
    stripe_database: dkall
    stripe_schema: stripe
  stripe_database: dkall
  stripe_schema: stripe
  stripe__subscription_history: True
  stripe_source:
    using_invoice_line_sub_filter: false
  stripe__charge_metadata: ['extras', 'product', 'userid'] # Note: this is case-SENSITIVE and must match the casing of the property as it appears in the JSON
  stripe__invoice_metadata: ['extras', 'product', 'userid'] 
  #stripe__payment_intent_metadata: ['the', 'list', 'of', 'property', 'fields'] 
  #stripe__payment_method_metadata: ['the', 'list', 'of', 'property', 'fields'] 
  #stripe__payout_metadata: ['the', 'list', 'of', 'property', 'fields'] 
  #stripe__plan_metadata: ['the', 'list', 'of', 'property', 'fields'] 
  #stripe__refund_metadata: ['the', 'list', 'of', 'property', 'fields'] 
  stripe__subscription_metadata: ['extras', 'userid'] 
  stripe__customer_metadata: ['userid'] 
  zendesk_source:
    zendesk_database: dkall
    zendesk_schema: zendesk
  zendesk:
    ticket_field_history_timeframe_years: 1

[nir@dba-dev fivetran_dbt]$ cat packages.yml 
packages:
  - package: fivetran/zendesk_source
    version: [">=0.7.0", "<0.8.0"]
  - package: fivetran/zendesk
    version: [">=0.9.0", "<0.10.0"]
  - package: fivetran/jira_source
    version: [">=0.4.0", "<0.5.0"]
  - package: fivetran/jira
    version: [">=0.8.0", "<0.9.0"]
  - package: fivetran/stripe_source
    version: [">=0.7.0", "<0.8.0"]
  - package: fivetran/stripe
    version: [">=0.7.0", "<0.8.0"]

@fivetran-joemarkiewicz
Copy link
Contributor

Thanks so much for sharing these details @nbenezri. The declaration of these variables seems accurate 🤔

I am especially curious since you mentioned the stripe__customer_metadata: ['userid'] works as intended. Are you able to confirm that for the other metadata variables that the fields are included in the JSON object within the source?

@nbenezri
Copy link
Author

nbenezri commented Sep 13, 2022

They are. and they are appearing in the staging schema.

dkall=# \d cdp_stg_stripe.stg_stripe__subscription
        Table "cdp_stg_stripe.stg_stripe__subscription"
         Column          |           Type           | Modifiers 
-------------------------+--------------------------+-----------
 subscription_id         | character varying(256)   | 
 status                  | character varying(256)   | 
 billing                 | character varying(256)   | 
 billing_cycle_anchor    | timestamp with time zone | 
 cancel_at               | timestamp with time zone | 
 is_cancel_at_period_end | boolean                  | 
 canceled_at             | timestamp with time zone | 
 created_at              | timestamp with time zone | 
 current_period_start    | timestamp with time zone | 
 current_period_end      | timestamp with time zone | 
 customer_id             | character varying(256)   | 
 days_until_due          | bigint                   | 
 metadata                | character varying(256)   | 
 start_date              | timestamp with time zone | 
 ended_at                | timestamp with time zone | 
 extras                  | character varying(256)   | 
 userid                  | character varying(256)   | 

dkall=# \d cdp_stripe.stripe__subscription_details
            Table "cdp_stripe.stripe__subscription_details"
             Column             |           Type           | Modifiers 
--------------------------------+--------------------------+-----------
 subscription_id                | character varying(256)   | 
 customer_id                    | character varying(256)   | 
 customer_description           | character varying(256)   | 
 customer_email                 | character varying(256)   | 
 status                         | character varying(256)   | 
 start_date                     | timestamp with time zone | 
 ended_at                       | timestamp with time zone | 
 billing                        | character varying(256)   | 
 billing_cycle_anchor           | timestamp with time zone | 
 canceled_at                    | timestamp with time zone | 
 created_at                     | timestamp with time zone | 
 current_period_start           | timestamp with time zone | 
 current_period_end             | timestamp with time zone | 
 days_until_due                 | bigint                   | 
 is_cancel_at_period_end        | boolean                  | 
 cancel_at                      | timestamp with time zone | 
 number_invoices_generated      | bigint                   | 
 total_amount_billed            | bigint                   | 
 total_amount_paid              | bigint                   | 
 total_amount_remaining         | bigint                   | 
 most_recent_invoice_created_at | timestamp with time zone | 
 average_invoice_amount         | bigint                   | 
 average_line_item_amount       | bigint                   | 
 avg_num_invoice_items          | bigint                   | 

dkall=# \d cdp_stripe.stripe__subscription_line_items
       Table "cdp_stripe.stripe__subscription_line_items"
         Column          |           Type           | Modifiers 
-------------------------+--------------------------+-----------
 invoice_id              | character varying(256)   | 
 number                  | character varying(256)   | 
 invoice_created_at      | timestamp with time zone | 
 status                  | character varying(256)   | 
 due_date                | timestamp with time zone | 
 amount_due              | bigint                   | 
 subtotal                | bigint                   | 
 tax                     | bigint                   | 
 total                   | bigint                   | 
 amount_paid             | bigint                   | 
 amount_remaining        | bigint                   | 
 attempt_count           | bigint                   | 
 invoice_memo            | character varying(256)   | 
 invoice_line_item_id    | character varying(256)   | 
 line_item_desc          | character varying(256)   | 
 line_item_amount        | bigint                   | 
 quantity                | bigint                   | 
 period_start            | timestamp with time zone | 
 period_end              | timestamp with time zone | 
 balance_transaction_id  | character varying(256)   | 
 charge_amount           | bigint                   | 
 charge_status           | character varying(256)   | 
 charge_created_at       | timestamp with time zone | 
 customer_description    | character varying(256)   | 
 customer_email          | character varying(256)   | 
 customer_id             | character varying(256)   | 
 subscription_id         | character varying(256)   | 
 subscription_billing    | character varying(256)   | 
 subscription_start_date | timestamp with time zone | 
 subscription_ended_at   | timestamp with time zone | 
 plan_id                 | character varying(256)   | 
 plan_is_active          | boolean                  | 
 plan_amount             | bigint                   | 
 plan_interval           | character varying(256)   | 
 plan_interval_count     | bigint                   | 
 plan_nickname           | character varying(256)   | 
 plan_product_id         | character varying(256)   | 

@fivetran-joemarkiewicz
Copy link
Contributor

Hi @nbenezri my apologies as I misread your initial question. I was under the assumption you were unable to replicate the metadata fields in the staging models.

Currently the package only brings the customer metadata fields through to the final model. The other metadata fields are restricted to the staging models. If you are looking to bring other metadata fields through to the final models we can use this issue as a feature and plan to incorporate them in a future release.

For the time being, if you wanted to bring in the metadata fields prior to our feature update I would recommend either one of the following options:

  • Create your own model that references the end model produced by the package and join in the metadata fields from the staging model.
  • Disable the package model and recreate it in your own project with you metadata fields included.
    • To disable the model you can use the below config.
    • You will then create a model with the same name (stripe__invoice_line_items) in your root dbt project and copy/paste the sql used and edit it to your liking.
models:
  stripe:
     stripe__invoice_line_items:
        +enabled: false

You are correct that since the package contents are stored in the dbt_packages folder, you are unable to edit them directly. However, you can use one of the options above to customize them to your liking. I hope this helps!

@nbenezri
Copy link
Author

nbenezri commented Sep 14, 2022

Is there a more detailed walkthrough on how to recreate/fork a model to the side?

EDIT I will be more specific, working on the customers model:

  1. I've disabled the customers model
  2. I've copied the SQL to models/
  3. changed var('customers') to ref('stg_stripe__customer')

I now get

09:23:08  1 of 1 ERROR creating view model cdp.stripe__customer_overview ................. [ERROR in 0.04s]
09:23:08  
09:23:08  Finished running 1 view model, 1 hook in 0 hours 0 minutes and 1.69 seconds (1.69s).
09:23:08  
09:23:08  Completed with 1 error and 0 warnings:
09:23:08  
09:23:08  Compilation Error in model stripe__customer_overview (models/stripe__customer_overview.sql)
09:23:08    'date_timezone' is undefined. This can happen when calling a macro that does not exist. Check for typos and/or install package dependencies with "dbt deps".

A few questions:

  1. how to make those packages available to models/?
    A: [nir@dba-dev stripe]$ cp macros/date_timezone.sql ../../macros/
  2. how to make variables specific to a certain model? because I would like to copy dbt_packages/stripe/dbt_project.yml I'm my main one so all the SQL I copy will work, and so they won't coliade with other packages/models.
  3. how to change the target materialized from a view to a table?
  4. how to change the target schema?

copying the dbt_project from dbt_packages doesn't work...

  The schema file at models/stripe/dbt_project.yml is invalid because the value of 'models' is not a list

@fivetran-joemarkiewicz
Copy link
Contributor

fivetran-joemarkiewicz commented Sep 14, 2022

Hi @nbenezri if you wanted to fork the repo and make your changes directly in your own fork that may be easiest. You can then install your fork (with your custom changes) instead of our package. You would use the following syntax in your packages.yml to install your forked version of the package.

packages:
  - git: https://github.com/nbenezri/dbt_stripe.git
    revision: custom-branch
    warn-unpinned: false

This way you can make whatever changes you want directly in your own codebase and it will still be installed as a package. This will allow you to circumvent the need for disabling and copy/pasting which can result in a bunch of headache and compilation errors as dbt has a bunch of rules with disabling and having models with the same name.

You make a great point though that it would be helpful for us to possibly put together a guide for customers on how to fork or edit package code if needed.

@nbenezri
Copy link
Author

Work like a charm! thanks!
If you want I can share the dbt modifications I did.

@fivetran-joemarkiewicz
Copy link
Contributor

Brilliant! Yeah if you wanted to share your fork I would be happy to take a look.

We have also had a few individuals in the past open PRs from their forks to contribute the code back into the package. I can look through your updates and if they make sense for the larger user base, we can work to incorporate it!

@nbenezri
Copy link
Author

Archive.zip

Repo is private - can't share it. But here are the files.

@nbenezri
Copy link
Author

BTW I know this is not your field anymore, but I push this change into master and the schedule in fivetran transformation is acting up.
Getting:

Execution Failure. dbt packages failed to install. The 'dbt deps' command had the following execution result: 
Command exit code: 2
stderr: 
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/dbt/clients/git.py", line 65, in clone
    result = run_cmd(cwd, clone_cmd, env={"LC_ALL": "C"})
  File "/usr/local/lib/python3.8/dist-packages/dbt/clients/system.py", line 435, in run_cmd
    raise dbt.exceptions.CommandResultError(cwd, cmd, proc.returncode, out, err)
dbt.exceptions.CommandResultError: Got a non-zero returncode running: ['/usr/bin/git', 'clone', '--depth', '1', 'git@bitbucket.org:distrokid/fivetran-dbt-stripe.git', '66682aa22add127acad34de7b8f7fbf5']

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/dbt/main.py", line 129, in main
    results, succeeded = handle_and_check(args)
  File "/usr/local/lib/python3.8/dist-packages/dbt/main.py", line 192, in handle_and_check
    task, res = run_from_args(parsed)
  File "/usr/local/lib/python3.8/dist-packages/dbt/main.py", line 239, in run_from_args
    results = task.run()
  File "/usr/local/lib/python3.8/dist-packages/dbt/task/deps.py", line 56, in run
    final_deps = resolve_packages(packages, self.config)
  File "/usr/local/lib/python3.8/dist-packages/dbt/deps/resolver.py", line 131, in resolve_packages
    target = final[package].resolved().fetch_metadata(config, renderer)
  File "/usr/local/lib/python3.8/dist-packages/dbt/deps/base.py", line 86, in fetch_metadata
    self._cached_metadata = self._fetch_metadata(project, renderer)
  File "/usr/local/lib/python3.8/dist-packages/dbt/deps/git.py", line 93, in _fetch_metadata
    path = self._checkout()
  File "/usr/local/lib/python3.8/dist-packages/dbt/deps/git.py", line 79, in _checkout
    dir_ = git.clone_and_checkout(
  File "/usr/local/lib/python3.8/dist-packages/dbt/clients/git.py", line 137, in clone_and_checkout
    _, err = clone(
  File "/usr/local/lib/python3.8/dist-packages/dbt/clients/git.py", line 67, in clone
    _raise_git_cloning_error(repo, revision, exc)
  File "/usr/local/lib/python3.8/dist-packages/dbt/clients/git.py", line 37, in _raise_git_cloning_error
    bad_package_spec(repo, revision, stderr)
  File "/usr/local/lib/python3.8/dist-packages/dbt/exceptions.py", line 735, in bad_package_spec
    raise InternalException(scrub_secrets(msg, env_secrets()))
dbt.exceptions.InternalException: Error checking out spec='None' for repo git@bitbucket.org:distrokid/fivetran-dbt-stripe.git
Cloning into '66682aa22add127acad34de7b8f7fbf5'...
fatal: unable to fork
stdout: 
16:27:41  Running with dbt=1.2.0
16:27:42  Encountered an error:
Error checking out spec='None' for repo git@bitbucket.org:distrokid/fivetran-dbt-stripe.git
Cloning into '66682aa22add127acad34de7b8f7fbf5'...
fatal: unable to fork

If you have any idea why it unable to fork it...

@fivetran-joemarkiewicz
Copy link
Contributor

Hey @nbenezri no worries at all. Hopefully I can help.

What are the contents of your packages.yml file?

@nbenezri
Copy link
Author

nbenezri commented Sep 16, 2022

packages:
  - package: fivetran/zendesk_source
    version: [">=0.7.0", "<0.8.0"]
  - package: fivetran/zendesk
    version: [">=0.9.0", "<0.10.0"]
  - package: fivetran/jira_source
    version: [">=0.4.0", "<0.5.0"]
  - package: fivetran/jira
    version: [">=0.8.0", "<0.9.0"]
  - package: fivetran/stripe_source
    version: [">=0.7.0", "<0.8.0"]
  - git: git@bitbucket.org:<company>/fivetran-dbt-stripe.git
    revision: main
    warn-unpinned: false

It should work as both repo's (fivetran-dbt-stripe, and this one) in the same master repo. So the ssh key should work for both (it works for this one).
If I change to https it request user/pass.

@nbenezri
Copy link
Author

nbenezri commented Sep 20, 2022

Would love some help @fivetran-joemarkiewicz

@fivetran-joemarkiewicz
Copy link
Contributor

@nbenezri do the package installs on dbt deps work locally? Is the only issue when deploying it, or does it fail locally as well?

@nbenezri
Copy link
Author

Locally it works. Actually, until the issue is solved, I use crontab from my server instead of Fivetran's transformation schedule.

@fivetran-joemarkiewicz
Copy link
Contributor

Since it works locally, I imagine this is something that needs to be addressed at the Fivetran Transformations layer. Unfortunately, I am not part of this team but I would recommend opening a Support Ticket (if you have not already) to help resolve this issue.

I apologize I am unable to help address this, but I do believe it has something to do with how Fivetran Transformations talks to other repos.

@fivetran-reneeli
Copy link
Contributor

Closing this issue out as we have now introduced the ability to bring metadata fields into select staging models!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants