[CT-1380] [Feature] Namespaced packages to enable multiple package import #6113

Closed
3 tasks done
dstuck opened this issue Oct 20, 2022 · 5 comments
Labels: enhancement (New feature or request), packages (Functionality for interacting with installed packages), stale (Issues that have gone stale)

Comments

@dstuck

dstuck commented Oct 20, 2022

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

We would like to import a package multiple times in a single project. One use case is a common data product that needs to be exported or shared with multiple clients, each with slightly different config. Another is a project with a common set of sources that branch out into multi-tenant marts with overlapping logic. We would like to define that common product in a package and import it once per client, with client-prefixed schemas.

I believe this could be achieved by allowing a project alias to be configured in the package that would be used as the project_name after loading the package. The main unknown for me is whether the Project class gets loaded from the dbt_project.yml in the package files outside of the deps call. If so, the alias would need to be written into that file to replace the project name, which is a bit more invasive.

Describe alternatives you've considered

Some alternatives that allow us to be DRYer than just "copy-paste everything":

  • Write macros that generate the models and then copy paste the models multiple times
  • Break the dbt project out into separate projects

Who will this benefit?

This enhancement will benefit teams that support multitenant architectures or create standardized data products from a common source.

Are you interested in contributing this feature?

I would be interested in contributing, though I haven't before, so I could use some guidance

Anything else?

No response

@dstuck added the enhancement and triage labels Oct 20, 2022
@github-actions bot changed the title from "[Feature] Namespaced packages to enable multiple package import" to "[CT-1380] [Feature] Namespaced packages to enable multiple package import" Oct 20, 2022
@jtcohen6
Contributor

@dstuck This is really interesting!

Thanks for talking through the use case. To date, I'm familiar with a few patterns that agencies use to provide the same basic transformation flow to a number of different clients:

  • Combine all client data into a common set of tables in the data platform (lake/warehouse/etc), and perform common transformations on those tables once. In the final step, rigorously separate client data again before showing/sharing. This is by far the simplest in terms of overhead — true multi-tenancy — but understandably does not fly for many folks' data security requirements.
  • Create a common project of models & macros. Then import → reconfigure / set vars → deploy in X separate invocations / super-projects, where X is number of clients. Total control, but not very scalable (or very fun).

Your proposal offers an alternative: What if you could have a common sub-project and a common super-project, while still preserving the one-to-many relationship between them?

> I believe this could be achieved by allowing a project alias to be configured in the package that would be used as the project_name after loading the package.

I think you're onto something!

First things first: we'd need to fix #1269 and allow multiple models with the same name in the same dbt DAG, so long as they exist in different project namespaces. That need is top-of-mind for us; I'd very much like to see us do it over the next several months, as part of the larger initiative discussed in #5244.

Once that's sorted, you could imagine something like:

# packages.yml
packages:
  - local: path/to/reused/project
    project_name: client_a
  - local: path/to/reused/project
    project_name: client_b

# dbt_project.yml
models:
  client_a:  # scoped to this client's data
    custom_config: ...
    vars: ...
  client_b:  # scoped to a different client's data
    custom_config: ...
    vars: ...

From a technical perspective, I think this would take a little bit of doing, starting with where we load projects, and tracing that through to the parsing of each project, to ensure we're forming a truly unique unique_id for each "repeated" node.
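
To illustrate the uniqueness concern, here is a minimal Python sketch. It assumes node IDs take the form `model.<project_name>.<model_name>` and that same-named models in different namespaces are already allowed (the #1269 prerequisite above). The `PackageEntry` class, the `build_unique_ids` function, the declared name `reused`, and the `stg_orders` model are all hypothetical; none of this is dbt's actual loading code, and `project_name` in packages.yml is only the option proposed in this thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PackageEntry:
    """Hypothetical stand-in for one entry in packages.yml."""
    local: str                          # path to the reused project
    declared_name: str                  # `name:` from the package's dbt_project.yml
    project_name: Optional[str] = None  # proposed alias; not an existing dbt option

    @property
    def effective_name(self) -> str:
        # The alias, when given, replaces the declared project name.
        return self.project_name or self.declared_name


def build_unique_ids(packages, model_names):
    """Return {unique_id: package}, raising if two nodes share an ID."""
    seen = {}
    for pkg in packages:
        for model in model_names:
            unique_id = f"model.{pkg.effective_name}.{model}"
            if unique_id in seen:
                raise ValueError(f"duplicate unique_id: {unique_id}")
            seen[unique_id] = pkg
    return seen


models = ["stg_orders"]  # hypothetical model in the reused project
path = "path/to/reused/project"

# Importing the same package twice without an alias collides.
try:
    build_unique_ids(
        [PackageEntry(path, "reused"), PackageEntry(path, "reused")], models
    )
    collided = False
except ValueError:
    collided = True

# With per-import aliases, each repeated node gets its own ID.
aliased = build_unique_ids(
    [
        PackageEntry(path, "reused", project_name="client_a"),
        PackageEntry(path, "reused", project_name="client_b"),
    ],
    models,
)
```

The point of the sketch is only that the alias has to be applied before IDs are formed; anything that still keys off the declared name would reintroduce the collision.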

@jtcohen6 added the packages and Team:Language labels and removed the triage label Oct 31, 2022
@dstuck
Author

dstuck commented Nov 1, 2022

@jtcohen6 I really appreciate the detailed description of how this relates to ongoing work. It definitely feels like a request that needs namespacing to be worked out first. I hadn't actually considered that the file names would lead to collisions (to be honest, having model names use the root filename has always been one of the least intuitive things about dbt to me), so that's a much bigger issue than just referencing the package.

The other issue, which I realized after posting this while getting into the weeds a bit with dispatching from packages, is that I think package developers need to include hard-coded references to their project name when referencing macros in the package. That could also break my initial thought of "just change the name of the project to an alias".
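
A toy Python model of that failure mode, for illustration only. It assumes macros are resolved by a (namespace, name) pair and that the package's own code hard-codes its namespace; the registry, the `load_package` and `call_macro` helpers, the namespace `reused`, and the `cents_to_dollars` macro are hypothetical and do not reflect dbt's actual dispatch implementation.

```python
from typing import Callable, Dict

# namespace -> {macro name -> callable}; a toy stand-in for macro resolution
Registry = Dict[str, Dict[str, Callable[[int], float]]]


def load_package(registry: Registry, namespace: str) -> None:
    """Register the package's macros under the namespace it is loaded as."""
    registry[namespace] = {"cents_to_dollars": lambda cents: cents / 100}


def call_macro(registry: Registry, namespace: str, name: str, arg: int) -> float:
    """Look a macro up by (namespace, name), as a hard-coded reference would."""
    try:
        macro = registry[namespace][name]
    except KeyError:
        raise LookupError(f"no macro {namespace}.{name}") from None
    return macro(arg)


# The package author hard-codes the package's own name in its code.
HARD_CODED_NAMESPACE = "reused"

# Loaded under its declared name, the hard-coded reference resolves.
plain: Registry = {}
load_package(plain, "reused")
ok = call_macro(plain, HARD_CODED_NAMESPACE, "cents_to_dollars", 250)

# Loaded under an alias, the same hard-coded reference no longer resolves.
aliased: Registry = {}
load_package(aliased, "client_a")
try:
    call_macro(aliased, HARD_CODED_NAMESPACE, "cents_to_dollars", 250)
    broke = False
except LookupError:
    broke = True
```

Under these assumptions, an alias would only work if references inside the package were also rewritten, or resolved relative to the package rather than by its literal name.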

Very excited about the multi-project support initiative as that would be another approach to work around this issue by using different projects for the multi-tenant marts that could still maintain dependencies with the common data.

@jtcohen6
Contributor

jtcohen6 commented Nov 1, 2022

Dispatching! That's a really good point. This pattern might make more sense for "model packages" versus "macro packages," which I'm increasingly convinced are different patterns for code reuse.

@github-actions
Contributor

github-actions bot commented May 1, 2023

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

@github-actions bot added the stale label May 1, 2023
@github-actions
Contributor

github-actions bot commented May 9, 2023

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.

@github-actions bot closed this as not planned May 9, 2023