[CT-2294] [Feature] Seed JSON and Parquet #7155
Comments
Hey @jpmmcneill, always good to see you again!
JSON seeds
There's actually a pre-existing issue for the JSON part: #2365. Specifically, the NDJSON format was proposed, with one valid JSON value per line (most commonly an object or array). Since the issue you submitted is primarily concerned with JSON seeds, I'm going to close this as a duplicate of #2365.
Parquet seeds
But it sounds like you might have a stand-alone (but related) request to be able to seed Parquet files too. My initial (and not fully refined) thoughts on adding official support for Parquet as a seed format: not at this time*. The primary reasons:
*With that being said, I would absolutely welcome and encourage you to open a new feature request for Parquet seeds if you want to discuss further!
Parquet in DuckDB
Let's assume you have a valid parquet file located at
This syntax obviously isn't a true seed, because it wouldn't support references like
Hey @dbeatty10, thanks for this! No problem to close. I agree that supporting So I wasn't really talking about the duckdb adapter specifically! Indeed, adding duckdb as a requirement for dbt core would most likely give Josh a tonne of dependency headaches! 😂
Nice that the JSON issue exists. Sorry that I missed it, I'll give it a read.
Yeah, that's a clever idea of using DuckDB as a lightweight format converter 🧠 Two things that would improve that approach:
Is this your first time submitting a feature request?
Describe the feature
Right now (as far as I am aware) seed files have to be CSVs.
Using something like duckdb (which, critically, is dependency-less), seeds could easily be extended to JSON or Parquet as well, by having something like:
->
From that, the usual seed method could run. Finally, dbt could delete the temp file.
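The convert, seed, delete flow described above can be sketched roughly as follows. This is only an illustration under a few assumptions: it uses Python's standard library rather than duckdb, it handles only NDJSON input where every line is a JSON object, and the names `ndjson_to_temp_csv`, `seed_from_ndjson`, and `load_csv` are hypothetical and not part of dbt.

```python
import csv
import json
import os
import tempfile


def ndjson_to_temp_csv(ndjson_path):
    """Convert an NDJSON file (one JSON object per line) to a temporary CSV.

    Returns the path of the temporary CSV; the caller is responsible
    for deleting it. (Hypothetical helper, not a dbt function.)
    """
    with open(ndjson_path, encoding="utf-8") as src:
        # Skip blank lines; each remaining line is assumed to be a JSON object.
        rows = [json.loads(line) for line in src if line.strip()]

    # Collect column names in first-seen order so rows with differing
    # keys still fit under a single header.
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)

    fd, csv_path = tempfile.mkstemp(suffix=".csv")
    with os.fdopen(fd, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=columns)
        writer.writeheader()
        writer.writerows(rows)  # missing keys are written as empty cells
    return csv_path


def seed_from_ndjson(ndjson_path, load_csv):
    """Convert to CSV, pass it to `load_csv`, then always delete the temp file.

    `load_csv` stands in for whatever the existing CSV seed logic is.
    """
    csv_path = ndjson_to_temp_csv(ndjson_path)
    try:
        return load_csv(csv_path)
    finally:
        os.remove(csv_path)
```

The `try`/`finally` means the temp file is removed even if the seed step fails. Parquet input would need a third-party reader (duckdb, as suggested above, or similar), which is exactly where the dependency question comes in.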
Possibly it's silly to widen the net of things that could be seeded 🤷
Describe alternatives you've considered
No response
Who will this benefit?
No response
Are you interested in contributing this feature?
Yup, if I'm pointed in the right direction! (I think it's probably core/dbt/task/seed.py)
Anything else?
No response