Replies: 4 comments 3 replies
-
Interesting use case. Have you tried using the …
-
Thanks @wlandau. I'll switch to …
-
OK, there were some issues, mostly to do with … Not sure if this is better or worse, but good to know. I'll try …
-
At present, we can write `_targets/objects` out to an S3 bucket. Each of the objects will be a file in a `_targets` prefix in the bucket.

Now, AWS S3 supports SQL select operations on objects under particular circumstances. PrestoDB and Trino also support SQL querying of S3 objects. Two important characteristics are:

- each object needs to be addressable on its own (e.g. one object per key or prefix), and
- the objects need to be stored in a supported file format (CSV, JSON, or Parquet).
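For illustration, here is a minimal sketch of what an S3 Select query against one such object could look like, using the `paws` AWS client for R. The bucket name, key, and column names are hypothetical, and it assumes the object was stored as a headered CSV file:

```r
library(paws)

s3 <- paws::s3()

# Hypothetical bucket and key; assumes the target was stored as a CSV file
# under its own prefix inside the bucket.
result <- s3$select_object_content(
  Bucket = "my-targets-bucket",
  Key = "_targets/objects/object1/data",
  Expression = "SELECT s.x, s.y FROM s3object s WHERE s.x > 10",
  ExpressionType = "SQL",
  InputSerialization = list(CSV = list(FileHeaderInfo = "USE")),
  OutputSerialization = list(CSV = list())
)
```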
My thinking is that, whether by design or not, `targets` is quite close to enabling SQL querying of the tabular objects of its entire processing pipeline. If one can write to an S3 bucket as a prefix per object and in a supported file format, one can build downstream tools that make use of SQL queries of those objects for dashboards or derivative models.

This could be really powerful when coupled with Presto and Apache Superset, for instance.
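As a sketch of that coupling (assuming a Presto/Trino coordinator whose Hive connector already has an external table defined over the bucket; the connection details and table name here are hypothetical):

```r
library(DBI)
library(RPresto)

# Hypothetical coordinator and catalog; assumes a Hive-connector table has
# already been declared over the S3 prefix holding the pipeline's objects.
con <- DBI::dbConnect(
  RPresto::Presto(),
  host    = "http://presto.example.com",
  port    = 8080,
  user    = Sys.getenv("USER"),
  catalog = "hive",
  schema  = "targets_store"
)

# Query a hypothetical external table mapped to one object's prefix; a tool
# like Superset would issue similar SQL for dashboards.
dashboard_data <- DBI::dbGetQuery(
  con,
  "SELECT x, avg(y) AS mean_y FROM object1_data GROUP BY x"
)

DBI::dbDisconnect(con)
```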
I have verified that I can write objects to prefixes by simply giving the objects prefix-like names (i.e. instead of calling an object `object1`, I call it `object1/data`). It's awkward, but it works.

I have not been able to hack the file format to write CSV or parquet (my preference is parquet). Before I go down that route, which will probably involve some ugly multi-stage targets per object, I thought I'd ask if this seems like a good idea and, if so, whether we could bake it into the library natively.
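For concreteness, a minimal sketch of the naming hack in a `_targets.R` (the bucket is hypothetical, the slash-in-name trick is exactly the unsupported hack described above, and the AWS resources interface has varied across `targets` versions):

```r
# _targets.R -- sketch of the prefix-naming hack; not a supported pattern.
library(targets)

tar_option_set(
  # Hypothetical bucket name.
  resources = list(bucket = "my-targets-bucket")
)

list(
  # tar_target_raw() takes the target name as a character string, which is
  # how a "/" can be smuggled in so the stored object lands under its own
  # S3 prefix (object1/data instead of object1).
  tar_target_raw(
    "object1/data",
    quote(data.frame(x = 1:10, y = rnorm(10))),
    format = "aws_qs"
  )
)
```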
It basically boils down to supporting `aws_csv`, `aws_parquet`, and/or `aws_json`. I like parquet most of all because it preserves data types across languages and is more performant, but `aws_csv` should be much easier to implement (no additional libs, etc.).
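And a sketch of the multi-stage workaround alluded to above, in case it helps the discussion: a normal target produces the data frame, and a second `format = "file"` target writes it to parquet with `arrow` (paths and names are hypothetical; the step that syncs the file to the bucket is omitted):

```r
library(targets)

list(
  tar_target(
    raw_data,
    data.frame(x = 1:10, y = rnorm(10))
  ),
  tar_target(
    parquet_file,
    {
      # Hypothetical local path mirroring the desired prefix layout.
      path <- "exports/object1/data.parquet"
      dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
      arrow::write_parquet(raw_data, path)
      path # format = "file" targets must return the file path(s)
    },
    format = "file"
  )
)
```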