feat: query data from S3 location or stage #7211

BohuTANG · 2022-08-20T01:43:44Z

Summary

Make Databend usable as a query engine that queries data directly from an S3 location or a stage.

Refer:
https://docs.snowflake.com/en/user-guide/querying-stage.html

Xuanwo · 2022-08-22T02:06:47Z

The potential I can see in this feature:

Query data from Dropbox/Google Drive: we can empower personal and enterprise users without complex infrastructure.
Load data in the background: users query as normal while the data is copied to Databend Cloud at the same time. Once the load is complete, users can query in a more efficient way.

doki23 · 2022-08-22T11:29:31Z

/assignme

doki23 · 2022-08-22T11:30:16Z

Hmm, is /assignme invalid?

BohuTANG · 2022-08-29T06:44:27Z

Load data in the background: users query as normal while the data is copied to Databend Cloud at the same time. Once the load is complete, users can query in a more efficient way.

There is no COPY here; we can convert the parquet files to fuse engine files directly. For example:

Users can create a table:

CREATE table xx ... location='s3://<user-bucket-path>'  CONNECTION=...

If the location contains parquet files that were not created by the fuse engine, we can query them in the normal way:

list all the parquet files
query them without any optimization (since they do not have fuse indexes)
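The two steps above can be sketched as follows. This is only a toy model, not Databend code: the bucket is an in-memory dict, and `list_parquet_files`, `full_scan`, and `bucket` are hypothetical names introduced for illustration. It shows that without indexes every listed file has to be read and filtered row by row.

```python
from typing import Callable, Dict, List

Row = Dict[str, int]

def list_parquet_files(objects: Dict[str, List[Row]], prefix: str) -> List[str]:
    """Return the object keys under `prefix` that end in .parquet, sorted."""
    return sorted(
        key for key in objects
        if key.startswith(prefix) and key.endswith(".parquet")
    )

def full_scan(
    objects: Dict[str, List[Row]],
    prefix: str,
    predicate: Callable[[Row], bool],
) -> List[Row]:
    """Read every listed file and filter row by row (no index to skip files)."""
    rows: List[Row] = []
    for key in list_parquet_files(objects, prefix):
        rows.extend(row for row in objects[key] if predicate(row))
    return rows

# Hypothetical in-memory stand-in for a bucket: object key -> rows.
bucket: Dict[str, List[Row]] = {
    "data/a.parquet": [{"id": 1}, {"id": 5}],
    "data/b.parquet": [{"id": 10}, {"id": 20}],
    "data/_SUCCESS": [],
    "other/c.parquet": [{"id": 99}],
}

matches = full_scan(bucket, "data/", lambda row: row["id"] > 4)
```

Here both files under `data/` are read even though only some rows match, which is the cost the later optimization step is meant to reduce.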

If the user does some optimization like:

optimize table xx; -- this statement syntax is a demo

We can:

create min/max and all the other fuse indexes for the parquet files without loading them
convert all parquet files to fuse engine files, and store some metadata in metasrv
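To illustrate why a min/max index helps, here is a minimal sketch of file pruning. It assumes a single integer column per file, and it computes the statistics from in-memory values; a real implementation would presumably take them from file metadata instead. `build_min_max_index`, `prune_files`, and the sample data are hypothetical and are not part of Databend.

```python
from typing import Dict, List, Tuple

def build_min_max_index(files: Dict[str, List[int]]) -> Dict[str, Tuple[int, int]]:
    """Map each non-empty file to the (min, max) of its column values."""
    return {name: (min(values), max(values)) for name, values in files.items() if values}

def prune_files(index: Dict[str, Tuple[int, int]], low: int, high: int) -> List[str]:
    """Return files whose [min, max] range overlaps the query range [low, high]."""
    return sorted(
        name for name, (file_min, file_max) in index.items()
        if file_max >= low and file_min <= high
    )

# Hypothetical sample: file name -> values of one integer column.
files = {
    "a.parquet": [1, 5],
    "b.parquet": [10, 20],
    "c.parquet": [30, 40],
}

index = build_min_max_index(files)
# A query such as `WHERE col BETWEEN 6 AND 15` only needs the files returned here.
to_read = prune_files(index, 6, 15)
```

Files whose range cannot contain a matching value are skipped entirely, so the query touches fewer objects than the unoptimized full scan.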

I think @dantengsky has some ideas on this.

BohuTANG · 2022-11-21T02:17:10Z

Hi @doki23 ,

This feature depends on a lot of internal refactoring of databend-query modules (such as the planner's bind_sql and schema inference), so it's hard to do right now.
So let's re-assign this issue to @youngsofun. He will start the task and complete the first phase: querying parquet files from a stage/location.

cc @Xuanwo @sundy-li @dantengsky

doki23 · 2022-11-21T02:35:13Z

Got it

BohuTANG added the C-feature Category: feature label Aug 20, 2022

BohuTANG mentioned this issue Aug 20, 2022

Release proposal: Nightly v0.9 #7052

Closed

43 tasks

Xuanwo assigned doki23 Aug 22, 2022

ClSlaid mentioned this issue Aug 25, 2022

Support COPY INTO ... (SELECT fields FROM @stage) #7228

Closed

BohuTANG mentioned this issue Aug 30, 2022

Zero-copy converting for a location with many parquet files to fuse engine table #7381

Open

BohuTANG added the A-query Area: databend query label Aug 30, 2022

This was referenced Sep 6, 2022

allow executing COPY INTO in a cluster #6395

Closed

Tracking: catalog for stage/location special data source #7502

Open

BohuTANG mentioned this issue Nov 8, 2022

refactor(copy): try move list files to read_partitions #8673

Merged

BohuTANG mentioned this issue Nov 21, 2022

Tracking: Databend as Lakehouse #7592

Open

9 tasks

BohuTANG assigned youngsofun and unassigned doki23 Nov 21, 2022

youngsofun mentioned this issue Nov 30, 2022

feat(format): add basic schema infer for parquet. #9043

Merged

sundy-li mentioned this issue Dec 1, 2022

Feature: read_parquet table function #9048

Closed

soyeric128 mentioned this issue Dec 1, 2022

docs(blog): add this week in databend 70 #9056

Merged

BohuTANG mentioned this issue Jan 15, 2023

Release proposal: Nightly v1.0 #9604

Closed

5 tasks

youngsofun closed this as completed Mar 6, 2023
