
Zero-copy conversion of a location with many parquet files into a fuse engine table #7381

Open
BohuTANG opened this issue Aug 30, 2022 · 3 comments
Labels
A-storage Area: databend storage C-feature Category: feature

Comments

@BohuTANG
Member

  • Load data in background: users query as normal while the data is copied to Databend Cloud at the same time. Once the load is complete, users can query in a more efficient way.

There is no COPY step here: we can transform the parquet files into fuse engine files directly. For example:

Users can create a table:

CREATE TABLE xx ... LOCATION='s3://<user-bucket-path>' CONNECTION=...

If the location contains parquet files that were not created by the fuse engine, we can still query them in the normal way:

  1. list all the parquet files
  2. query them without any optimizations (since they have no fuse indexes)
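The two steps above might be sketched as follows. This is a minimal illustrative sketch, not Databend's actual code; the function names and the object-key layout are assumptions.

```python
# Hypothetical sketch: discovering external parquet files under a table's
# LOCATION before any fuse indexes exist, so every file must be scanned.

def list_parquet_files(object_keys):
    """Return every key under the table location that looks like a parquet file."""
    return sorted(k for k in object_keys if k.endswith(".parquet"))

def plan_full_scan(object_keys):
    """Without fuse indexes there is nothing to prune: plan a scan of every file."""
    return [{"file": f, "pruned": False} for f in list_parquet_files(object_keys)]

keys = [
    "s3://bucket/t/part-000.parquet",
    "s3://bucket/t/part-001.parquet",
    "s3://bucket/t/_metadata.json",
]
print(plan_full_scan(keys))
```

The key point is that the scan plan includes every parquet file unconditionally, which is why such queries cannot be optimized yet.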

If the user runs an optimization like:

optimize table xx; -- this statement syntax is a demo

We can:

  1. create min/max and all other fuse indexes for the parquet files without reloading the data
  2. convert all the parquet files into fuse engine files, and store the metadata in metasrv
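Step 1 is possible without rewriting the data because parquet files already carry per-row-group column statistics in their footers. The sketch below, with purely illustrative names (not Databend's real API), builds a min/max index from such footer stats and then uses it to prune files for a point predicate:

```python
# Hypothetical sketch: build a min/max (zone-map style) index per parquet file
# from the statistics already stored in each file's footer, so the data itself
# is never copied, then prune files that cannot match a predicate.

def build_minmax_index(footer_stats):
    """footer_stats: {file: {column: (min, max)}} as read from parquet footers.
    In a real engine this index would be persisted to metasrv."""
    return dict(footer_stats)

def prune_files(index, column, value):
    """Keep only files whose [min, max] range for `column` can contain `value`."""
    return sorted(
        f for f, cols in index.items()
        if cols[column][0] <= value <= cols[column][1]
    )

stats = {
    "part-000.parquet": {"id": (1, 100)},
    "part-001.parquet": {"id": (101, 200)},
}
index = build_minmax_index(stats)
print(prune_files(index, "id", 150))  # only part-001 can contain id = 150
```

This is the payoff of the OPTIMIZE step: after the index exists, a query for `id = 150` touches one file instead of all of them.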

I think @dantengsky has some ideas on this.

Originally posted by @BohuTANG in #7211 (comment)

@BohuTANG BohuTANG added the C-feature Category: feature label Aug 30, 2022
@BohuTANG
Member Author

Note: This task should wait until #7211 is finished.

@BohuTANG BohuTANG added the A-storage Area: databend storage label Aug 30, 2022
@kesavkolla

It would be awesome to support generating and loading indices with the fuse engine while the actual data stays in remote storage. This would be a powerful capability for most analytics solutions. Data in analytics changes slowly compared to an OLTP system, so even computing the indices at a periodic interval would be a tremendous improvement.

@BohuTANG
Copy link
Member Author

BohuTANG commented Nov 19, 2022

That's absolutely right, thank you!
