Load data in the background: users query as normal while the data is copied to Databend Cloud at the same time. Once the load is ready, users can query in a more efficient way.
There is no COPY here; we can transform the Parquet files into fuse engine files directly. For example:
Users can create a table:
```sql
CREATE TABLE xx ... LOCATION='s3://<user-bucket-path>' CONNECTION=...
```
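As a rough sketch, the full DDL could look something like the following; the table schema and the CONNECTION options are illustrative assumptions, not an existing Databend syntax:

```sql
-- Illustrative only: column names and connection parameters are assumptions.
CREATE TABLE ontime (
    year      INT,
    carrier   VARCHAR,
    dep_delay INT
)
LOCATION = 's3://<user-bucket-path>'
CONNECTION = (
    access_key_id     = '<key-id>',
    secret_access_key = '<secret-key>'
);
```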
If the location contains Parquet files that were not created by the fuse engine, we can query them in the normal way:
- list all the Parquet files
- query them without any optimization (since they do not have fuse indexes)
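For instance, a first query against such a table would simply scan the raw files (the table and column names are illustrative, continuing the sketch above):

```sql
-- Scans every Parquet file under the location; no pruning is possible
-- because there are no fuse min/max indexes yet.
SELECT carrier, avg(dep_delay)
FROM ontime
WHERE year = 2022
GROUP BY carrier;
```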
If the user does some optimization like:
```sql
OPTIMIZE TABLE xx; -- this statement syntax is a demo
```
We can:
- create min/max and all other fuse indexes for the Parquet files without loading them
- convert all Parquet files into fuse engine files, and store some metadata in metasrv
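From the user's side, the flow might look like this, reusing the demo syntax and the illustrative names above:

```sql
-- Demo syntax: builds min/max and other fuse indexes for the Parquet files,
-- converts them into fuse engine files, and registers metadata in metasrv.
OPTIMIZE TABLE ontime;

-- The same query can now prune blocks using the min/max indexes.
SELECT carrier, avg(dep_delay)
FROM ontime
WHERE year = 2022
GROUP BY carrier;
```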
It would be awesome to support generating/loading indices from fuse while the actual data stays in remote storage. This would be very powerful for most analytics solutions. Data in analytics changes rather slowly compared to an OLTP system, so even computing the indices at a periodic interval would be a tremendous improvement.
I think @dantengsky has some ideas on it.
Originally posted by @BohuTANG in #7211 (comment)