Improve the COPY from external location performance #4308
Comments
/assignme
@BohuTANG Since I am not quite familiar with Rust threads, I have some questions about this issue. I think the source stream items will be produced by the pipeline executor, so they are already on other threads, right? And the consumer runs on the current thread?
I think we could transform the Copy Plan into a pipeline.
It's ready now. We can build the pipeline through the new processor framework; see the example. Feel free to contact me if you need any help.
@sundy-li Got it.
@sundy-li Should we use a sink pipe and a source pipe to create a complete pipeline for the refactoring?
Yes!
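A minimal structural sketch of that source-pipe/sink-pipe shape, written against made-up types (`Block`, `StageSource`, `TableSink`, `Pipeline` are placeholders, not Databend's actual processor API):

```rust
struct Block; // stand-in for a data block read from the stage file

trait Source {
    fn generate(&mut self) -> Option<Block>;
}
trait Sink {
    fn consume(&mut self, block: Block);
    fn finish(&mut self);
}

struct StageSource { remaining: usize }
impl Source for StageSource {
    fn generate(&mut self) -> Option<Block> {
        // Hypothetical: read the next block from the stage file; None means EOF.
        if self.remaining == 0 {
            return None;
        }
        self.remaining -= 1;
        Some(Block)
    }
}

struct TableSink { appended: usize }
impl Sink for TableSink {
    fn consume(&mut self, _block: Block) {
        // Hypothetical: append the block to the target table.
        self.appended += 1;
    }
    fn finish(&mut self) {
        // Commit once, after the source is exhausted.
        println!("committed {} blocks", self.appended);
    }
}

struct Pipeline {
    source: Box<dyn Source>,
    sink: Box<dyn Sink>,
}
impl Pipeline {
    // A real executor would schedule the two pipes on different worker threads;
    // this toy version just drives them in a loop to show the shape.
    fn execute(mut self) {
        while let Some(block) = self.source.generate() {
            self.sink.consume(block);
        }
        self.sink.finish();
    }
}

fn main() {
    let pipeline = Pipeline {
        source: Box::new(StageSource { remaining: 3 }),
        sink: Box::new(TableSink { appended: 0 }),
    };
    pipeline.execute();
}
```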
Run databend-query with disk:
Why is there
OK. I also found some tests in the code that seem to use minio without detailed setup steps. And why is the interpreter_copy unit test missing? Is it difficult to mock, or is there another reason?
Uploading a file to the stage with curl -H "stage_name:my_internal_stage" -F "upload=@./books.csv" -XPUT http://localhost:8081/v1/upload_to_stage fails with this error: unexpected: (op: write, path: /Users/kaichen/Documents/projects/databend/target/debug/benddata/datas/stage/my_internal_stage, source: File exists (os error 17))
The log shows
-rw-r--r-- 1 kaichen staff 0 Apr 3 19:47 my_internal_stage
-rw-r--r-- 1 kaichen staff 0 Apr 3 20:12 test
-rw-r--r-- 1 kaichen staff 0 Apr 3 20:26 test1
-rw-r--r-- 1 kaichen staff 0 Apr 3 20:54 test2
Yes, it should be a bug, since I already tested this successfully using minio. Let me try to fix it.
#4783 (comment) @zhang2014 @sundy-li @BohuTANG Let me come back here. I looked at the MemoryTable code and I see that we can use one pipe to implement it. However, I am not quite sure about using S3StageTable, because if I understand correctly this copy operation should also handle internal stages, which do not use S3 storage. What is your suggestion?
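For what it's worth, here is a hedged sketch of one way the same source pipe could cover both internal stages (local disk) and external stages (S3): hide the block reads behind a trait and choose the backend when the pipeline is built. `StageReader`, `LocalStageReader`, `S3StageReader` and `read_block` are hypothetical names, not the real S3StageTable API.

```rust
trait StageReader {
    fn read_block(&mut self) -> Option<Vec<u8>>;
}

struct LocalStageReader { blocks_left: usize }
impl StageReader for LocalStageReader {
    fn read_block(&mut self) -> Option<Vec<u8>> {
        // Hypothetical: read the next block from a file under the local stage directory.
        if self.blocks_left == 0 { return None; }
        self.blocks_left -= 1;
        Some(vec![0u8; 8])
    }
}

struct S3StageReader { blocks_left: usize }
impl StageReader for S3StageReader {
    fn read_block(&mut self) -> Option<Vec<u8>> {
        // Hypothetical: fetch the next range of the S3 object.
        if self.blocks_left == 0 { return None; }
        self.blocks_left -= 1;
        Some(vec![1u8; 8])
    }
}

fn build_reader(is_external: bool) -> Box<dyn StageReader> {
    // Pick the backend when the COPY pipeline is built.
    if is_external {
        Box::new(S3StageReader { blocks_left: 2 })
    } else {
        Box::new(LocalStageReader { blocks_left: 2 })
    }
}

fn main() {
    // The source pipe only sees `StageReader`, so the rest of the COPY pipeline
    // is identical for internal and external stages.
    let mut reader = build_reader(true);
    while let Some(block) = reader.read_block() {
        println!("got block of {} bytes", block.len());
    }
}
```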
I finally got actually
Summary
If we COPY an S3 file and insert it into a table, the steps are:
S1. Read the S3 file in blocks from the S3 location
S2. Write the block stream to table t1
S3. Commit
https://github.com/datafuselabs/databend/blob/16e06e414c4680f0d640abada631af89369be877/query/src/interpreters/interpreter_copy.rs#L83-L102
S1 and S2 run in the same thread; it looks like we can run them in parallel.
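A small sketch of how S1 and S2 could overlap (not the actual interpreter code): a reader thread pulls blocks from the S3 location while a writer thread appends them to the table, connected by a bounded channel so the reader cannot run far ahead. `fetch_block_from_s3` and `append_block_to_table` are hypothetical helpers.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

fn fetch_block_from_s3(i: usize) -> Option<Vec<u8>> {
    // Hypothetical: pretend the S3 file yields four blocks, then EOF.
    if i < 4 { Some(vec![i as u8; 1024]) } else { None }
}

fn append_block_to_table(block: &[u8]) {
    // Hypothetical: write one block into table t1.
    println!("appended {} bytes", block.len());
}

fn main() {
    // A small bound keeps memory flat: the reader can only run a couple of
    // blocks ahead of the writer (back-pressure).
    let (tx, rx) = sync_channel::<Vec<u8>>(2);

    // S1: read blocks from the S3 location on its own thread.
    let reader = thread::spawn(move || {
        let mut i = 0;
        while let Some(block) = fetch_block_from_s3(i) {
            if tx.send(block).is_err() { break; } // writer hung up
            i += 1;
        }
        // Dropping `tx` closes the channel and signals EOF to the writer.
    });

    // S2: write blocks to the table while the reader keeps fetching.
    let writer = thread::spawn(move || {
        for block in rx {
            append_block_to_table(&block);
        }
        // S3 (commit) would happen here, once all blocks are appended.
    });

    reader.join().unwrap();
    writer.join().unwrap();
}
```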