Can you elaborate on out of bound file transfer? #45
Instead of initiating a file upload or download, I think it makes sense to allow passing in a reference, a chunk of a file, or a list of small files, similar to the Opaque property you introduced in other messages.
This requires some discussion indeed. Let me explain the basic idea first: we could stay with data upload and download via gRPC for simplicity to start with, but with a future outlook in mind we know that gRPC is not excellent for transferring large files as (streamed) repeated messages (essentially the practical limit is the gRPC message size, which IIRC is 3 MB). Hence, for all data-intensive transfer workflows the idea is to redirect to an HTTP(S) endpoint. As a result of this call you'd get a URL with constrained validity (e.g. you need to start the transfer within the next N seconds).

I see that there is an opportunity to inline small files directly into the gRPC payload as an optimization. The question is how small is small. It should not be controversial to say that this makes sense for payloads in the KB range; for things in the MB range we already enter the realm of higher-level chunking (as we know it in ownCloud).

Chunking itself is interesting: with its current usage by the sync client, I think we could say it mainly serves to provide resumable upload, right? It looks to me like no standard HTTP resumable upload exists (https://stackoverflow.com/questions/20969331/standard-method-for-http-partial-upload-resume-upload#20978266)? Another usage is parallel upload: is this really used, and does it make a difference? If yes, then I would be in favour of providing a different gRPC call to cater for that use case (possibly the same API for both parallel and resumable upload).

The bulk operations (bundling many independent files) definitely deserve a different approach, because these operations have complex return status (some but not all files may fail to upload for various reasons). I think this requires careful consideration, but perhaps it may be treated as a second-order optimization and a different (set of) calls?
You had me worried there for a second about listing large directories (100k files), but the default gRPC message size of 4 MB does not affect streams, only individual messages. Someone actually tested this (although on a loopback device): https://ops.tips/blog/sending-files-via-grpc/ His finding is that 1 KB seems to be a good chunk size, but plain HTTP/2 seems to be twice as fast, if with a little more variance in latency. Another, but older (2016), related post is https://andrewjesaitis.com/2016/08/25/streaming-comparison/ Thinking ...
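To make the chunk-size question concrete, here is a minimal sketch (plain Go, no gRPC dependency) of splitting a payload into fixed-size chunks the way a client-streaming upload would send them; the 1 KB figure is the one from the blog post above:

```go
package main

import "fmt"

// splitChunks cuts data into pieces of at most chunkSize bytes, the way a
// client-streaming upload would emit them as repeated gRPC messages.
func splitChunks(data []byte, chunkSize int) [][]byte {
	var chunks [][]byte
	for len(data) > 0 {
		n := chunkSize
		if len(data) < n {
			n = len(data)
		}
		chunks = append(chunks, data[:n])
		data = data[n:]
	}
	return chunks
}

func main() {
	payload := make([]byte, 2500)
	chunks := splitChunks(payload, 1024)
	fmt.Println(len(chunks))    // 3 chunks: 1024 + 1024 + 452 bytes
	fmt.Println(len(chunks[2])) // 452
}
```

Note that each message stays well under the 4 MB per-message limit regardless of total file size, which is exactly why streams are unaffected by it.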
Ok, best for now would be to at least keep the gRPC-streaming-based file transfer, or to provide an example of how you would do it. While gRPC streaming might be suboptimal, it makes implementing the API a lot easier IMO, and I think we can iterate on it after we get sharing and search into the protocol. I would prefer a more feature-complete protocol over trying to prematurely optimize performance.
@labkode I was under the impression that clients should be able to talk directly to the services using the CS3 APIs? Auth aside, shouldn't we then have some API call to stream bytes between services?
examples helped me, thx |
This reverts commit 2b3033d.
I am struggling to rebase our changes on top of the review branch. In the review branch you are planning to move the file upload and download out of the CS3 APIs. Can you elaborate on how you plan to do the actual file transfer?
We will need to send the file stream from the ocdavsvc service to the actual storage provider. Do you want to open another HTTP/2 connection for that, or use the existing one to multiplex binary chunks over it?
AFAIR we will always have the ocdavsvc or another gateway component in front of the actual storage provider ... so what is your vision on this?