[epic] [design] Custom transfer mode for multipart uploads #11
Comments
Problem: abstracting vendor-specific multipart APIs that require state and capturing response data

There are complex, hard-to-abstract requirements from cloud vendors for multipart uploads. For example, with S3 the client is expected to retain the ETag returned for each uploaded part and send it back when completing the upload. Needless to say, the Azure BlockBlob chunked upload API is quite different.

Q: Does this mean there can't be a generic multi-part upload transfer adapter that can handle multiple vendors?
A: It means that most of the …

Q: Should we still allow custom …?
A: …

Q: Should we still support allowing custom HTTP …?
A: This may be a YAGNI thing; on the other hand, I find this to be lacking from the …

Wait, so how do we solve the S3 case?
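For context on why the S3 case is awkward: S3's CompleteMultipartUpload call requires the client to echo back the ETag that S3 returned for each uploaded part, which is state only the client sees at upload time. A sketch of that call (bucket, key, upload ID and ETag values are illustrative):

```
POST /my-bucket/my-object?uploadId=VXBsb2FkSWQtZXhhbXBsZQ HTTP/1.1
Host: s3.amazonaws.com

<CompleteMultipartUpload>
  <Part><PartNumber>1</PartNumber><ETag>"etag-returned-for-part-1"</ETag></Part>
  <Part><PartNumber>2</PartNumber><ETag>"etag-returned-for-part-2"</ETag></Part>
</CompleteMultipartUpload>
```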
|
Note that if we remove the custom body attribute, we lose the ability to express vendor-specific commit requests; the same goes for the custom method. For this reason, I think it may make sense to just allow specifying a custom body as in the example above, which means we can keep clients dumb. |
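For instance, an Azure-style commit could then be handed to the client as a fully pre-built request spec, so the client just replays it without understanding the vendor API. A sketch (the URL, block IDs and header values are illustrative, not part of any existing proposal):

```json
"commit": {
  "href": "https://myaccount.blob.core.windows.net/lfs/20492a4d0d84?comp=blocklist",
  "method": "PUT",
  "header": {
    "Authorization": "Bearer someauthorizationtokenwillbesethere",
    "Content-Type": "application/xml"
  },
  "body": "<?xml version=\"1.0\" encoding=\"utf-8\"?><BlockList><Latest>cGFydC0x</Latest><Latest>cGFydC0y</Latest></BlockList>"
}
```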
Both S3 and Azure support content integrity verification using the `Content-MD5` header. A couple of suggestions:

**MD5 specific**

```json
"actions": {
  "parts": [
    {
      "href": "https://foo.cloud.com/storage/upload/20492a4d0d84?part=3",
      "header": {
        "Authorization": "Bearer someauthorizationtokenwillbesethere"
      },
      "pos": 7500001,
      "set_content_md5": true
    }
  ]
}
```

Setting the `set_content_md5` flag tells the client to compute the part's MD5 digest and send it in a `Content-MD5` header.

**More generic approach to content digest**

This is inspired by RFC-3230 and RFC-5843, which define a more flexible approach to content digests in HTTP. The following approach specifies what digest header(s) to send with the content in an RFC-3230-like manner:

```json
"actions": {
  "parts": [
    {
      "href": "https://foo.cloud.com/storage/upload/20492a4d0d84?part=3",
      "header": {
        "Authorization": "Bearer someauthorizationtokenwillbesethere"
      },
      "pos": 7500001,
      "want_digest": "contentMD5"
    }
  ]
}
```

RFC-3230 defines `contentMD5` as a special value telling the client to send the digest in a `Content-MD5` header. Other possible values include a comma-separated list of q-factor flagged algorithms, one of the digest algorithms registered by RFC-3230 and RFC-5843 (e.g. `md5`, `sha`, `sha-256`, `sha-512`). For example:

```json
"actions": {
  "parts": [
    {
      "href": "https://foo.cloud.com/storage/upload/20492a4d0d84?part=3",
      "header": {
        "Authorization": "Bearer someauthorizationtokenwillbesethere"
      },
      "pos": 7500001,
      "want_digest": "sha-256;q=1.0, md5;q=0.5"
    }
  ]
}
```

will cause the client to set a `Digest` header on the upload request:

```
PUT /storage/upload/20492a4d0d84?part=3 HTTP/1.1
Authorization: Bearer someauthorizationtokenwillbesethere
Digest: SHA-256=thvDyvhfIqlvFe+A9MYgxAfm1q5=,MD5=qweqweqweqweqweqweqwe=
...
```

NOTE: Azure allows for a crc32-based check, but also supports content-md5, so I am not sure crc32 has any benefit over md5. |
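To make the `want_digest` behavior above concrete, here is a minimal client-side sketch in Python (the function name is made up, and only the two algorithms used above are handled, not the full RFC-3230 registry):

```python
import base64
import hashlib


def digest_headers(part_data: bytes, want_digest: str) -> dict:
    """Build the digest-related request headers for a part upload.

    `want_digest` is the RFC-3230 style value from the batch response:
    either the special value "contentMD5" or a comma-separated list of
    q-factor flagged algorithm names.
    """
    if want_digest == "contentMD5":
        # Special value: send the digest in a Content-MD5 header
        md5 = base64.b64encode(hashlib.md5(part_data).digest()).decode()
        return {"Content-MD5": md5}

    # Otherwise, send a Digest header listing each requested algorithm
    digests = []
    for entry in want_digest.split(","):
        algo = entry.split(";")[0].strip().lower()
        if algo == "md5":
            value = base64.b64encode(hashlib.md5(part_data).digest()).decode()
            digests.append("MD5=" + value)
        elif algo == "sha-256":
            value = base64.b64encode(hashlib.sha256(part_data).digest()).decode()
            digests.append("SHA-256=" + value)
    return {"Digest": ",".join(digests)}


# Example, matching the last JSON snippet above
print(digest_headers(b"...part bytes...", "sha-256;q=1.0, md5;q=0.5"))
```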
Point to consider: add an action for cleaning up uncommitted partial uploads.

How vendors handle uncommitted partial uploads: …

Of course, there is no guarantee that the client will be able to successfully call such a cleanup action. |
Supporting cleanup of unfinished uploads in GCP could be implemented by tagging objects as "draft" when we initiate the upload. |
Q: does it make sense to have some kind of "common to all parts" attribute for the operation, specifying headers and other attributes that may be common to all parts?

Pros: more compact and clean messages, less repetition.

I am leaning against it, but may be convinced otherwise ;) |
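For concreteness, the "common to all parts" idea above might look something like this (the `parts_common` key and its contents are hypothetical, purely to illustrate the trade-off):

```json
"actions": {
  "parts_common": {
    "header": {
      "Authorization": "Bearer someauthorizationtokenwillbesethere"
    },
    "want_digest": "contentMD5"
  },
  "parts": [
    {"href": "https://foo.cloud.com/storage/upload/20492a4d0d84?part=1", "pos": 0, "size": 2500000},
    {"href": "https://foo.cloud.com/storage/upload/20492a4d0d84?part=2", "pos": 2500000, "size": 2500000}
  ]
}
```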
Undecided: …

Tasks: …
|
Discussion has been summarized into a spec doc here: https://github.com/datopian/giftless/blob/feature/11-multipart-protocol/multipart-spec.md |
This is a container ticket to discuss the design of a custom transfer adapter supporting multipart upload. This is not a part of the official `git-lfs` spec, but will be extremely valuable to us and, if it works, could be used by custom git-lfs clients, and eventually could be proposed as an addition to the LFS protocol.

### Goal

Spec a transfer protocol that will allow uploading files in parts to a storage backend, focusing on cloud storage services such as S3 and Azure Blobs.

Design goals:

Must:

- … the `basic` transfer API

Nice / Should:

- …
### Initial Protocol design

- The new transfer mode is named `multipart-basic`
- `{"operation": "download"}` requests work exactly like `basic` download requests, with no change
- `{"operation": "upload"}` requests will break the upload into several `actions`:
  - `init` (optional), a request to initialize the upload
  - `parts` (optional), zero or more part upload requests
  - `commit` (optional), a request to finalize the upload
  - `verify` (optional), a request to verify the file is in storage, similar to `basic` upload verify actions
- As with `basic` transfers, if the file fully exists and is committed to storage, no `actions` will be provided and the upload can simply be skipped
- Requests are otherwise identical to `basic` requests, except that `{"transfers": ["multipart-basic", "basic"]}` is the expected transfers value, with fallback to the `basic` protocol

### Request Objects
The `init`, `commit` and each one of the `parts` actions contain a "request spec". These are similar to `basic` transfer adapter `actions`, but in addition to `href` and `header` they also include `method` (optional) and `body` (optional) attributes, to indicate the HTTP request method and body. This allows the protocol to be vendor agnostic, especially as the format of `init` and `commit` requests tends to vary greatly between storage backends.

The default values for these fields depend on the action:

- `init` defaults to no body and the `POST` method
- `commit` defaults to no body and the `POST` method
- `parts` requests default to the `PUT` method and should include the file part as body, just like with `basic` transfer adapters

In addition, each `parts` request will include the `pos` attribute to indicate the position in bytes within the file at which the part should begin, and the `size` attribute to indicate the part size in bytes. If `pos` is omitted, it defaults to `0`. If `size` is omitted, it defaults to reading until the end of the file.

### Examples
#### Sample Upload Request
The following is a ~10mb file upload request:
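A request along these lines might look like the following sketch, based on the protocol described above (the oid and size values are illustrative):

```json
{
  "transfers": ["multipart-basic", "basic"],
  "operation": "upload",
  "objects": [
    {
      "oid": "20492a4d0d84",
      "size": 10000000
    }
  ]
}
```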
#### Sample Upload Response
The following is a response for the same request, given an imagined storage backend:
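A response consistent with the protocol sketched above might look roughly like this (the storage URLs, part split and token are illustrative; note there is no `init` action):

```json
{
  "transfer": "multipart-basic",
  "objects": [
    {
      "oid": "20492a4d0d84",
      "size": 10000000,
      "actions": {
        "parts": [
          {
            "href": "https://foo.cloud.com/storage/upload/20492a4d0d84?part=1",
            "header": {"Authorization": "Bearer someauthorizationtokenwillbesethere"},
            "pos": 0,
            "size": 2500000
          },
          {
            "href": "https://foo.cloud.com/storage/upload/20492a4d0d84?part=2",
            "header": {"Authorization": "Bearer someauthorizationtokenwillbesethere"},
            "pos": 2500000,
            "size": 2500000
          },
          {
            "href": "https://foo.cloud.com/storage/upload/20492a4d0d84?part=3",
            "header": {"Authorization": "Bearer someauthorizationtokenwillbesethere"},
            "pos": 5000000,
            "size": 2500000
          },
          {
            "href": "https://foo.cloud.com/storage/upload/20492a4d0d84?part=4",
            "header": {"Authorization": "Bearer someauthorizationtokenwillbesethere"},
            "pos": 7500000,
            "size": 2500000
          }
        ],
        "commit": {
          "href": "https://foo.cloud.com/storage/commit/20492a4d0d84",
          "method": "POST"
        },
        "verify": {
          "href": "https://lfs.example.com/myorg/myrepo/objects/verify",
          "header": {"Authorization": "Bearer someauthorizationtokenwillbesethere"}
        }
      }
    }
  ]
}
```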
As you can see, the `init` action is omitted, as will be the case with many backend implementations (we assume initialization, if needed, will most likely be done by the LFS server at the time of the batch request).

### Chunk sizes
It is up to the LFS server to decide the size of each file chunk.
TBD: Should we allow clients to request a chunk size? Is there reason for that?
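As a rough illustration of the server-side choice, here is a sketch in Python of how a server could derive the `parts` list with `pos`/`size` for an object (the function, chunk size and href format are arbitrary, for illustration only):

```python
CHUNK_SIZE = 2_500_000  # chosen by the server; the value here is arbitrary


def build_parts(oid: str, size: int) -> list:
    """Split an object of `size` bytes into part request specs with pos/size."""
    parts = []
    pos = 0
    part_number = 1
    while pos < size:
        part_size = min(CHUNK_SIZE, size - pos)
        parts.append({
            "href": f"https://foo.cloud.com/storage/upload/{oid}?part={part_number}",
            "pos": pos,
            "size": part_size,
        })
        pos += part_size
        part_number += 1
    return parts


# A ~10mb file is split into four 2.5mb parts
print(build_parts("20492a4d0d84", 10_000_000))
```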