Azure multipart upload issue #708
Comments
Could you test with both the `azureblob` and `azureblob-sdk` providers? The latter is newer and will receive most of my development effort going forward. This is not yet in a release, so you will have to compile from master.
@gaul Luckily every commit is also pushed as a Docker image, so it's easy to test! :) However, with that I have the following error:

So I should rather try to make …?
@jerbob92 please test the latest master.
`uploadWithResponse` requires an `InputStream` that supports marks, which the socket does not support, and wrapping in a `BufferedInputStream` would require extra memory. Fixes #708.
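To make the constraint in that commit message concrete, here is a minimal Java sketch (illustrative only, not S3Proxy's actual code; `ensureMarkSupported` is a hypothetical helper): an SDK call that may retry needs `InputStream.markSupported()` to return true, and the usual fix of wrapping in a `BufferedInputStream` costs up to one buffer of extra memory per upload.

```java
import java.io.BufferedInputStream;
import java.io.InputStream;

final class StreamUtil {
    // Hypothetical helper: guarantee mark/reset support before handing the
    // stream to an SDK call that may read it more than once (e.g. on retry).
    // The wrapper buffers up to bufferSize bytes in memory, which is the
    // "extra memory" cost mentioned in the commit message above.
    static InputStream ensureMarkSupported(InputStream in, int bufferSize) {
        return in.markSupported() ? in : new BufferedInputStream(in, bufferSize);
    }
}
```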
Sigh, this fix is wrong. I am improving the testing for azureblob-sdk and didn't test MPU end-to-end.
One workaround is to wrap the `InputStream` in a `BufferedInputStream`.
The SDK lacks a method to upload a part with a non-repeatable payload. References #708.
Not sure if I understand correctly; does this mean that:

It's not really clear to me why the Azure SDK would require it to be seekable, but since multipart uploads are meant for smaller blocks, I would say wrapping it in a `BufferedInputStream` is fine.
@gaul I have found the bug. The error from Azure is:

When looking at the code of jclouds, the … It uses …

Edit: it looks like this was fixed already? apache/jclouds@6ef293d
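Some details above were lost in this copy, but the thread later refers to apache/jclouds@6ef293d as "the base64 fix". For background, Azure's Put Block API requires each block ID to be a Base64-encoded string, and all block IDs within one blob must have the same length. A sketch of an ID scheme satisfying this (illustrative only; `makeBlockId` is a hypothetical name, not jclouds' actual code):

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class BlockIds {
    // Azure's Put Block API requires block IDs to be Base64-encoded and,
    // within one blob, all the same length. Encoding a zero-padded counter
    // keeps every encoded ID identical in length.
    static String makeBlockId(int partNumber) {
        String raw = String.format("%032d", partNumber);
        return Base64.getEncoder()
                .encodeToString(raw.getBytes(StandardCharsets.UTF_8));
    }
}
```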
For the original azureblob issue, please edit S3Proxy's …

For the azureblob-sdk issue, I haven't looked at the SDK source code, but please test your suggested `BufferedInputStream` workaround.
When I try that, I get the following error:

When I use …

It shouldn't be too hard to roll a 2.6.1 release that just contains this fix, right? First change master to the next snapshot version. Then create a new branch for 2.6 maintenance, reset it to the commit that the tag 2.6.0 refers to, and cherry-pick the base64 fix into it:

```
git checkout -b 2.6.x
git reset --hard 173e3a4a49d910ad46d77a508a2ba7b67abf31fa
git cherry-pick 6ef293dfd34f2af0ef45bacd04247c3e8afe0261
```

Then change the version on the 2.6.x branch to 2.6.1.

Regarding the part size limit: I had the default part size of 16 MB in mind when thinking of the workaround; when you use a much larger part size it can indeed become problematic. Ideally the …
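As an aside on the version-bump step: in a standard Maven project (S3Proxy builds with Maven), one conventional way to set the version is the versions plugin. This is a sketch, not necessarily how the project actually cuts releases:

```
mvn versions:set -DnewVersion=2.6.1 -DgenerateBackupPoms=false
git commit -am "Prepare 2.6.1"
```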
The Apache release process is more involved and I would prefer to spend the ~10 hours doing more useful things if possible. That project is dying, which was the motivation to write azureblob-sdk.

Ideally Microsoft will respond to my SDK issue; otherwise I will send them a PR. There are alternatives, like using the existing sub-part chunking logic for earlier Azure APIs, that might be a good workaround if you want to investigate them. Let's leave this issue open until the path forward is more clear.
Ah OK, that sounds like a pain then, if creating a release takes that much time.

What do you mean by this?

I'll fork and build my own Docker image for now.
S3Proxy currently has logic to map S3 parts larger than 4 MB into a sequence of Azure parts of at most 4 MB:
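The code reference that followed appears to have been stripped from this copy. As a rough sketch of the idea (illustrative only, not S3Proxy's actual implementation; `uploadAsAzureBlocks` and `BlockUploader` are hypothetical names), mapping one large S3 part onto a sequence of at-most-4 MB Azure blocks only ever holds one block-sized buffer in memory:

```java
import java.io.IOException;
import java.io.InputStream;

final class SubPartChunker {
    interface BlockUploader {
        // Hypothetical callback that stages one Azure block.
        void stageBlock(int index, byte[] data, int length) throws IOException;
    }

    // Read the S3 part stream in <= 4 MB slices and upload each slice as its
    // own Azure block, so memory usage stays bounded by a single buffer.
    static void uploadAsAzureBlocks(InputStream part, long partLength,
            BlockUploader uploader) throws IOException {
        final int AZURE_BLOCK_SIZE = 4 * 1024 * 1024;
        byte[] buffer = new byte[AZURE_BLOCK_SIZE];
        long remaining = partLength;
        int index = 0;
        while (remaining > 0) {
            int wanted = (int) Math.min(buffer.length, remaining);
            int read = part.readNBytes(buffer, 0, wanted);
            if (read <= 0) {
                break; // stream ended early
            }
            uploader.stageBlock(index++, buffer, read);
            remaining -= read;
        }
    }
}
```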
I believe that this should limit the memory usage of your workaround.

I am past my limit on discussing hacks to make things work in the short term. I only have enthusiasm to work on the correct long-term solution, so please self-support and report back if you have something useful.
I'm not planning on using a workaround for azureblob-sdk; I'm just trying to get multipart working for larger files right now, without making major changes or using something untested.

Since the bug is already fixed in jclouds, I have made my own Docker image that has the latest version of s3proxy plus the jclouds snapshot. Since it might be useful for someone else, here it is: https://github.com/jerbob92/s3proxy/pkgs/container/s3proxy%2Fcontainer
When using the minio client (and thus also when using the Go SDK), I have some files that just can't be uploaded when using the default settings. The only way I can upload them is by changing the `--part-size` option. The default is 16 MiB, but when I change it to 20 it works. I have disabled multipart completely for now to prevent any issues.

As far as I know there is nothing special about these files; most files like these upload just fine. It's a PDF file and it's 286697813 bytes. I use `mc put` from the local filesystem to copy them into Azure via s3proxy.

The error code that I get back from s3proxy is:

The error that is in the s3proxy logs is:

All the blocks that it sends are 16800342 bytes, except for the last one, which is 1487296 bytes. All other blocks are uploaded fine, except for the last one, which results in the error above.

Any idea what's going on here? I'm going to see if I can debug this some more tomorrow.
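For reference, the workaround described above would look something like the following invocation (hypothetical alias and path; the exact `--part-size` value syntax may vary between mc versions):

```
mc put --part-size 20 ./file.pdf myazure/mybucket/file.pdf
```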
Other files that are failing are of sizes: 291995023, 286683989, 286904511, 287128205, 304781589, 293607881.
I'm running the version from this commit: 356c12f