Upload to OC server with encryption can lead to 0.1% corrupted files. #10975
Comments
Thanks for the detailed report.
Is that correct? Do you also see upload corruption when uploading directly through the web UI (which might be difficult with that many files) or directly over WebDAV, without the sync client? I'm asking because the sync client uses chunks, splitting files into pieces, so if it only happens with the sync client it could be a bug in chunking. CC @schiesbn for encryption stuff
@PVince81
4a) I did not observe conflicting files at the upload. The uploading client "seems" to be doing just the upload. I excluded .lnk files and that works fine.
4b) At the upload client I caught 1 temporary file
4c) On the activity monitor of the upload client I saw that after an interruption files were sometimes downloaded again. I thought that it was part of the re-syncing. But ......... ,
Do you have any bigger text files in your test set that could make it possible to find out what kind of corruption is occurring? For example, whether some data is shifted or missing. I assume the file sizes are still the same, but parts of the contents are different?
There is no checksum comparison on the server side. The only check is based on the chunk size and total file size. If the chunk size does not match, the chunk is deleted and the sync client will resend it. Note that files smaller than 10 MB do not use chunks and go through a different code path. Such files are written directly into the storage (and encrypted at the same time). Were your corrupted files all < 10 MB?
4a) Conflicting files would occur when two clients upload the same file name at the same time, in which case the sync client might produce conflict files locally.
4c) When you say "downloaded again", do you mean that the file uploaded by client1, "client1file.dat", was being downloaded by client2 but interrupted, and client2 then redownloads "client1file.dat"? That's the expected behavior; it will grab the whole file. Or is client1 redownloading files it had just uploaded itself?
@VincentvgNn thanks for the detailed info. Unfortunately it's still not clear what could be causing the corruption, unless at some point the uploaded chunk gets corrupted on the wire but still has the same size when it lands on the server.
CC @ogoffart to check the previous comment about the redownload and resuming part.
CC @schiesbn for encryption
CC @icewind1991 @DeepDiver1975 for general questions about checksumming chunks
@PVince81 New issue: Item 3) Item 4a) Item 4c)
Thanks for all the details.
About PC 3: the missing 11 files, did it download them afterwards in the next run? It can happen that the sync client is not aware of new files at the time of sync. Usually it first scans the server to see what to download, then downloads that. If new files appear after the scan, it will not get them until the next run.
Interesting, so the corrupted files are not using the chunking algorithm. Can you tell us more about your server config? (Or link to it if you posted it somewhere else.)
I find it strange that files are re-downloaded, which normally only happens if the file changed on the server side, for example if its mtime changes. This still doesn't explain the corruption. However, there's already code to verify the file size after upload and reject the upload if it doesn't match. So something must happen after that, when writing the part file.
Do you by any chance have any corrupted text file?
I quickly analysed the corrupted files by using the UltraEdit hex compare.
@VincentvgNn very strange. I discussed this with @schiesbn and he confirmed that the unencrypted block size is 6k (6126 bytes). PHP mostly works with 8k-byte blocks, so it looks like in your case the files are getting corrupted before they reach the encryption code. I guess you haven't tried without encryption yet? I understand that you might not want to do this if you don't want to trust your provider. Can you confirm again that it's not always the same file getting corrupted between test runs? Most likely something is happening during the interruptions you were experiencing. @dragotin @ogoffart are you aware of any corruption that could happen because of interruptions? I wonder whether we should eventually add a checksum column in the
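One way to check whether the damaged regions line up with 8k boundaries, rather than eyeballing a hex compare, is to list the differing byte ranges programmatically. Below is a minimal Python sketch. The function names are made up for illustration, and the 8192-byte block size is only the figure mentioned in this thread, not a confirmed PHP internal.

```python
def differing_ranges(original: bytes, corrupted: bytes):
    """Return (start, end) byte ranges, end exclusive, where the two inputs differ."""
    ranges = []
    start = None
    limit = min(len(original), len(corrupted))
    for i in range(limit):
        if original[i] != corrupted[i]:
            if start is None:
                start = i          # a differing run begins here
        elif start is not None:
            ranges.append((start, i))  # the run ended at the previous byte
            start = None
    if start is not None:
        ranges.append((start, limit))
    if len(original) != len(corrupted):
        # Any length difference is reported as one trailing range.
        ranges.append((limit, max(len(original), len(corrupted))))
    return ranges


def blocks_touched(ranges, block_size=8192):
    """Return the sorted indices of block_size-sized blocks containing a differing byte."""
    touched = set()
    for start, end in ranges:
        touched.update(range(start // block_size, (end - 1) // block_size + 1))
    return sorted(touched)
```

To use it on real files, read both with `open(path, "rb").read()` and pass the results in. If every range starts and ends on a multiple of 8192, that would be consistent with whole buffers being overwritten; ranges at arbitrary offsets would point elsewhere.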
@PVince81 Can it go wrong by paired 8 kB blocks or can PHP work with 16 kB? Until now I have seen this kind of corruption only with encryption and with file sharing between different users.
This is indeed very strange. One idea would be that the files are written at the same time by separate PHP processes, but even if they were, it probably wouldn't produce such results. Do you have the "files_locking" app enabled? That one should prevent files from being opened by more than one process at the same time. @schiesbn @bantu @icewind1991 any clues about these strange results? See the previous comment.
@PVince81
@VincentvgNn do you have the "files_texteditor" app and the "Pictures" app? Just wondering because these are bundled with every official package. The "files_locking" app should be bundled as well. Do you have access to the files there?
@PVince81 I have the same installation at 2 different hosting providers controlled by Installatron. I just downloaded owncloud-7.0.2.zip from the OC website. There are 24 folders in the "apps" folder, but also no "files_locking" app.
@VincentvgNn sorry for the confusion. It seems that "files_locking" isn't shipped.
@PVince81 I copied the "files_locking" app from the downloaded zip-file into the appropriate OC server directory. Next I ran the test with the app enabled:
Why is the "files_locking" app a separate app and not an essential part of the server software?
The If that app fixes the corruption for you, it might show that a race condition of some sort happens somewhere during upload and causes pieces of the file to be overwritten, as you showed. It is still unclear how this kind of corruption could happen.
@PVince81 Using a checksum control and repair procedure would further improve the data integrity.
I'm not familiar enough with the files_locking app to be able to answer. The files_locking app, from what I saw, is creating files inside of
Checksums would be nice as well... @dragotin @DeepDiver1975 @karlitschek @icewind1991 for the checksum case: did we discuss this before? I remember having discussed "using checksums as etags", which would be an issue due to special cases, but does "using checksums for verifying uploaded chunks/files" make sense?
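To make the "checksums for verifying uploaded chunks/files" idea concrete, here is a minimal Python sketch of what such a check could look like. It is not how the server or sync client is implemented: the function names, the choice of SHA-256, and the 1 MiB chunk size are all assumptions for illustration. The point is only that a size check passes for a same-size corrupted chunk while a checksum does not.

```python
import hashlib

CHUNK_SIZE = 1024 * 1024  # illustrative value only, not the real client chunk size


def split_chunks(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Client side (hypothetical): yield (index, chunk, sha256_hex) for each chunk."""
    for index, offset in enumerate(range(0, len(data), chunk_size)):
        chunk = data[offset:offset + chunk_size]
        yield index, chunk, hashlib.sha256(chunk).hexdigest()


def accept_chunk(chunk: bytes, expected_size: int, expected_sha256: str) -> bool:
    """Server side (hypothetical): size check first, then the proposed checksum check."""
    if len(chunk) != expected_size:
        return False  # size mismatch: reject so the client resends
    return hashlib.sha256(chunk).hexdigest() == expected_sha256
```

A rejected chunk would be handled the same way a size mismatch is handled today: drop it and let the client resend.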
@VincentvgNn I raised owncloud-archive/apps#1937 to look into the lock question.
@karlitschek will we bundle files_locking with OC 7.0.3 or future versions?
@VincentvgNn by the way, you've been using the files_locking app for a while now, right? Did you run into any of the file corruption issues again?
@PVince81 As it is now, I still experience too many incidents with a few types of corrupted or missing files.
Raised another ticket for adding checksums: #11811
The issues owncloud/client#1969 and owncloud/client#2425 have people with the same symptoms. In the latter, the issue still happens even with the files_locking app enabled and without encryption.
I believe I am experiencing the same issue since updating to version 7 (it possibly happened in 6 but was less noticeable). I too am running ownCloud behind an Apache proxy over HTTPS. I only work with small files (<10 MB) and often find that the start of one or more files is missing when more than one file is modified in a short space of time (there may be another factor but I can't identify it). Unfortunately I have been unable to reproduce this on a test file; the affected files contain sensitive data (thankfully not lost) so I cannot submit an example yet. At the time of posting I only had two examples of an original and damaged file to hand: one is ~900 kB with the first 16 kB missing, the other is ~300 kB with the first 32 kB missing. Is there any useful test I can run on the original files to provide comparisons of the data and help track this problem?
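One test that can be run locally without sharing the sensitive files is to measure exactly how many leading bytes of the original are not reproduced in the damaged copy. The Python sketch below is only a suggestion with made-up function names. It works whether the damaged file is shorter (the head was dropped) or the same size (the head was overwritten), because it compares the two files from the end.

```python
def common_suffix_length(a: bytes, b: bytes) -> int:
    """Return how many trailing bytes a and b have in common."""
    n = 0
    max_n = min(len(a), len(b))
    while n < max_n and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


def damaged_head_length(original: bytes, damaged: bytes) -> int:
    """Return the number of leading bytes of original not covered by the shared tail."""
    return len(original) - common_suffix_length(original, damaged)
```

Reporting the result for each affected file, and whether it is a multiple of 8192, would already help narrow things down, and it reveals nothing about the contents.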
@shikasta-net Hi! Can you tell us if you were using encryption or not?
I appear to have been using 2.4.7, installed 5th Oct last year when I upgraded to Ubuntu 14.04. The Apache proxy is configured to redirect all traffic through HTTPS, so it was definitely encrypted but I don't remember what
Kym
@shikasta-net I don't mean HTTPS/SSL encryption, I mean the
No, I disabled most apps (including encryption) when I first installed the server.
Kym
On 22-09-2015 I wrote: In a recent test of server 8.0.3 and client 1.8.1, I again found one file that was corrupted at download on one PC. Unfortunately the beginning of the client log was truncated until just after the incident.
@VincentvgNn are you still seeing the corrupted files with the new sync client 2.0.1 and ownCloud 8.1.3 with the new encryption?
same question for @shikasta-net 😄
moving to 8.2.1 unless it is re-prioritized. We will triage it again
@PVince81, I haven't seen file corruption in a while (I'm on server 8.1.1) but also haven't used it quite so extensively since it happened. I shall try some more heavy testing and report back.
closing issue until we get some additional input
Description
Using the OC client and uploading many files to the OC server that uses encryption can lead to encrypted files that are corrupted. When these files are downloaded to another sharing client, the file size and modification date are ok, but a CRC check against the original file shows that they are different. Opening these files is impossible due to the corruption.
Downloading to other clients or via the web interface makes no difference. The corrupted files are exactly equal.
A binary comparison shows that one or more sections of these files are corrupted.
(left: original file, right: corrupted file, 2/3 is ok)
Current issue
Uploading a 500 MB data set of 3000 files and 300 folders leads to about 0.1% corrupted files (roughly 3 of 3000) when encryption is used.
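With only a handful of bad files among thousands, they are easiest to find by comparing the original tree against the synced copy automatically. A minimal Python sketch of such a CRC comparison follows. The function names are illustrative, and this is just one way to do it, not the tool used for the results reported here.

```python
import os
import zlib


def file_crc32(path, buf_size=1 << 20):
    """Compute the CRC32 of a file, reading it in buf_size pieces."""
    crc = 0
    with open(path, "rb") as fh:
        while True:
            block = fh.read(buf_size)
            if not block:
                break
            crc = zlib.crc32(block, crc)
    return crc & 0xFFFFFFFF


def find_mismatches(original_root, synced_root):
    """Return relative paths that are missing in synced_root or whose CRC32 differs."""
    bad = []
    for dirpath, _dirs, files in os.walk(original_root):
        for name in files:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, original_root)
            dst = os.path.join(synced_root, rel)
            if not os.path.isfile(dst) or file_crc32(src) != file_crc32(dst):
                bad.append(rel)
    return sorted(bad)
```

Calling `find_mismatches` with the source folder on the uploading PC and the synced folder on another client lists both missing and corrupted files in one pass.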
Steps to reproduce
(left: 2 original files, right 2 corrupted files)
Expected behavior
No data loss or corruption. Same data integrity performance as on a HDD.
Server configuration
OC server installed at a Dutch webhosting company. 4-5 GB storage.
Control via DirectAdmin and installation using Installatron.
Own (non-shared) IP, ownCloud using https and encryption.
Operating system: Linux Hosting Package
Web server: Apache
Database: MySQL
PHP version: native, 5.5.
ownCloud version: 7.0.2 (stable)
List of activated apps: default apps + encryption
External storage: no
Encryption: yes
Client configuration
Browser: Google Chrome (or Firefox or IE11)
Operating system: Win XP, Win 7 or Win 8
ownCloud version: 1.6.3