
Upload to OC server with encryption can lead to 0.1% corrupted files. #10975

Closed · VincentvgNn opened this issue Sep 10, 2014 · 35 comments

@VincentvgNn

Description

Using the OC client to upload many files to an OC server that uses encryption can lead to corrupted encrypted files. When these files are downloaded to another sharing client, the file size and modification date are OK, but a CRC check against the original file shows that they differ. Opening these files is impossible due to the corruption.
Downloading to other clients or via the web interface makes no difference; the corrupted files are exactly identical.
A binary comparison shows that one or more sections of these files are corrupted.

[Screenshot: example corrupted JPG file (left: original file; right: corrupted file, 2/3 of which is OK)]

Current issue

Uploading a 500 MB data set of 3000 files and 300 folders leads to about 0.1% corrupted files when encryption is used.

Steps to reproduce

  1. Use an OC server at a website host with many other users, so that limited speed and connection interruptions are guaranteed.
  2. Let the admin share an empty r/w test folder with 2 or more OC clients (preferably on your own local network for easy comparison).
  3. With all clients online, copy a 500 MB data set of 3000 files and 300 folders into the test folder at client 1.
  4. Wait 6-8 hours until all data is copied to the other clients.
  5. Use a file comparison tool like Beyond Compare to do a CRC check between the original files and the files copied to the other clients (a scripted version of this check is sketched after this list).
  6. About 0-5 files will have corrupted sections due to the encryption. I repeated this test at least 5 times.
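For reference, the hash comparison in step 5 can also be scripted. A minimal sketch in Python; the two paths are placeholders for the original data set and a client's sync folder:

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Hash a file in 64 kB chunks so large files need little memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def compare_trees(original: Path, synced: Path) -> None:
    """Report files that are missing or whose contents differ."""
    for src in original.rglob("*"):
        if not src.is_file():
            continue
        dst = synced / src.relative_to(original)
        if not dst.is_file():
            print(f"MISSING  {dst}")
        elif sha256_of(src) != sha256_of(dst):
            print(f"CORRUPT  {dst}")

compare_trees(Path("original-data"), Path("owncloud-sync-folder"))
```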

[Screenshot: encrypted files corrupted - selection (left: 2 original files; right: the 2 corrupted files)]

Expected behavior

No data loss or corruption. The same data integrity as on a local HDD.

Server configuration

OC server installed at a Dutch webhosting company. 4-5 GB storage.
Control via DirectAdmin and installation using Installatron.
Own (non-shared) IP, ownCloud using https and encryption.

Operating system: Linux Hosting Package
Web server: Apache
Database: MySQL
PHP version: native, 5.5.
ownCloud version: 7.0.2 (stable)
List of activated apps: default apps + encryption
External storage: no
Encryption: yes

Client configuration

Browser: Google Chrome (or Firefox or IE11)
Operating system: Win XP, Win 7 or Win 8
ownCloud version: 1.6.3

@PVince81 (Contributor)

Thanks for the detailed report.
Please confirm that:

  1. The problem happens for shared folders (non-shared are fine?)
  2. You are uploading with the sync client, not the web UI
  3. There are no interruptions like loss of connection during upload
  4. Clients are not uploading conflicting files into the shared folder
  5. Corrupted files appear on ALL clients when downloading, not just a few (meaning that the corruption happens on upload)

Is that correct?

Do you also find upload corruption when uploading directly through the Web UI (might be difficult with that many files) or via WebDAV directly, without the sync client? I'm asking because the sync client uses chunks, splitting files into pieces, so if the corruption only happens with the sync client it could be a bug with chunking.

CC @schiesbn for encryption stuff

@VincentvgNn (Author)

@PVince81
Answers:

  1. Comparing source data and received data in a non-shared situation would need a new test where I have at least 2 sync clients connected to one account.
    To rule out the sharing, I will test again with 3 sync clients on one account. One of them will do the upload.

  2. I use the Web UI only for downloading the corrupted samples.

  3. In the 5 hours needed for syncing there are many interruptions in the upload. They can come from my home network, from my PCs going into sleep mode, or from my webhost; I get the most interruptions for free from the webhost where the OC server runs.
    Normal file transfer can handle these interruptions thanks to checksum control.
    Is the OC encryption software robust against this?
    Is there any validation method for the encrypted files?
    Or is the only way decrypting and checksum comparison with the original file?

4a) I did not observe conflicting files during the upload. The uploading client seemed to be doing just the upload. I excluded .lnk files and that works fine.

4b) At the upload client I caught 1 temporary file .B201TEX0.FVF.~4f64 that, strangely enough, was set to read-only and therefore probably was not removed. Nice material for another issue ;-)

4c) On the activity monitor of the upload client I saw that after an interruption files were sometimes downloaded again. I thought that was part of the re-syncing, but today I took a closer look at the creation dates of the source files, and they reveal that quite a few files were written back to the uploading client!
At 14:16 the process started by copying all files to the OC folder for uploading.
10 minutes later all files were copied to the OC folder while the uploading was running.
Summary of the creation times in the OC folder:
14:16 - 14:25 ---- 2434 files, 490 MB, unchanged creation date (copying all files to the OC folder)
15:14 - 19:02 ---- 643 files, 4.9 MB, changed creation date: these files were downloaded back!
Download during upload: another issue?

  5. Yes, corrupted files appeared on 2 receiving clients and are exactly identical to the same files downloaded via the Web UI.
    Just another bug: files downloaded via the web interface get a new modification date.

  6. I'm not so excited about uploading via the Web UI.
    I will consider WebDAV, maybe by using NetDrive 2.0, which I have to update anyway.

@PVince81 (Contributor)

3) Normally the upload and encryption are two completely separate things.
When the sync client uploads, it uploads unencrypted chunks of data (10 MB each) and saves them into a temporary cache folder ($datadir/$user/cache). For each chunk the server checks that the size matches the expected size, which I believe is sent through an extra HTTP header by the client.
If the sizes all match, it reads the chunks one by one from the temporary folder and writes them into the target storage as a part file. The writing uses the ownCloud API calls, which means that if encryption is enabled, the bytes are encrypted at that time.
So far I don't see how interrupted uploads could affect that process, unless corrupted bytes are sent along within the chunks, which would be unrelated to encryption.

Do you have any bigger text files in your test set, which could make it possible to find out what kind of corruption is occurring? For example whether some data is shifted or missing. I assume the file sizes are still the same, but parts of the contents are different?

There is no checksum comparison on the server side. The only check is based on the chunk size and total file size. If the chunk size does not match, the chunk is deleted and the sync client will resend it.

Note that files smaller than 10 MB do not use chunks and go through a different code path: such files are written directly into the storage (and encrypted at the same time). Were your corrupted files all < 10 MB?
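A rough sketch of the chunked-upload flow described above, in Python for illustration (the real server is PHP; the cache-folder layout follows the description, everything else is an assumption):

```python
import os
import shutil

CACHE_DIR = "/data/user/cache"   # temporary chunk cache ($datadir/$user/cache)

def assemble_chunks(chunk_names, expected_sizes, target):
    """Verify each chunk's size, then concatenate the chunks into a part
    file and move it into place. Note that only sizes are compared, so a
    same-size corruption would slip through unnoticed."""
    for name, expected in zip(chunk_names, expected_sizes):
        path = os.path.join(CACHE_DIR, name)
        if os.path.getsize(path) != expected:
            os.remove(path)          # reject the chunk; the client resends it
            raise IOError(f"chunk {name} has the wrong size")
    with open(target + ".part", "wb") as out:
        for name in chunk_names:
            with open(os.path.join(CACHE_DIR, name), "rb") as src:
                shutil.copyfileobj(src, out)   # with encryption enabled, this
                                               # write would be encrypted
    os.rename(target + ".part", target)        # final atomic move
```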

4a) Conflicting files would be when two clients are uploading the same file name at the same time, in which case the sync client might produce conflict files locally.

4c) When you say "downloaded again", do you mean the file uploaded by client1, "client1file.dat", was being downloaded by client2 but interrupted? Then client2 will redownload "client1file.dat"; that's the expected behavior, it will grab the whole file. Or is client1 redownloading files it had just uploaded itself?

5) When downloading a file with the web UI it's not technically possible to keep the mtime. The browser will simply save the file locally and set it to the current save time.

@VincentvgNn thanks for the detailed info. Unfortunately it's still not clear what could be causing the corruption, unless at some point an uploaded chunk gets corrupted on the wire but still has the same size when it lands on the server.

CC @ogoffart to check the previous comment about the redownload and resuming part.

CC @schiesbn for encryption

CC @icewind1991 @DeepDiver1975 for general questions about checksumming chunks

@VincentvgNn (Author)

@PVince81
Item 1)
I tested again using 3 sync clients that were connected to 1 admin account.
So no shared folder shared between 3 different user accounts as before.
While copying the test data to the local sync folder at PC1, the synchronization was already running.
Uploading at PC1 started, and PC2 and PC3 started downloading to their synchronized folders.
After one test with the 500 MB data set I could not find corrupted files at PC2 and PC3.
It doesn't prove that it cannot happen, but the chance of getting corrupted files seems to be lower.

New issue:
PC2 received all files without missing any, while PC3 finished syncing with 11 files missing.
[Screenshot: missing files at the Win 7 PC]

Item 3)
The largest file is 22 MB and there are only 3 files above 10 MB.
The corrupted files were about 50-200 kB, so there should be no problem with chunks.

Item 4a)
PC2 and PC3 should only be downloading.
In a previous experiment I once saw a conflict that resulted in a local conflict file (the word "conflict" in the file name).

Item 4c)
I meant that client 1 was downloading files that it had just uploaded itself about 10 minutes before.
This happened a few times: after a while of uploading there was an interrupt/disconnection (operation cancelled), immediately followed by downloading 1-30 files back to itself, files that were uploaded 10 minutes before!
[Screenshot: downloads after interrupt (127 files were downloaded back to the source)]

@PVince81 (Contributor)

Thanks for all the details.

About PC3: the missing 11 files, did it download them afterwards in the next run? It can happen that the sync client is not aware of new files at the time of sync. Usually it will first scan the server to see what to download, then download that. If new files appear after the scan, it will not get them until the next run.

Interesting, so the corrupted files are not using the chunk algorithm.
Also the ones that are re-downloaded seem to be small files as well.

Can you tell us more about your server config? (Or link to it if you posted it somewhere else.)
Are you using NFS for the data directory?

I find it strange that files are re-downloaded, which normally only happens if the file changed on the server side, for example if its mtime changes.

This still doesn't explain the corruption.
If I understand correctly, the corrupted file has a different size than the normal file? (The diff screenshot seems to imply so.)

However there's already code to verify the file size after upload and reject the upload if it doesn't match. So something must happen after that, when writing the part file.

@PVince81 (Contributor)

Do you by any chance have a corrupted text file?
It would be good to inspect the corruption itself.
Encryption works with blocks of 6k. I suspect that one block might be corrupted there.
In a text file it would probably produce binary data in the middle of the file.

@VincentvgNn (Author)

I quickly analysed the corrupted files using UltraEdit's hex compare.
The corrupted files are corrupted from 0x0000 until 0x4000 (16,384 bytes = 16 kB).
Are the blocks 6k or 16k?
It seems that only the first block has been affected.
Later on I will take a closer look at it.

@PVince81 (Contributor)

@VincentvgNn very strange. I discussed this with @schiesbn and he confirmed that the unencrypted block size is 6k (6126 bytes).
Encrypted blocks are 8k (8192 bytes), so it is not likely to be a filesystem failure, because a broken 8k encrypted block would produce a 6k corrupted block.

PHP mostly works with 8k blocks, so it looks like in your case the files are getting corrupted before they reach the encryption code.
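The 6126/8192 arithmetic is consistent with AES-CBC plus base64 and a 16-byte IV per block; a sketch of the size calculation only (the IV placement is an assumption, and no real encryption is performed here):

```python
import base64
import os

PLAIN_BLOCK = 6126                             # unencrypted block size
padded = PLAIN_BLOCK + 16 - PLAIN_BLOCK % 16   # PKCS#7 padding: 6126 -> 6128
iv = os.urandom(16)                            # assumed per-block IV
fake_ciphertext = os.urandom(padded)           # stand-in for AES-CBC output
on_disk = base64.b64encode(iv + fake_ciphertext)
print(len(on_disk))                            # 8192, the encrypted block size
```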

I guess you haven't tried without encryption yet? I understand that you might not want to do this if you don't trust your provider.

Can you confirm again that it's not always the same file getting corrupted between test runs?

Most likely something is happening during the interruptions you were experiencing.

@dragotin @ogoffart are you aware of any corruption that could happen because of interruptions?
(sync client 1.6.3 + OC 7.0.2)

I wonder whether we should eventually add a checksum column in oc_filecache; it should be possible to compute this on the fly. @icewind1991 @karlitschek @DeepDiver1975
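Computing a checksum on the fly would add almost no cost, since every byte already passes through the server. A hypothetical sketch of the idea (not existing ownCloud code):

```python
import hashlib

class ChecksummingWriter:
    """Wrap an output stream and hash every byte as it is written, so a
    checksum is available for free once the upload completes."""

    def __init__(self, stream):
        self.stream = stream
        self.digest = hashlib.sha1()    # algorithm choice is illustrative

    def write(self, data: bytes) -> int:
        self.digest.update(data)
        return self.stream.write(data)

    def checksum(self) -> str:
        return self.digest.hexdigest()  # value a checksum column could store
```

The client could send its own checksum alongside the upload, and the server would reject the file on a mismatch instead of relying on sizes alone.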

@VincentvgNn (Author)

@PVince81
I analysed 7 of these corrupted files from 4 test runs. The same file was never corrupted twice.
All are corrupted from 0x0000 until 0x4000 (16,384 bytes = 16 kB).
This first 16 kB section is each time equal to the 2nd, 3rd, or 4th section of the original file.
0x4000 until 0x8000 copied to 1st section: 3 sample files
0x8000 until 0xc000 copied to 1st section: 2 sample files
0xc000 until 0x10000 copied to 1st section: 1 sample file
So the wrong piece of the original file was put into the first 16 kB section of the resulting file.
The encryption and decryption of the 16k sections is OK, but where, just before the actual encryption, could these non-encrypted 16 kB sections have been copied from a wrong location into the first 16 kB section?
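That hex comparison can be automated; a small sketch that reports which 16 kB section of the original was written into the corrupted file's first section (file names are placeholders):

```python
BLOCK = 0x4000   # the 16 kB sections observed in the corrupted files

def find_copied_section(original_path, corrupted_path):
    """Return the index of the original 16 kB section that ended up as the
    corrupted file's first section (0 means the start is intact)."""
    with open(original_path, "rb") as f:
        original = f.read()
    with open(corrupted_path, "rb") as f:
        first = f.read(BLOCK)
    for i in range(len(original) // BLOCK + 1):
        if original[i * BLOCK:(i + 1) * BLOCK] == first:
            return i
    return None   # the first section matches nothing in the original

print(find_copied_section("original.jpg", "corrupted.jpg"))
```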

Can it go wrong by paired 8 kB blocks or can PHP work with 16 kB?

Until now I have seen this kind of corruption only with encryption and with file sharing between different users.

@PVince81 (Contributor)

This is indeed very strange.
Especially having data from later in the file copied to the first position.
Usually the file is processed from beginning to end, not the reverse, so this is quite mysterious.

One idea would be that the files are written at the same time by separate PHP processes, but even then it probably wouldn't produce such results.

Do you have the "files_locking" app enabled? That one should prevent files from being opened by more than one process at the same time.
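Conceptually, such locking serializes writers via an exclusive lock. A minimal illustration in Python (ownCloud's actual implementation is PHP, and the lock-file layout here is an assumption):

```python
import fcntl
import os

def locked_write(target: str, data: bytes, lock_dir: str = ".locks") -> None:
    """Take an exclusive lock on a per-file lock file before writing, so
    two processes cannot write the same target at the same time."""
    os.makedirs(lock_dir, exist_ok=True)
    lock_path = os.path.join(lock_dir, os.path.basename(target) + ".lock")
    with open(lock_path, "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)   # blocks until no other writer holds it
        try:
            with open(target, "wb") as f:
                f.write(data)
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)
```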

@schiesbn @bantu @icewind1991 any clues about these strange results? See the previous comments.

@VincentvgNn (Author)

@PVince81
My ownCloud server has been automatically installed by Installatron.
Where should I find that "files_locking" app?
In the OC server "apps" folder I can find folders like "files_encryption", "files_sharing", "files_versions", etc., but no "files_locking" folder.
Should I be able to enable it via the OC Admin Apps menu?
Am I missing a piece of software that should have been in the automatic distribution?

@PVince81 (Contributor)

@VincentvgNn do you have the "files_texteditor" app and the "Pictures" app? Just wondering, because these are bundled with every official package. The "files_locking" app should be bundled as well.
Maybe Installatron hasn't bundled everything. I hope it didn't break or patch any code.
We had reports in the past where providers or tools that set up ownCloud automatically had forgotten to include/replace some source files, resulting in broken installs (broken as in "cannot run it at all"). I wouldn't be surprised if that happened here as well.

Do you have access to the files there?
If yes, you could try to set up ownCloud from scratch yourself based on the official tarballs.

@VincentvgNn (Author)

@PVince81
On the OC Admin Apps menu I have 16 active items (Activity, Calendar, Contacts, Deleted files, Documents, Encryption, First Run Wizard, Full Text Search, Mail Template Editor, PDF Viewer, Pictures, Share Files, Text Editor, Updater, Versions, Video Viewer) and 8 inactive items (Bookmarks, External Sites, External storage support, External user support, LDAP user and group backend, ownCloud dependencies info, WebDAV user backend, Turn Off App Codechecker (3rd P)).
In the OC server "apps" folder there are also 24 folders with more or less similar names.

I have the same installation at 2 different hosting providers controlled by Installatron.
Installatron uses what it receives from ownCloud via the auto updates.
Can you check whether the "files_locking" app has accidentally been omitted?

I just downloaded owncloud-7.0.2.zip from the OC website. There are 24 folders in the "apps" folder, but also no "files_locking" app.

@PVince81 (Contributor)

@VincentvgNn sorry for the confusion. It seems that "files_locking" isn't shipped.
The app is here: https://github.com/owncloud/apps/tree/master/files_locking
If you want to try it, you could pick the version from the "v7.0.2" tag.

@VincentvgNn (Author)

@PVince81
Good news?

I copied the "files_locking" app from the downloaded zip file into the appropriate OC server directory.
First I ran the original test with the app disabled. This time it even took 9 instead of 5 hours to complete, and I caught 1 corrupted file.

Next I ran the test with the app enabled:

  1. Especially in the beginning (first 1000 files, 200 MB) the process seemed to run smoother and faster. The interrupts were now accompanied by an "Unable to write" message. The whole test still took 5 hours.
  2. Downloading of just-uploaded files back to the source still happens. This can be after an interrupt, but also in between the uploads. It happened now to 680 of the 3000 files, so that is not solved.
    Maybe it is even worse, since the previous time only 127 files were downloaded back to the source.
  3. 3 files in the OC source folder were not only downloaded back (new creation date), but they also got a new modification date.
  4. After this test I did not find any corrupted file where the first 16 kB got the wrong data. One or zero of this type of corrupted file is not a big difference, but the "files_locking" app may have done its job!
  5. New is that I got 2 corrupted files on the downloading PCs, one on each PC: files that are missing a piece of data at the end. These files got corrupted during the download; in the cloud they are OK.
    Example:
    [Screenshot: too-short corrupted file at the Win 8 PC]

Why is the "files_locking" app a separate app and not an essential part of the server software?

@PVince81 (Contributor)

The files_locking app is new and was probably not deemed ready yet, but it is being worked on and will hopefully be shipped with future releases, especially since it can prevent many race conditions that cause unpredictable behavior.

If that app fixes the corruption for you, it might show that somewhere during upload a race condition of some sort happens and causes pieces of the file to be overwritten, as you showed. It is still unclear how this kind of corruption could occur.

@VincentvgNn (Author)

@PVince81
Is it certain that the "files_locking" app cannot accidentally lock files forever?
Is the validity of a lock checked at each access to a locked file? Or is there maybe a timed lock?
There is no description added to the "files_locking" app yet.

Using checksum control and a repair procedure would further improve data integrity.
Starting with passing the checksum?
It could have prevented the 2 corrupted files that I got in the last test.

@PVince81 (Contributor)

I'm not familiar enough with the files_locking app to be able to answer.
These are questions I am also wondering about.

The files_locking app, from what I saw, creates files inside data/$user/.locks. I guess there might be situations where the locks are not cleared, for example on PHP timeouts.
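A common way to keep stale locks from blocking files forever is to give each lock a time-to-live; a hypothetical sketch (whether files_locking does anything like this is exactly the open question):

```python
import os
import time

LOCK_TTL = 3600   # seconds; an older lock is treated as stale

def is_lock_valid(lock_path: str) -> bool:
    """Honor a lock only if its file exists and is younger than the TTL,
    so a crashed or timed-out process cannot hold a file forever."""
    try:
        age = time.time() - os.path.getmtime(lock_path)
    except FileNotFoundError:
        return False
    return age < LOCK_TTL
```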

Checksums would be nice as well...

@dragotin @DeepDiver1975 @karlitschek @icewind1991 for the checksum case: did we discuss this before? I remember discussing "using checksums as etags", which would be problematic due to special cases, but does "using checksums for verifying uploaded chunks/files" make sense?

@PVince81 (Contributor)

@VincentvgNn I raised owncloud-archive/apps#1937 to look into the lock question.

@PVince81 (Contributor)

@karlitschek will we bundle files_locking with OC 7.0.3 or future versions?

@PVince81 (Contributor)

@VincentvgNn by the way, you've been using the files_locking app for a while now, right? Did you run into any of the file corruption issues again?

@VincentvgNn (Author)

@PVince81
I didn't do further testing on the "files_locking" app while owncloud-archive/apps#1937 is still open.
The "files_locking" app seemed to prevent the 1 to 3 corrupted 16k pieces at the beginning of a file.
As soon as I have OC server 7.0.3 and OC client 1.6.4 installed, I will run a fresh test with the "files_locking" app again. Maybe next week.

As it is now, I still experience too many incidents with a few types of corrupted or missing files.
Issue owncloud/client#2247 is one of them.
I love the concept of ownCloud, but these problems mean that I cannot yet fully rely on OC. It might take at least half a year before everything is solved.
Therefore I have started using the free version of Tresorit again for truly reliable file sharing and end-to-end encryption. The ownCloud system will run in parallel for testing until it is reliable enough.
I will leave Google Drive because it is not encrypted, and because it also has problems with not retaining the last modification time and with some other conflicts.

@PVince81 (Contributor)

Raised another ticket for adding checksums: #11811

@ckamm commented Nov 5, 2014

The issues owncloud/client#1969 and owncloud/client#2425 have people with the same symptoms. In the latter, the issue still happens even with the files_locking app enabled and without encryption.

@shikasta-net
I believe I have been experiencing the same issue since updating to version 7 (it possibly happened in 6 but was less noticeable). I too am running ownCloud behind an Apache proxy over HTTPS.

I only work with small files (<10 MB) and often find that the start of one or more files is missed when more than one file is modified in a short space of time (there may be another factor, but I can't identify it). Unfortunately I have been unable to reproduce this with a test file; those that are affected contain sensitive data (thankfully not lost), so I cannot submit an example yet.

When filing this post I only had two examples of original and damaged files to hand: one is ~900 kB with the first 16 kB missing, the other is ~300 kB with the first 32 kB missing. Is there any useful test I can run on the original files to provide comparisons of the data and help track down this problem?

@enoch85 added this to the 8.2-next milestone (Mar 24, 2015)
@guruz (Contributor) commented Apr 8, 2015

@shikasta-net Hi! Can you tell us if you were using encryption or not?
Can you also tell us which Apache version it was on that 3rd of December?

@shikasta-net
I appear to have been using 2.4.7, installed 5th Oct last year when I upgraded to Ubuntu 14.04.

The Apache proxy is configured to redirect all traffic through HTTPS, so it was definitely encrypted, but I don't remember what settings I used, and I've since overhauled the config to take advantage of some changes between Apache 2.2 and 2.4.


@guruz (Contributor) commented Apr 8, 2015

@shikasta-net I don't mean HTTPS/SSL encryption, I mean the encryption app in ownCloud which encrypts your files. Did you enable+use that?

@shikasta-net
No, I disabled most apps (including encryption) when I first installed the server.


@VincentvgNn (Author)

On 22-09-2015 I wrote:
"New is that I got 2 corrupted files on the downloading PCs, one on each PC: files that are missing a piece of data at the end. These files got corrupted during the download; in the cloud they are OK."

In a recent test of server 8.0.3 and client 1.8.1, I again found one file that was corrupted at download on one PC. Unfortunately the beginning of the client log was truncated until just after the incident.
The server log shows that the file was downloaded a second time some minutes after it was created at that client for the first time. So for some reason the file was downloaded a second time without being written correctly to the client's storage folder.
I'll have to wait for the next incident and not forget to have file logging switched on.

@PVince81 (Contributor)

@VincentvgNn are you still seeing the corrupted files with the new sync client 2.0.1 and ownCloud 8.1.3 with the new encryption?

@PVince81 (Contributor)

same question for @shikasta-net 😄

@ghost commented Sep 30, 2015

Moving to 8.2.1 unless it is re-prioritized. We will triage it again.

@ghost modified the milestones: 8.2.1-next-maintenance, 8.2-current (Sep 30, 2015)
@shikasta-net
@PVince81, I haven't seen file corruption in a while (I'm on server 8.1.1), but I also haven't used it quite so extensively since it happened. I shall try some more heavy testing and report back.

@ghost commented Oct 25, 2015

Closing the issue until we get some additional input.

@ghost closed this as completed (Oct 25, 2015).
The lock bot locked this issue as resolved and limited the conversation to collaborators (Aug 8, 2019).