Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store client's file checksum as metadata and return it again in downloads and PROPFIND #18716

Closed
karlitschek opened this issue Aug 31, 2015 · 22 comments
Assignees
Labels
enhancement overview p2-high Escalation, on top of current planning, release blocker
Milestone

Comments

@karlitschek
Copy link
Contributor

We want to handle checksums in webdav requests.

Goals:

  • secure the webdav/http transport layer
  • partly secure again corruption on the storage layer

Limitations:

  • files are uploaded and created from several different sides. for example directly on the external storage. Because of that checksum are not always available or there generation would be too expensive. Because of that there are sometimes available and sometimes not. So checksums will be optional

Approach:

  • Everytime a clients who supports this feature. (the ownCloud Desktop Client in the first step) can send a checksum of the file in parallel with the payload.
  • The ownCloud server stores this checksum in the file cache table
  • If a checksum for a file is available in the filecache table then this value is returned with every propfind.
  • The client can then compare the checksum and detect possible corruption of the file.
  • In this first step of the checksumming no checksums are generated on the server side. The server acts purely as a storage at the moment. In the later stage this could be extended.

Because checksums are not always available and are completely optional the clients can't rely on them for the actual syncing algorithm.

This approach can be extended in different directions in the future but we should start with this step.


QA ticket: owncloud/QA#113
Documentation ticket: https://github.com/owncloud/documentation/issues/2132

@karlitschek
Copy link
Contributor Author

@karlitschek
Copy link
Contributor Author

@MTRichards

@MTRichards MTRichards added this to the 9.0-next milestone Aug 31, 2015
@aspdye
Copy link

aspdye commented Aug 31, 2015

👍

@PVince81
Copy link
Contributor

Sounds great 😄

@ckamm
Copy link

ckamm commented Oct 1, 2015

The client-side ticket is owncloud/client#3735. When this is added to the server, please also add a list of supported checksumming algorithms to the capabilities API. The client currently supports "Adler32", "MD5" and "SHA1".

You can already ask the client to send checksums for uploads by adding transmissionChecksum=MD5 to the [General] section of owncloud.cfg.

@guruz guruz changed the title checksum support Store client's file checksum as metadata and return it again Oct 2, 2015
@guruz guruz changed the title Store client's file checksum as metadata and return it again Store client's file checksum as metadata and return it again in downloads and PROPFIND Oct 2, 2015
@rullzer rullzer self-assigned this Oct 14, 2015
@dragotin
Copy link
Contributor

I would suggest that both client and server work with a HTTP header of the form:

OC-Checksum: <checksum-type>:<sum>

Currently we support the checksum types MD5, SHA1 and Adler32

@PVince81
Copy link
Contributor

PVince81 commented Dec 7, 2015

@dragotin is the client already sending something ?

TODO for the server:

  • enhance the FilesPlugin to return the stored checksum in a new "oc:checksum" attribute (for PROPFIND)
  • make sure the attribute CANNOT be set using PROPPATCH, but only through a legitimate upload
  • at upload time, if "OC-Checksum" header is set, store it
  • where to store it ? new filecache table column ? CC @icewind1991

@dragotin
Copy link
Contributor

dragotin commented Dec 7, 2015

yes, since 2.1.0 the client sends either Adler32 (probably not useful), MD5 or SHA1 if you enable it in the config file by setting:

transmissionChecksum=MD5

in the [General] section.

@icewind1991
Copy link
Contributor

where to store it ? new filecache table column

I would say so yes

@dragotin
Copy link
Contributor

dragotin commented Dec 7, 2015

I'd suggest to be able to store different types of checksums for one file. We will have different checksum types for securing the transmission versus checksums over the content.

@PVince81
Copy link
Contributor

PVince81 commented Dec 7, 2015

Regarding filecache table: I heard that adding a column on an existing table that has 10K or more entries might be slow at update time. So we might also need to consider a separate table.
But I'm not database expert. 😄

@dragotin
Copy link
Contributor

dragotin commented Dec 7, 2015

I'd suggest a detail table with colums fileId, checksumtypeId and checksum, with primary key (fileId, checksumtype) and another detail table mapping checksumtypeId to a meaningful string.

@icewind1991
Copy link
Contributor

I'd suggest a detail table with colums fileId, checksumtypeId and checksum, with primary key (fileId, checksumtype) and another detail table mapping checksumtypeId to a meaningful string.

Makes more sense yes

@DeepDiver1975
Copy link
Member

Regarding filecache table: I heard that adding a column on an existing table that has 10K or more entries might be slow at update time.

should be of lesser issue if the column is nullable

@PVince81 PVince81 added the p2-high Escalation, on top of current planning, release blocker label Dec 11, 2015
@rullzer rullzer mentioned this issue Jan 28, 2016
@rullzer
Copy link
Contributor

rullzer commented Feb 3, 2016

#21997 is in.

Documentation ticket in https://github.com/owncloud/documentation/issues/2132
And ticket for QA intergration tests in owncloud/QA#113

@rullzer
Copy link
Contributor

rullzer commented Feb 8, 2016

Closing time :)

@rullzer rullzer closed this as completed Feb 8, 2016
@MorrisJobke
Copy link
Contributor

Proper return values for multiple checksums went in as #22199

@dragotin
Copy link
Contributor

Just for the record:

So far, the client never asks for the checksum in PROPFINDs. As a result, the checksum is never returned in a PROPFIND result and thus does not need to be read from database for the normal PROPFIND usecase.

For the current functionality, the client does not need the content checksum at all. So far we agreed to only use the checksum for transportation verification. For that, the checksum is contained in a HTTP header of the PUT or GET request, and from there the client is reading it.

For change detection of server files, we continue to rely on the ETags which makes sense. So I currently see no reason why the client would EVER ask for the checksum of the whole file in a PROPFIND. It is very good to have that for special cases but not for the normal case.

Let me kindly ask for re-considering to store multiple checksums in a detail table rather than in one column in oc_filecache given these facts.

@bugsyb
Copy link

bugsyb commented Feb 18, 2016

Even if it is outside of this feature, it would be good to have
implemented checksum verification by client as an additional option on
top of ETags (or in parallel).

Dreaming further, it would be good to be able to sync differences within
files only (using rolling checksum window, similar to DropBox and rsync).

Regards,
David

On 2016-02-18 11:26, Klaas Freitag wrote:

Just for the record:

So far, the client never asks for the checksum in PROPFINDs. As a
result, the checksum is never returned in a PROPFIND result and thus
does not need to be read from database for the normal PROPFIND usecase.

For the current functionality, the client does not need the content
checksum at all. So far we agreed to only use the checksum for
transportation verification. For that, the checksum is contained in a
HTTP header of the PUT or GET request, and from there the client is
reading it.

For change detection of server files, we continue to rely on the ETags
which makes sense. So I currently see no reason why the client would
EVER ask for the checksum of the whole file in a PROPFIND. It is very
good to have that for special cases but not for the normal case.

Let me kindly ask for re-considering to store multiple checksums in a
detail table rather than in one column in oc_filecache given these facts.


Reply to this email directly or view it on GitHub
#18716 (comment).

@DeepDiver1975
Copy link
Member

After further discussion with @dragotin we will for now keep the current approach because it is simply enough for what we are trying to accomplish for 9.0

Even if storage backends can provide multiple checksums - the question is if we really need to store them in out database.

Assuming we once want to verify the checksums between owncloud and the storage system - I see no need in storing them because the checksums are only evaluated on upload to the storage.

The checksum send from/to the client is for a different scenarios/part of the workflow.

@dragotin
Copy link
Contributor

I think this is the important bit to fix #11811

@lock
Copy link

lock bot commented Aug 5, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators Aug 5, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement overview p2-high Escalation, on top of current planning, release blocker
Projects
None yet
Development

No branches or pull requests