-
Notifications
You must be signed in to change notification settings - Fork 669
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable content checksums #4375
Comments
👍 I still vote for using a safe cryptograpgical checksum so we can later use it for more advanced magic. @LukasReschke @ogoffart What do you think? |
@moscicki your input here? |
Maybe also from @PVince81 and @DeepDiver1975 about storage backends (on the server) natively supporting certain checksums so we can align with them. |
Discussion on IRC regarding checksum type:
|
Other jobs that need to be done to expand our use of checksums:
|
Please consider that some storage backends such as ours already have a preferred checksum (adler32 in our case). It may be impossible for us to change the checksum type. More precisely, we have a technical ability to change the checksum type (for new files) but we may be constrained to do so by other considerations/policies. And also, consider that currently all our files are already checksummed with adler32 so any checksum type change could only apply to new files (so we would have a mix of checksum types in the system). The adler32 checksum is not efficient for change detection for small files. Hence using it as content checksum may be problematic for files smaller than a given size (say 1KB). Also, consider this especially if you think in the future of storing checksum blocks which are by definition rather small. So if we want to avoid to compute additional content checksum then the content checksum mechanism with adler32 should be only enabled for files above the critical size (meaning that below that size you should count with a possibility of collision and false-positive). So that would option one but it is not optimal because it would not work for very small files (which I guess defeats the purpose for *.eml files). So I think at least for us the distinction between content and transmission checksum makes sense. What is currently called a transmission checksum is actually enforced by the server and client cannot change it (another example: MD5 for Amazon S3). What is called content checksum is actually what client wants to use locally to achieve better operation with local change detection. In some cases both can be aligned, in other cases they cannot. If we wanted to add additional content checksum than I would suggest:
I am open to discuss it further, these are just a bunch of initial thoughts but the initial constraints I described are valid. |
A wishful note: Using two differently calculated checksums against same file often increase reliability. |
Hm, I got pointed to this: https://tools.ietf.org/html/rfc3230 |
Created owncloud/core#22711 about this. |
Pull request #4532 |
Hello, In 1.7.2 we have the content checksums implemented and enabled. Will this pull request affect the current transmission checksum implementation? Thank you, kuba |
@moscicki This has no effects on transmission checksums, it's only about checksums that the client stores in its database. |
-> #4638 |
Due to #3235 and #2983 we use content checksums for detecting unnecessary uploads and renames-that-aren't. But content checksums are currently only computed for .eml files.
We should consider enabling content checksumming for everything.
The text was updated successfully, but these errors were encountered: