Using Checksums in Direct Uploads

Janko Marohnić edited this page Oct 7, 2021 · 12 revisions

Using Checksums to Verify Integrity of Direct Uploads (with Shrine & Uppy)

When uploading directly to your app or to a cloud service such as AWS S3, it's good practice to have the upload endpoint verify the integrity of the upload using a checksum. To do this, calculate a base64-encoded MD5 hash of the file on the client side before the upload and include it in the Content-MD5 request header (AWS S3, Google Cloud Storage, and Shrine's upload_endpoint all support this).

You can calculate the base64-encoded MD5 hash of the file using the spark-md5 and chunked-file-reader JavaScript libraries. You can pull them from unpkg:

<html>
  <head>
    <script src="https://unpkg.com/spark-md5/spark-md5.js"></script>
    <script src="https://unpkg.com/chunked-file-reader/chunked-file-reader.js"></script>
    <!-- ... -->
  </head>

  <body>
    ...
  </body>
</html>

Now you can create a fileMD5() function that calculates a base64-encoded MD5 hash of a File object and returns it as a Promise:

function fileMD5 (file) {
  return new Promise(function (resolve, reject) {
    var spark  = new SparkMD5.ArrayBuffer(),
        reader = new ChunkedFileReader();
    // feed the file into the incremental hasher one chunk at a time
    reader.subscribe('chunk', function (e) {
      spark.append(e.chunk);
    });
    reader.subscribe('end', function (e) {
      var rawHash    = spark.end(true); // true => raw binary string instead of hex
      var base64Hash = btoa(rawHash);   // Content-MD5 expects base64 of the raw digest
      resolve(base64Hash);
    });
    reader.readChunks(file);
  });
}
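
A common pitfall is base64-encoding the hex digest (what spark.end() returns without the true argument) instead of the raw 16-byte digest. The sketch below illustrates the difference. Note that hexToBase64 is a hypothetical helper, not part of spark-md5, and the hex string is the well-known MD5 of the text "hello world".

```javascript
// Hypothetical helper (not part of spark-md5): converts a hex digest into
// the base64 form that the Content-MD5 header expects.
function hexToBase64 (hex) {
  var raw = '';
  for (var i = 0; i < hex.length; i += 2) {
    // each pair of hex digits is one byte of the raw digest
    raw += String.fromCharCode(parseInt(hex.slice(i, i + 2), 16));
  }
  return btoa(raw);
}

var hexDigest = '5eb63bbbe01eeed093cb22bb8f5acdc3'; // MD5 of "hello world"

console.log(hexToBase64(hexDigest)); // "XrY7u+Ae7tCTyyK7j1rNww==" (24 characters)
console.log(btoa(hexDigest).length); // 44, too long to be a valid Content-MD5 value
```

A correct Content-MD5 value is always 24 characters long, which makes for a quick sanity check when debugging checksum mismatches.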

Now, how you're going to include that MD5 checksum depends on whether you're uploading directly to the cloud service (with Shrine's presign_endpoint plugin), or to your app using the upload_endpoint plugin.

AWS S3, Google Cloud Storage etc.

When fetching upload parameters from the presign endpoint, the Shrine storage's #presign method needs to know that you'll be adding the Content-MD5 request header to the upload request. For both the AWS S3 and Google Cloud Storage Shrine storages, this is done by passing the :content_md5 presign option:

Shrine.plugin :presign_endpoint, presign_options: -> (request) do
  {
    content_md5: request.params["checksum"],
    method: :put # only for AWS S3 storage
  }
end

The above setup allows you to pass the MD5 hash via the checksum query parameter in the request to the presign endpoint. With Uppy it could look like this:

Uppy.Core({
    // ...
  })
  .use(Uppy.AwsS3, {
    getUploadParameters: function (file) {
      return fileMD5(file.data)
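        // base64 can contain "+", "/" and "=", so the checksum must be URL-encoded
        .then(function (hash) { return fetch('/presign?filename=' + encodeURIComponent(file.name) + '&checksum=' + encodeURIComponent(hash)) })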
        .then(function (response) { return response.json() })
    }
  })
  // ...
  .run()
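
Base64 values can contain "+", "/" and "=", and a raw "+" in a query string is typically decoded as a space on the server, so the checksum needs to be URL-encoded before it is sent to the presign endpoint. One way to do that is with URLSearchParams. This is only a sketch: presignUrl is a hypothetical helper name, and the /presign path is taken from the example above.

```javascript
// Hypothetical helper: builds the presign request URL with properly
// encoded query parameters.
function presignUrl (filename, checksum) {
  var params = new URLSearchParams({ filename: filename, checksum: checksum });
  return '/presign?' + params.toString();
}

var url = presignUrl('my photo.jpg', 'XrY7u+Ae7tCTyyK7j1rNww==');

// parsing the query string back yields the original values unchanged
var parsed = new URLSearchParams(url.split('?')[1]);
console.log(parsed.get('checksum') === 'XrY7u+Ae7tCTyyK7j1rNww=='); // true
```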

Upload endpoint

When uploading the file directly to your app using the upload_endpoint Shrine plugin, you can also use checksums, as the upload endpoint automatically detects the Content-MD5 header. With Uppy it could look like this:

// `file` is the File object that is about to be uploaded
fileMD5(file).then(function (hash) {
  Uppy.Core({
      // ...
    })
    .use(Uppy.XHRUpload, {
      endpoint: '/upload', // Shrine's upload endpoint
      fieldName: 'file',
      headers: {
        'Content-MD5': hash,
        'X-CSRF-Token': document.querySelector('meta[name=_csrf]').content,
      }
    })
    // ...
    .run()
})
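
Shrine's upload endpoint performs the verification for you. Conceptually, the check amounts to hashing the received bytes and comparing the result with the header value. The following is a rough Node.js illustration of that idea, not Shrine's actual implementation, and checksumMatches is a hypothetical name.

```javascript
var crypto = require('crypto');

// Hypothetical illustration: true when the base64-encoded MD5 of the
// received body equals the value sent in the Content-MD5 header.
function checksumMatches (body, contentMD5) {
  var actual = crypto.createHash('md5').update(body).digest('base64');
  return actual === contentMD5;
}

var header = 'XrY7u+Ae7tCTyyK7j1rNww=='; // base64 MD5 of "hello world"

console.log(checksumMatches(Buffer.from('hello world'), header)); // true
console.log(checksumMatches(Buffer.from('hello w0rld'), header)); // false
```

A single changed byte produces a different digest, which is why a corrupted or truncated upload would fail this comparison.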