This repository has been archived by the owner on Feb 23, 2019. It is now read-only.

Make the Google Drive CDN work properly #523

Merged (2 commits) on Aug 27, 2017


jikamens (Contributor) commented on Aug 22, 2017

The Google Drive CDN wasn't working properly for a number of different
reasons. This commit contains many fixes to make it work. In
particular:

  • The URL format that was being generated for files in Google Drive
    (https://_random-stuff_.googledrive.com/host/_random-stuff_/_file-path_)
    no longer works. The code in this commit uses the URL format
    https://drive.google.com/uc?id=_file-id_, which unfortunately
    requires a redirect each time the file is fetched, but a fetch with
    a redirect is better than the CDN not working at all.

  • The code for uploading files to Google Drive was assuming that files
    in the same chunk with the same title were the same file. This was
    causing all sorts of problems, such as attempting to delete files
    multiple times, attempting to access files that had been deleted,
    uploading files that had already been uploaded, comparing local
    files with the wrong uploaded files to determine whether they needed
    to be uploaded again, etc.

    I've fixed this by adding properties to the uploaded files
    indicating their full path, and using the properties to disambiguate
    files with the same title. We also use these properties to allow us
    to locate uploaded files and repopulate the ID cache (described
    below) after it is purged.

  • Because everything before "_file-path_" in the old-style URLs was
    the same for all URLs, it was inexpensive and fast to generate
    Google Drive CDN URLs. Now, however, every URL has to have the
    Google Drive file ID in it. We can't afford to query Google Drive to
    determine all of the file IDs every time we're rendering a page.
    Therefore, I've added a database table to cache the mapping from
    remote file path to file ID. I've also added a command to the
    Performance menu in the title bar to purge the contents of this
    table in case it somehow gets corrupted.

  • I've changed the code that returns errors to the user when an upload
    to Google Drive fails, so that instead of just reporting the
    "reason" from the Google Drive error, which isn't sufficiently
    detailed, it sends back all of the errors that Google Drive sent,
    JSON-encoded so that it's relatively easy to read them and figure
    out what's going on.

  • CSS files with relative URLs in them were getting minified and their
    URLs were getting turned absolute, but the scheme and domain name
    were not being prepended to these URLs. As a result, once these
    CSS files were minified and uploaded to the CDN, the URLs in them
    could no longer be loaded, since the origin was Google Drive
    rather than the blog host where the referenced files actually
    reside. I've fixed this by adding a `get_domains` function to the
    CDN which causes URLs to be fully qualified in minified CSS files.

    Note: Unfortunately, these URLs are _not_ being rewritten to come
    from the CDN, even if the relevant files are present there. I spent
    a good long time trying to figure out how to make this work and
    couldn't; there doesn't seem to be a concept in the code of doing
    URL replacement inside CSS files. This is no worse than it was
    before, but it would be nice to figure out how to make it better.
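
To illustrate the path-to-ID cache and the new URL format described above, here is a minimal Python sketch. It is not the plugin's actual code: the table name `drive_file_ids`, the class `FileIdCache`, the function `cdn_url`, and the `lookup_by_path` callback (standing in for a Google Drive query on the full-path property) are all hypothetical names chosen for illustration.

```python
import sqlite3

# Hypothetical names throughout; the PR does not show its schema or helpers.
UC_URL = "https://drive.google.com/uc?id={file_id}"


class FileIdCache:
    """Caches the mapping from remote file path to Google Drive file ID."""

    def __init__(self, conn):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS drive_file_ids ("
            "remote_path TEXT PRIMARY KEY, file_id TEXT NOT NULL)"
        )

    def get(self, remote_path):
        row = self.conn.execute(
            "SELECT file_id FROM drive_file_ids WHERE remote_path = ?",
            (remote_path,),
        ).fetchone()
        return row[0] if row else None

    def put(self, remote_path, file_id):
        self.conn.execute(
            "INSERT OR REPLACE INTO drive_file_ids VALUES (?, ?)",
            (remote_path, file_id),
        )

    def purge(self):
        # Equivalent in spirit to a "purge the ID cache" menu command.
        self.conn.execute("DELETE FROM drive_file_ids")


def cdn_url(cache, remote_path, lookup_by_path):
    """Return the uc?id= URL for remote_path, or None if the file is unknown.

    lookup_by_path is only called on a cache miss. It stands in for a Drive
    query that finds a file by the full-path property attached at upload
    time, and returns a file ID or None.
    """
    file_id = cache.get(remote_path)
    if file_id is None:
        file_id = lookup_by_path(remote_path)
        if file_id is None:
            return None
        cache.put(remote_path, file_id)
    return UC_URL.format(file_id=file_id)


# Demo with a fake lookup: two files share the title "style.css" but have
# different full paths, so they map to different IDs.
cache = FileIdCache(sqlite3.connect(":memory:"))
fake_drive = {"wp-content/a/style.css": "ID_A", "wp-content/b/style.css": "ID_B"}
print(cdn_url(cache, "wp-content/a/style.css", fake_drive.get))
```

Keying the cache on the full remote path rather than the title is what keeps same-titled files apart, and because the lookup runs only on a miss, the table can be refilled lazily after a purge.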

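The CSS fix described above amounts to resolving each `url(...)` reference against the stylesheet's original location on the blog host. The following Python sketch shows the idea under simplifying assumptions: the function name `qualify_css_urls` and the host `blog.example.com` are hypothetical, the regular expression handles only simple `url(...)` forms, and none of this is the plugin's `get_domains` implementation.

```python
import re
from urllib.parse import urljoin

# Matches url(...) with optional quotes; a simplification of real CSS syntax.
CSS_URL = re.compile(r"""url\(\s*(['"]?)([^'")]+)\1\s*\)""")


def qualify_css_urls(css, css_file_url):
    """Rewrite every url(...) in css as a fully qualified URL.

    css_file_url is the original location of the stylesheet on the blog
    host; relative references are resolved against it.
    """
    def replace(match):
        quote, target = match.group(1), match.group(2).strip()
        if target.startswith("data:"):
            return match.group(0)  # Leave inline data URIs untouched.
        return "url({q}{u}{q})".format(q=quote, u=urljoin(css_file_url, target))

    return CSS_URL.sub(replace, css)


css = 'a{background:url(../img/bg.png)} @font-face{src:url("/fonts/a.woff")}'
print(qualify_css_urls(css, "https://blog.example.com/wp-content/themes/t/style.css"))
```

With the scheme and domain in place, the references keep pointing at the blog host even when the stylesheet itself is served from Google Drive.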