Image pull cache misbehavior: cache key calculated over uncompressed layer checksums (vs compressed) #2226

Closed
aecay opened this issue Jul 6, 2021 · 3 comments · Fixed by #2231

aecay commented Jul 6, 2021

Due to a change in a tool that mirrors docker images from public registries to a company-internal one, there are now two very similar versions of some images in our internal registry. Since then, there have been errors with buildkit (specifically a long-running buildkitd instance in our CI pipeline) when building images that are FROM either of these versions. The symptoms of the bug are as follows:

  1. at first, a build FROM either of the images works
  2. after this, further builds FROM the same image work fine
  3. but builds FROM the other image fail with an error like this:
    2021-07-06 11:12:15.815  #4 [1/1] FROM registry/core-networking/release/quay.io/cil...
    2021-07-06 11:12:15.815  #4 resolve registry/core-networking/release/quay.io/cilium/hubble:v0.5.1-0.0.1@sha256:41ed0dd296bfcfd5c8f4ce53e36824a0c256fc5f9adce10584e37e1e2d1c6f97 0.0s done
    2021-07-06 11:12:15.815  #4 ERROR: failed to get record 8i7b094rsfhzf9z3urd2c5fmd by chainid: missing descriptor handlers for lazy blobs [sha256:038f0311438f4b4b2578916e434c8449f4ded30f9b516e325771219b80670cf1]
    
  4. running buildctl prune on the buildkitd instance resets whatever the problem is; after that, the first image to be built again "wins" and locks out its doppelganger (a sketch of the prune command is included below).

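For reference, the workaround in step 4 is just a cache prune against the long-running daemon. A minimal sketch, assuming a buildkitd instance reachable over TCP (the address here is made up):

% buildctl --addr tcp://buildkitd.internal:1234 prune

After that, the next image to be built FROM populates the cache first and "wins" again.
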
Looking at the logs above alongside the manifests of both images, I think I can clearly see something going wrong:

manifests of both images
% crane digest registry/core-networking/release/quay.io/cilium/hubble:v0.5.1-0.0.1
sha256:41ed0dd296bfcfd5c8f4ce53e36824a0c256fc5f9adce10584e37e1e2d1c6f97
% crane manifest registry/core-networking/release/quay.io/cilium/hubble:v0.5.1-0.0.1
{
   "schemaVersion": 2,
   "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
   "config": {
      "mediaType": "application/vnd.docker.container.image.v1+json",
      "size": 2365,
      "digest": "sha256:3b2b06a782b31dc7b21a65f6ae8ec2ca30f327cc8c599f0c594c637949a1381d"
   },
   "layers": [
      {
         "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
         "size": 2813316,
         "digest": "sha256:cbdbe7a5bc2a134ca8ec91be58565ec07d037386d1f1d8385412d224deafca08"
      },
      {
         "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
         "size": 1916657,
         "digest": "sha256:eeb760e03ef1a4bd98b03c87f409b44e2424ef9b4759c1c458130bc5dfc96f76"
      },
      {
         "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
         "size": 14461977,
         "digest": "sha256:2bd3f7b5f26019a8f6e2a044576eebf494cc13a50c7b1b59aa03ef455a008b3c"
      },
      {
         "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
         "size": 1397030,
         "digest": "sha256:96e3156e019aee51c7dc5d21e336fff3a8fe77e1972a7955ff86e4acf3244c6a"
      }
   ]
}
% crane digest registry/core-networking/release/cilium/hubble:v0.5.1-0.1.0
sha256:24785434ffba35f8e3ec6bbbac6a00473cc6577cf9232879abfe750419463904
% crane manifest registry/core-networking/release/cilium/hubble:v0.5.1-0.1.0
{                         
   "schemaVersion": 2,
   "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
   "config": {
      "mediaType": "application/vnd.docker.container.image.v1+json",
      "size": 2365,
      "digest": "sha256:3b2b06a782b31dc7b21a65f6ae8ec2ca30f327cc8c599f0c594c637949a1381d"
   },
   "layers": [
      {
         "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
         "size": 2813316,
         "digest": "sha256:038f0311438f4b4b2578916e434c8449f4ded30f9b516e325771219b80670cf1"
      },
      {
         "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
         "size": 1916657,
         "digest": "sha256:a0cd755b63fda14c297ed6f6a4f910e42af8c7eb0d056d6eeaeea653bb099a88"
      },
      {
         "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
         "size": 14461977,
         "digest": "sha256:608f0f0c3c2db7708268996bf1dabf1fcd5c47b6b011a6989dfe22421854077e"
      },
      {
         "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
         "size": 1397030,
         "digest": "sha256:f0038aa1121bc8221891c76fd98bdd6fb6c72f88f5519754e9bfdf640c608964"
      }
   ]
}

The output above shows that the "locked out" image with digest sha256:41ed0... does NOT contain the layer with digest sha256:038f0... referenced in the error. That layer instead belongs to the doppelganger image whose digest is sha256:24785.... In other words, the error pasted above shows (I believe) buildkit doing something invalid: trying to access a layer that doesn't even belong to the image that the docker build is FROM. Somehow the cache must be getting corrupted for this to happen.

I noticed that both images have an identical config blob. I believe what's happening in the code is that a cache key for the image is being computed from the diffIDs, which are digests of the uncompressed layer tars. However, image layers can get recompressed, as has evidently happened to some of the images we archived to our registry a while ago and are now rearchiving with the new tool. Recompression changes the sha256 digests of the compressed layers themselves, as stored in the image manifest (and in the registry), while leaving the diffIDs untouched.
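
As a sanity check, here is a sketch of how one could compare the uncompressed diffIDs of the two images, assuming crane and jq are available:

% crane config registry/core-networking/release/quay.io/cilium/hubble:v0.5.1-0.0.1 | jq '.rootfs.diff_ids'
% crane config registry/core-networking/release/cilium/hubble:v0.5.1-0.1.0 | jq '.rootfs.diff_ids'

Because both manifests point at the same config digest (sha256:3b2b06a...), these two commands necessarily print the same diff_ids list, even though the layers[].digest values in the two manifests above all differ.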

So when the first image is built, buildkit pulls the layers compressed in one way, but marks the cache as being populated with data equivalent to the uncompressed layers. When the image based on the second compression flavor is built, the cache doesn't contain the (compressed) data that its (uncompressed) keys say it should have, an invariant is violated, and the process errors out.
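
To make the chaining concrete, here is a sketch of the OCI chain-ID formula, ChainID(n) = Digest(ChainID(n-1) + " " + DiffID(n)), applied to the uncompressed diffIDs; whether buildkit derives its cache key exactly this way is part of what I'm asking to have confirmed below. IMAGE is a placeholder for either tag:

IMAGE=registry/core-networking/release/quay.io/cilium/hubble:v0.5.1-0.0.1
chain=""
for diffid in $(crane config "$IMAGE" | jq -r '.rootfs.diff_ids[]'); do
  if [ -z "$chain" ]; then
    chain="$diffid"                                     # ChainID(1) = DiffID(1)
  else
    # ChainID(n) = sha256 over "ChainID(n-1) DiffID(n)"
    chain="sha256:$(printf '%s %s' "$chain" "$diffid" | sha256sum | awk '{print $1}')"
  fi
done
echo "$chain"

Since the two images share a config (and therefore share diff_ids), this prints the same chain for both, which is how two differently compressed images can end up behind one cache key.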

The simplest way to solve this would be (I think) to calculate the cache key based on the chained compressed layer digests, not (as now) the chained uncompressed digests. That leads to "unnecessary" cache misses when data has been downloaded under compression scheme X but is missing under Y -- but the bookkeeping to match up compressed and uncompressed layer checksums seems like it would be fairly baroque for marginal benefit.
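
Concretely, the alternative key would be built from the manifest's compressed layer digests rather than the config's diff_ids, i.e. feeding the same chaining as above with:

crane manifest "$IMAGE" | jq -r '.layers[].digest'

Those values differ between the two images, so the doppelgangers would get distinct cache keys and simply miss each other's cache instead of tripping over each other's blobs.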

I think I've finally grokked this, although it's been a long journey of discovery to get here. At this point, resolving this issue could use:

  1. confirmation from someone more familiar than I am with the buildkit code base, Docker ecosystem, and file formats that my analysis of the problem is correct
  2. an indication from the maintainers of whether the proposed fix (changing the cache key calculation so that it's based on compressed layer digests) is acceptable, or whether another fix is preferred
  3. Someone™ to code whatever fix is decided on

Thanks, and please do let me know if there's something further I can provide to aid diagnosis/understanding.

tonistiigi (Member) commented:

@sipsma

sipsma (Collaborator) commented Jul 7, 2021

@aecay Many thanks for the detailed and helpful report; your analysis of the problem seems spot on, and I was able to reproduce the behavior with a unit test.

I've opened #2231 with a fix. While your suggestion would work, I think we actually want the cache map to be the same for these images: in terms of cache, we really care about the contents of the final unpacked filesystem, not the format those filesystems were transferred to us in. Instead, I just changed the code to ignore the error you were getting. It was basically trying to optimize by re-using the unpacked snapshot of the other image, but couldn't, because the other image turns out to be "lazy", meaning it exists in the cache but the actual pull of it has been deferred until later. Ignoring the error is fine there, as it just skips an optimization that wasn't possible anyway.

aecay (Author) commented Jul 7, 2021

Brilliant, thanks for the explanation and the very quick fix 🙂
