Add repositories files to image tarballs #526

smukherj1 · 2019-09-13T21:22:14Z

Create a repositories file for image tarballs generated by the v1.tarball package. Image tarballs generated by docker save include this file, and our internal repo requires it because the file is part of the v1 & v2 schema.

For the image l.gcr.io/google/bazel@sha256:97bfeed0303cae14af7e8f66aad6c13f00b2b33081c59d0f4258717b8b94efec, the repositories file looks like:

{"l.gcr.io/google/bazel":{"latest":"626b494fdfba1950ebdf1ad5cc2e799879ec78ab8bb1bae11de6c9491fbab6cf"}}

This basically appears to be a map from the image name to the tag to the digest of the top most layer. This is currently blocking bazelbuild/rules_docker#580
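For illustration, here is a minimal sketch of that nested map in Python. The helper name `build_repositories` is made up for this example, and the value is simply whatever identifier the tarball uses for the topmost layer; this does not show how docker itself writes the file.

```python
import json


def build_repositories(entries):
    """Build the nested map: repository -> tag -> top layer identifier.

    `entries` is an iterable of (repository, tag, top_layer_id) tuples.
    The helper name and input shape are hypothetical, chosen for this sketch.
    """
    repos = {}
    for repository, tag, top_layer_id in entries:
        repos.setdefault(repository, {})[tag] = top_layer_id
    return repos


repositories = build_repositories([
    (
        "l.gcr.io/google/bazel",
        "latest",
        "626b494fdfba1950ebdf1ad5cc2e799879ec78ab8bb1bae11de6c9491fbab6cf",
    ),
])

# Compact separators reproduce the single-line form quoted above.
print(json.dumps(repositories, separators=(",", ":")))
```

Several tags of the same repository end up under one top-level key, which is why the file is a map of maps rather than a flat list.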

jonjohnsonjr · 2019-09-13T22:38:47Z

~~This might actually help a lot with #517~~

Nevermind, this just has the image digest, not the layers.

jonjohnsonjr · 2019-09-13T22:40:02Z

Some more context: containers/skopeo#425

smukherj1 · 2019-09-13T22:53:09Z

> ~~This might actually help a lot with #517~~
>
> Nevermind, this just has the image digest, not the layers.

Minor correction. I believe it's the digest of the top most layer. So essentially a layer digest but I don't think it will help much with the optimization PR. Maybe the pusher can avoid calculating the digest of the top most layer if the repositories file is present but that's it.

jonjohnsonjr · 2019-09-13T22:57:18Z

> Minor correction. I believe it's the digest of the top most layer.

That seems odd, but v1 images were basically linked lists, where each layer could be a complete image and referenced its parent image, so I can believe it. I'd be surprised if that's the case, but I'd like to see how this works when running docker save on various types of images:

- manifest list
- schema 2
- schema 1

smukherj1 · 2019-09-13T23:02:45Z

The python implementation to extract this value is here. If you follow it, you'll see it's used to extract the value returned by the top function on docker_image here whose documentation suggests it's the layer id which I'm assuming is the digest.

The above impl is specifically for v1 images so it's possible the value is different for v2 & manifest lists.

jonjohnsonjr · 2019-09-15T05:17:46Z

Just took a look at this and it's not clear to me how to proceed. That digest value isn't present in the schema 2 manifest or config ☹️

In order to replicate docker's (or containerregistry's) behavior, we'd need to convert schema 2 to schema 1, then schema 1 to v1, then take the top layer's value. Converting from schema 2 to schema 1 is nontrivial, but I've done it before. I don't have any idea how to convert from schema 1 to v1 (that was deprecated way before I started caring about container trivia).

How is this value actually used? Why do we need it? We could use the digest of the top layer from the schema 2 manifest, but I suspect that wouldn't work out.

smukherj1 · 2019-09-15T18:16:58Z

> Just took a look at this and it's not clear to me how to proceed. That digest value isn't present in the schema 2 manifest or config ☹️
>
> In order to replicate docker's (or containerregistry's) behavior, we'd need to convert schema 2 to schema 1, then schema 1 to v1, then take the top layer's value. Converting from schema 2 to schema 1 is nontrivial, but I've done it before. I don't have any idea how to convert from schema 1 to v1 (that was deprecated way before I started caring about container trivia).
>
> How is this value actually used? Why do we need it? We could use the digest of the top layer from the schema 2 manifest, but I suspect that wouldn't work out.

Yeah you're right. It's not as simple as just putting the digest of the top most layer in the repositories file. I took a look at the python containerregistry code and I believe it's the digest of the top most layer where the digest is the v1 layer digest. The _GenerateV1LayerId function here is generating this v1 layer digest. It seems to be a chained digest where the v1 digest of the current layer is sha256sum(curLayerV2_2Digest + " " + prevLayerV1Digest). For the top most layer it seems to be sha256sum(curLayerV2_2Digest + " " + prevLayerV1Digest + " " + rawConfig).

So I'm guessing it should be possible to generate this digest without going through the v2_2 -> v2 -> v1 conversion.
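A rough sketch of the chained digest as described above, in Python. It only follows the formula in this comment and has not been checked against _GenerateV1LayerId. The function name `v1_layer_ids`, the use of an empty string as the parent of the base layer, the UTF-8 encoding, and whether digests carry a "sha256:" prefix are all assumptions.

```python
import hashlib


def v1_layer_ids(layer_digests, raw_config):
    """Return one chained v1-style ID per layer, base layer first.

    layer_digests: v2.2 layer digests as strings, base layer first.
    raw_config:    the raw config JSON as a string.

    Assumption: the base layer has no parent, so an empty string stands in
    for prevLayerV1Digest. The real implementation may differ.
    """
    ids = []
    parent = ""
    last = len(layer_digests) - 1
    for index, digest in enumerate(layer_digests):
        payload = digest + " " + parent
        if index == last:
            # Only the topmost layer mixes in the raw config.
            payload += " " + raw_config
        parent = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        ids.append(parent)
    return ids


ids = v1_layer_ids(["sha256:aaa", "sha256:bbb"], "{}")
print(ids[-1])  # the value that would go into the repositories file
```

Because each ID depends on its parent, changing the config only changes the topmost ID, while changing a lower layer changes every ID above it.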

As for how it's currently used, I found a bunch of cases in our internal codebase that load the docker image built by rules_docker as a v1 tarball. I have replaced a few but there might be a bunch more. The one that's going to be tricky to fix is the python containerregistry tests that test the v1 -> v2 compatibility layer. So I was hoping to just generate this file if it's simple enough instead of locating all tests & negotiating fixes with their owners.

jonjohnsonjr · 2019-09-16T05:21:09Z

So do we need to be able to read and generate v1 tarballs? There are two tarball formats here, and we generate/read the more "modern" one in ggcr. I'm not sure how much we have to do to unblock bazelbuild/rules_docker#580, and I'm not sure what the easy path forward is (some changes here, some changes in the python implementation, change callers, etc.)

Summarizing the differences:

crane

Let's take a look at what crane produces:

$ crane save busybox crane.tar && mkdir crane && tar xf crane.tar -C crane

$ tree crane
crane/
├── 7c9d20b9b6cda1c58bc4f9d6c401386786f584437abbe87e58910f8a9a15386b.tar.gz
├── manifest.json
└── sha256:19485c79a9bbdca205fce4f791efeaa2a103e23431434696cc54fdd939e9198d

0 directories, 3 files

There's a manifest.json file that points to the config, the reference we pulled the image from (RepoTags), and the layers. Note that these values are names of files within the tarball, not necessarily their digests.

$ cat crane/manifest.json | jq .
[
  {
    "Config": "sha256:19485c79a9bbdca205fce4f791efeaa2a103e23431434696cc54fdd939e9198d",
    "RepoTags": [
      "index.docker.io/library/busybox:latest"
    ],
    "Layers": [
      "7c9d20b9b6cda1c58bc4f9d6c401386786f584437abbe87e58910f8a9a15386b.tar.gz"
    ]
  }
]
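
To make the "values are file names" point concrete, here is a small Python sketch that reads manifest.json out of a tarball and checks that every Config and Layers entry names a real member. The helper names (`read_manifest`, `missing_files`, `make_tar`) and the tiny in-memory tarball are invented for this example; it uses only the standard tarfile and json modules.

```python
import io
import json
import tarfile


def read_manifest(tar_bytes):
    """Return the parsed manifest.json from an image tarball held in memory."""
    with tarfile.open(fileobj=io.BytesIO(tar_bytes)) as tf:
        return json.load(tf.extractfile("manifest.json"))


def missing_files(tar_bytes):
    """List Config/Layers entries that are not members of the tarball."""
    with tarfile.open(fileobj=io.BytesIO(tar_bytes)) as tf:
        names = set(tf.getnames())
    missing = []
    for image in read_manifest(tar_bytes):
        for name in [image["Config"]] + image["Layers"]:
            if name not in names:
                missing.append(name)
    return missing


def make_tar(files):
    """Demo helper: pack {name: bytes} into an uncompressed tar archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tf:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    return buf.getvalue()


# Placeholder names and contents, shaped like the crane output above.
manifest = [{
    "Config": "sha256:cfg",
    "RepoTags": ["index.docker.io/library/busybox:latest"],
    "Layers": ["layer.tar.gz"],
}]
demo = make_tar({
    "manifest.json": json.dumps(manifest).encode("utf-8"),
    "sha256:cfg": b"{}",
    "layer.tar.gz": b"",
})
print(read_manifest(demo)[0]["RepoTags"])
print(missing_files(demo))
```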

The config file is the normal config, from the registry:

$ cat crane/sha256\:19485c79a9bbdca205fce4f791efeaa2a103e23431434696cc54fdd939e9198d
{"architecture":"amd64","config":{"Hostname":"","Domainname":"","User":"","AttachStdin":false,"AttachStdout":false,"AttachStderr":false,"Tty":false,"OpenStdin":false,"StdinOnce":false,"Env":["PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"],"Cmd":["sh"],"ArgsEscaped":true,"Image":"sha256:758a17a836a4c09586a291c928d1f0561320e252d07c4749e14338daefe84b18","Volumes":null,"WorkingDir":"","Entrypoint":null,"OnBuild":null,"Labels":null},"container":"e30cd53834b3dfdb989c63cc73f4f31f404c7a6a0c0e9d6b9e3e8451edd72596","container_config":{"Hostname":"e30cd53834b3","Domainname":"","User":"","AttachStdin":false,"AttachStdout":false,"AttachStderr":false,"Tty":false,"OpenStdin":false,"StdinOnce":false,"Env":["PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"],"Cmd":["/bin/sh","-c","#(nop) ","CMD [\"sh\"]"],"ArgsEscaped":true,"Image":"sha256:758a17a836a4c09586a291c928d1f0561320e252d07c4749e14338daefe84b18","Volumes":null,"WorkingDir":"","Entrypoint":null,"OnBuild":null,"Labels":{}},"created":"2019-09-04T19:20:16.230463098Z","docker_version":"18.06.1-ce","history":[{"created":"2019-09-04T19:20:16.080265634Z","created_by":"/bin/sh -c #(nop) ADD file:9151f4d22f19f41b7a289e87aa9cfba3956ffd27746cb3b171b9bd2cb7e6c313 in / "},{"created":"2019-09-04T19:20:16.230463098Z","created_by":"/bin/sh -c #(nop)  CMD [\"sh\"]","empty_layer":true}],"os":"linux","rootfs":{"type":"layers","diff_ids":["sha256:6c0ea40aef9d2795f922f4e8642f0cd9ffb9404e6f3214693a1fd45489f38b44"]}}

We save the layer in its gzipped form:

$ cat crane/7c9d20b9b6cda1c58bc4f9d6c401386786f584437abbe87e58910f8a9a15386b.tar.gz | gunzip - | sha256sum
6c0ea40aef9d2795f922f4e8642f0cd9ffb9404e6f3214693a1fd45489f38b44  -
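
The same check can be sketched in Python: the sha256 of the uncompressed layer should equal the corresponding entry in the config's rootfs.diff_ids. The function names are made up for this example, and `gzip.decompress` reads the whole layer into memory, so a streaming reader would be a better fit for large layers.

```python
import gzip
import hashlib
import json


def diff_id(compressed_layer):
    """Digest of the uncompressed layer bytes, in 'sha256:<hex>' form."""
    uncompressed = gzip.decompress(compressed_layer)
    return "sha256:" + hashlib.sha256(uncompressed).hexdigest()


def layer_matches_config(compressed_layer, raw_config, index=0):
    """True if the layer's diff ID equals rootfs.diff_ids[index] in the config."""
    config = json.loads(raw_config)
    return config["rootfs"]["diff_ids"][index] == diff_id(compressed_layer)


# Tiny stand-in for a real layer and config; the bytes are placeholders.
raw_layer = b"placeholder layer contents"
blob = gzip.compress(raw_layer)
config = json.dumps({
    "rootfs": {
        "type": "layers",
        "diff_ids": ["sha256:" + hashlib.sha256(raw_layer).hexdigest()],
    }
})
print(layer_matches_config(blob, config))
```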

docker

There's a lot more stuff here:

$ docker save busybox > docker.tar && mkdir docker && tar xf docker.tar -C docker

$ tree docker
docker
├── 19485c79a9bbdca205fce4f791efeaa2a103e23431434696cc54fdd939e9198d.json
├── 65836406f9479e26bb2dc27439df3efdae3c298edd1ea781dcb3ac7a7baae542
│   ├── json
│   ├── layer.tar
│   └── VERSION
├── manifest.json
└── repositories

1 directory, 6 files

There is a similar manifest.json file:

$ cat docker/manifest.json | jq .
[
  {
    "Config": "19485c79a9bbdca205fce4f791efeaa2a103e23431434696cc54fdd939e9198d.json",
    "RepoTags": [
      "busybox:latest"
    ],
    "Layers": [
      "65836406f9479e26bb2dc27439df3efdae3c298edd1ea781dcb3ac7a7baae542/layer.tar"
    ]
  }
]

The config file has the same contents, just a different name.

The "Layers" points to a layer.tar in a 65836406f9479e26bb2dc27439df3efdae3c298edd1ea781dcb3ac7a7baae542 directory.

If we look at repositories, we see that "busybox:latest" points to that directory:

$ cat docker/repositories  | jq .
{
  "busybox": {
    "latest": "65836406f9479e26bb2dc27439df3efdae3c298edd1ea781dcb3ac7a7baae542"
  }
}
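
As a sketch of how a reader of this format might follow that pointer, the Python below resolves a repository and tag to the three files seen in the directory listing above. The function name `top_layer_paths` is hypothetical, and the file names (json, layer.tar, VERSION) are taken from this one docker save output only.

```python
import json


def top_layer_paths(repositories_json, repository, tag):
    """Map repository:tag to the files inside its top layer directory.

    File names follow the docker save listing shown above; this is an
    illustration based on that single sample, not a format specification.
    """
    layer_id = json.loads(repositories_json)[repository][tag]
    return {
        "json": layer_id + "/json",
        "layer": layer_id + "/layer.tar",
        "version": layer_id + "/VERSION",
    }


sample = (
    '{"busybox":{"latest":'
    '"65836406f9479e26bb2dc27439df3efdae3c298edd1ea781dcb3ac7a7baae542"}}'
)
paths = top_layer_paths(sample, "busybox", "latest")
print(paths["layer"])
```

For this image, the resulting layer path is the same string as the single "Layers" entry in docker's manifest.json.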

The contents of layer.tar are actually the same as the uncompressed 7c9d20b9b6cda1c58bc4f9d6c401386786f584437abbe87e58910f8a9a15386b.tar.gz layer from the crane tarball:

$ cat docker/65836406f9479e26bb2dc27439df3efdae3c298edd1ea781dcb3ac7a7baae542/layer.tar | sha256sum 
6c0ea40aef9d2795f922f4e8642f0cd9ffb9404e6f3214693a1fd45489f38b44  -

That json file is basically the config but with the id embedded:

$ cat docker/65836406f9479e26bb2dc27439df3efdae3c298edd1ea781dcb3ac7a7baae542/json | jq .
{
  "id": "65836406f9479e26bb2dc27439df3efdae3c298edd1ea781dcb3ac7a7baae542",
  "created": "2019-09-04T19:20:16.230463098Z",
  "container": "e30cd53834b3dfdb989c63cc73f4f31f404c7a6a0c0e9d6b9e3e8451edd72596",
  "container_config": {
    "Hostname": "e30cd53834b3",
    "Domainname": "",
    "User": "",
    "AttachStdin": false,
    "AttachStdout": false,
    "AttachStderr": false,
    "Tty": false,
    "OpenStdin": false,
    "StdinOnce": false,
    "Env": [
      "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
    ],
    "Cmd": [
      "/bin/sh",
      "-c",
      "#(nop) ",
      "CMD [\"sh\"]"
    ],
    "ArgsEscaped": true,
    "Image": "sha256:758a17a836a4c09586a291c928d1f0561320e252d07c4749e14338daefe84b18",
    "Volumes": null,
    "WorkingDir": "",
    "Entrypoint": null,
    "OnBuild": null,
    "Labels": {}
  },
  "docker_version": "18.06.1-ce",
  "config": {
    "Hostname": "",
    "Domainname": "",
    "User": "",
    "AttachStdin": false,
    "AttachStdout": false,
    "AttachStderr": false,
    "Tty": false,
    "OpenStdin": false,
    "StdinOnce": false,
    "Env": [
      "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
    ],
    "Cmd": [
      "sh"
    ],
    "ArgsEscaped": true,
    "Image": "sha256:758a17a836a4c09586a291c928d1f0561320e252d07c4749e14338daefe84b18",
    "Volumes": null,
    "WorkingDir": "",
    "Entrypoint": null,
    "OnBuild": null,
    "Labels": null
  },
  "architecture": "amd64",
  "os": "linux"
}
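
Comparing the two outputs for this single-layer image, the v1 json has the config's fields minus "history" and "rootfs", plus "id". A Python sketch of that observation follows; the function name and the dropped-key list are assumptions drawn only from this sample, and multi-layer images (where v1 json files also carry a "parent" field) are not covered.

```python
import json

# Assumption from the busybox sample above: these config keys do not
# appear in the v1 json file.
DROPPED_KEYS = ("history", "rootfs")


def v1_json_from_config(raw_config, layer_id):
    """Derive a v1-style json document for the top layer from the raw config."""
    config = json.loads(raw_config)
    v1 = {"id": layer_id}
    for key, value in config.items():
        if key not in DROPPED_KEYS:
            v1[key] = value
    return v1


# Trimmed-down config with placeholder values, for illustration only.
raw = json.dumps({
    "architecture": "amd64",
    "os": "linux",
    "config": {"Cmd": ["sh"]},
    "history": [{"created_by": "placeholder"}],
    "rootfs": {"type": "layers", "diff_ids": ["sha256:placeholder"]},
})
print(json.dumps(v1_json_from_config(raw, "placeholder-layer-id"), indent=2))
```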

jonjohnsonjr added the "good first issue" and "help wanted" labels Sep 13, 2019

jonjohnsonjr mentioned this issue Sep 16, 2019

Add ability to cache/push compressed layers #517

Closed

This was referenced Sep 18, 2019

Implement ability to generate v1 image tarballs #535

Closed

Implement v1 image tarball generation in new legacy package #536

Merged

jonjohnsonjr closed this as completed in #536 Sep 20, 2019
