
exporter: Enable to specify the compression type for all layers of the finally exported image #2057

Merged · 1 commit · Jul 7, 2021

Conversation

ktock (Collaborator) commented Apr 1, 2021

resolves: #1911

Currently, the exporter's compression flag is applied only to newly created layers. So the exported image can be a mix of several compression types (uncompressed and gzip) even if a compression type is specified.

It would be great if we could specify the compression type for all layers of the finally exported image.

This commit introduces a new exporter flag, force-compression, which enables specifying the compression type of all exported layers. If a layer doesn't match the type specified by the compression option, it is forcibly converted to the target type.

In addition to ensuring the final compression type, this flag will in the future also help users quickly catch up with and try out novel image layer formats (e.g. zstandard, eStargz, zstd:chunked).

How are compression variant contents tracked?

When a user specifies a compression type different from the blob's original (i.e. pulled) one, a conversion happens. Once the blob is converted, the resulting contents are cached in the local content store. *immutableRef tracks its compression variant contents using its lease and the following label on the original (pulled) content blob.

  • buildkit.io/compression/digest.<compression type> = <sha256 digest>

When converting the compression type, this label is looked up first. The conversion only happens if the target label is missing or the content has been lost from the content store.

Test of resolving #1911

BUILD="buildctl build --frontend=dockerfile.v0 --local context=/tmp/tmpimg --local dockerfile=/tmp/tmpimg "
$BUILD --output type=oci,dest=/tmp/uncompressed.tar,oci-mediatypes=true,compression=uncompressed
$BUILD --output type=oci,dest=/tmp/gzip.tar,oci-mediatypes=true,compression=gzip,force-compression=true

The above commands produce gzip.tar with gzip-compressed layers:

{
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "schemaVersion": 2,
  "config": {
    "mediaType": "application/vnd.oci.image.config.v1+json",
    "digest": "sha256:b105e3a8ccf94db2a87393ad4bdb1bd938750f1e8a72b54f4ff15573afdf7396",
    "size": 1267
  },
  "layers": [
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "digest": "sha256:9d48c3bd43c520dc2784e868a780e976b207cbf493eaff8c6596eb871cbd9609",
      "size": 2789669
    },
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "digest": "sha256:49f20b6e614f787092825b2e2f22ad62be1c5e6fb9aec85276039383df91c6a7",
      "size": 103
    },
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "digest": "sha256:3de9369605bb1c6f39b1559feb19f77f4aad8f5d0b4cca92a6ef827eb423a6fa",
      "size": 106
    },
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "digest": "sha256:99cd2a06d8a234d71e8c5dccafd0bdf46497ed6b2d5603f538d7608428844cd6",
      "size": 106
    }
  ]
}

@AkihiroSuda @tonistiigi @fuweid

@ktock changed the title from "Enable to specify the compression type for all layers of the finally exported image" to "exporter: Enable to specify the compression type for all layers of the finally exported image" on Apr 1, 2021
ktock (Collaborator, Author) commented Apr 1, 2021

PTAL

ktock (Collaborator, Author) commented Apr 8, 2021

@tonistiigi Thanks for taking a look at this.
How is this going?

ktock (Collaborator, Author) commented Apr 21, 2021

Can we move this forward?

tonistiigi (Member) left a comment

I don't understand how this tracks the state of the created blobs. As far as I can see, they are not tracked by the cache manager, and I don't see any leases around writers. If I understand correctly, the tracked blob is still created with the old code, ignoring compression settings, and then that blob may be converted in a second pass.

As mentioned in #1911 (comment) the hardest part about this should be how to associate multiple blobs with a single record.


// unlazy layers as converter uses layer contents in the content store
// TODO(ktock): un-lazy only layers whose type is different from the target, selectively.
Member:

This seems like a requirement. Otherwise this introduces a regression into exporting lazy refs.

Member:

It's also a sign that blob creation is not happening in the correct place as Remote already contains a reference to a blob(s).

@ktock force-pushed the export-compression branch 2 times, most recently from 24c399c to ec8a782, on May 26, 2021 01:00
ktock (Collaborator, Author) commented May 26, 2021

@tonistiigi Thanks for the review. I fixed the design to track compression variant contents as follows.

When a user specifies a compression type different from the blob's original (i.e. pulled) one, a conversion happens. Once the blob is converted, the resulting contents are cached in the local content store. *immutableRef tracks its compression variant contents using its lease and the following label on the original (pulled) content blob.

  • buildkit.io/compression/digest.<compression type> = <sha256 digest>

When converting the compression type, this label is looked up first. The conversion only happens if the target label is missing or the content has been lost from the content store.

ktock (Collaborator, Author) commented Jun 1, 2021

Can we move this forward?

@tonistiigi tonistiigi added this to the v0.9.0 milestone Jun 11, 2021
sipsma (Collaborator) commented Jul 1, 2021

If there are multiple blobs for a single ref, does each of them count when calculating the disk usage of a ref in the prune code? As far as I can see, because multiple blobs per ref are tracked only using leases, the cache's metadata still stores only one blob ID per ref (whichever one is created first, I suppose). This in turn means that this code calculates the disk usage using only one of those blobs:

buildkit/cache/refs.go, lines 206 to 211 in edc28d1:

	if dgst := getBlob(cr.md); dgst != "" {
		info, err := cr.cm.ContentStore.Info(ctx, digest.Digest(dgst))
		if err == nil {
			usage.Size += info.Size
		}
	}

	mu.Lock()
	mprovider.Add(layerP)
	mu.Unlock()
	if forceCompression {
Collaborator:

I feel like this code that creates the blobs with the correct compression type conceptually makes more sense in the existing computeBlobChain code, maybe somewhere around here:

buildkit/cache/blobs.go, lines 57 to 58 in edc28d1:

	if refInfo.Blob != "" {
		return nil, nil
Not a strong opinion though, so if that's easier said than done I don't think it's a big deal to leave it here.

ktock (Collaborator, Author) commented Jul 2, 2021

@sipsma Thank you for the review.
I moved the blob creation logic into computeBlobChain and added logic to accumulate the size of compression variants.

sipsma (Collaborator) left a comment

@ktock commented on one more corner case, otherwise LGTM

	// accumulate size of compression variant blobs
	if strings.HasPrefix(k, compressionVariantDigestLabelPrefix) {
		if cdgst, err := digest.Parse(v); err == nil {
			if info, err := cr.cm.ContentStore.Info(ctx, cdgst); err == nil {
Collaborator:

What if you import two images that are identical except that their layers use different compression types, and then try to export each one in its converted compression format? You could then end up in a situation where the blob returned by getBlob on this ref also has a compression variant label attached to it, meaning the size would get double-counted here, right?

If so, I guess you can just check that cdgst doesn't equal getBlob? Or something like that.

ktock (Collaborator, Author):

Added a check to avoid double-counting when a label points to the blob itself.

Signed-off-by: ktock <ktokunaga.mail@gmail.com>
Successfully merging this pull request may close these issues.

exporter: allow compression type update for existing blobs