
implement crane estargz #878

Closed
mattmoor opened this issue Dec 18, 2020 · 11 comments · Fixed by #879
Labels
good first issue Good for newcomers

Comments

@mattmoor
Collaborator

mattmoor commented Dec 18, 2020

The intent of this is to be like crane cp, but to convert the image to the estargz format.

For bonus points, this could take a list of files to prioritize 🤔

cc @imjasonh @jonjohnsonjr for thoughts.

@mattmoor mattmoor added the good first issue label Dec 18, 2020
@imjasonh
Collaborator

How's it different than GGCR_EXPERIMENT_ESTARGZ=1 crane cp foo bar? I thought the goal was to make eStargz optimizing transparent?

@mattmoor
Collaborator Author

I wouldn't ever expect crane cp to change the digest, and I would expect it to be efficient about not fetching layers when it can mount them, so I think a fundamentally different command would be useful.

@imjasonh
Collaborator

crane cp foo bar --optimize? Just throwing out ideas.

I don't know that people will necessarily care about the specific eStargz implementation, and there may even be other versions/options in the future (like stargz and eStargz).

@mattmoor
Collaborator Author

Maybe crane {convert,optimize} --type=estargz foo bar?

Multiple modes make sense for sure.

@imjasonh
Collaborator

> Maybe crane {convert,optimize} --type=estargz foo bar?

Yeah I like that.

Is there anything else {e}stargz-related we might be able to add? Validating that the TOCs match their digests and the actual layer contents? Efficiently fetching a single file with crane estargz {something} {path}?

@mattmoor
Collaborator Author

🤩 An optimized crane export w/ file list would be 🔥

Probably worth a second issue?

@mattmoor mattmoor self-assigned this Dec 18, 2020
@mattmoor
Collaborator Author

I'm going to go with crane optimize for now. Hopefully I'll have something to show by HackyHour 🤞

@jonjohnsonjr
Collaborator

So this ties into #732 and some other issues a little bit. I want to take some inspiration from skopeo here, but matt's point about digest preservation is salient: containers/skopeo#1102

Some form of crane copy that supports changing formats would be dope. We can alias crane pull and crane push to just do this. I'm okay with crane convert for a name. I would suggest crane mv, but I really want that to basically be crane tag && crane delete, so don't use that...

The way skopeo handles what a --type flag would do is to have schemes (transports) for different src/dst types, e.g. containers-storage:docker-reference, docker://, docker-archive:path[:docker-reference], docker-daemon:docker-reference, oci:path:tag.

Transformations like this don't make sense as a scheme, though. I can imagine transformations for stargz, estargz, and flatten. Perhaps more.
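To make the skopeo-style scheme concrete, here is a stdlib-only Go sketch that splits a reference into a transport and a transport-specific remainder. `Ref`, `parseRef` and `knownTransports` are hypothetical names; the transport list is just the one quoted in the comment, and skopeo's real parsing validates each remainder far more strictly. It also shows the gap pointed out above: there is no natural slot in `transport:rest` for a transformation like estargz or flatten.

```go
package main

import (
	"fmt"
	"strings"
)

// Ref is a transport-qualified image reference, e.g. "oci:/tmp/layout:latest".
type Ref struct {
	Transport string // e.g. "docker", "oci"
	Rest      string // transport-specific remainder, left unparsed here
}

// knownTransports lists only the schemes mentioned in the thread.
var knownTransports = map[string]bool{
	"docker":             true,
	"docker-archive":     true,
	"docker-daemon":      true,
	"oci":                true,
	"containers-storage": true,
}

// parseRef splits "transport:rest" on the first colon. The docker transport
// additionally requires the "//" prefix, as in "docker://gcr.io/foo".
func parseRef(s string) (Ref, error) {
	transport, rest, ok := strings.Cut(s, ":")
	if !ok {
		return Ref{}, fmt.Errorf("%q: missing transport prefix", s)
	}
	if !knownTransports[transport] {
		return Ref{}, fmt.Errorf("%q: unknown transport %q", s, transport)
	}
	if transport == "docker" {
		if !strings.HasPrefix(rest, "//") {
			return Ref{}, fmt.Errorf("%q: docker transport needs docker://", s)
		}
		rest = strings.TrimPrefix(rest, "//")
	}
	return Ref{Transport: transport, Rest: rest}, nil
}

func main() {
	for _, s := range []string{
		"docker://gcr.io/distroless/static",
		"oci:/tmp/layout:latest",
		"docker-archive:/tmp/image.tar",
	} {
		r, err := parseRef(s)
		if err != nil {
			panic(err)
		}
		fmt.Println(r.Transport, r.Rest)
	}
}
```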

We could also take some inspiration from jq? pipe might be a dumb name, but:

crane pipe 'pull gcr.io/distroless/static | flatten | estargz | push gcr.io/distroless/static:estargz'
crane pipe 'pull gcr.io/distroless/static | estargz | tarball -' > stargz.tar

We can compose stuff across crane invocations by printing intermediate steps to stdout, but that leads to a lot of waste because you end up serializing at each stage, so it would be really nice to have something like this that can take advantage of all the laziness stuff. As an added bonus, @mattmoor gets to write a compiler.

Would love to tackle config file manipulation as part of a larger feature while we're in here, as well.

> An optimized crane export w/ file list would be lit

I do crane export $image - | tar -tvf - a lot, but of course it's not optimized, and I get what you mean :P

> I'm going to go with crane optimize for now. Hopefully I'll have something to show by HackyHour 🤞

Seems fine to me for demo. I'd be okay with merging it if it's hidden, as well.

@mattmoor
Collaborator Author

I could see crane convert encompassing non-optimization cases, but have zero interest in writing a compiler for a hyper-niche language.

This seems like scope creep to the tenth degree 😅

@mattmoor
Collaborator Author

This seems to work:

# crane optimize gcr.io/distroless/static:nonroot ghcr.io/mattmoor/distroless/static:nonroot
2020/12/18 16:08:58 Optimizing from gcr.io/distroless/static:nonroot to ghcr.io/mattmoor/distroless/static:nonroot
2020/12/18 16:09:09 pushed blob: sha256:9ee9eb36f870a67cf6965ca0c4c049b361c222bc1d71edb35d5c6cce1219d44b
2020/12/18 16:09:09 pushed blob: sha256:4a1e7de3441951b74dedb8af9e4c6a47473999a1c8a2a4750f11da8f0c9ef6d3
2020/12/18 16:09:10 ghcr.io/mattmoor/distroless/static@sha256:e26f1be45db4f9c94245ce3d60e7e4b5d172a43bf87709d9ff476a0365d0683b: digest: sha256:e26f1be45db4f9c94245ce3d60e7e4b5d172a43bf87709d9ff476a0365d0683b size: 560
2020/12/18 16:09:12 pushed blob: sha256:be81386dc9499d2d0746aed264174d47a6ab73e463a0219a3ea1adad77571885
2020/12/18 16:09:12 pushed blob: sha256:85f1a5df5989e02c50c70ea6f3d42ef6c9fe6e6d84ef6f303c5d8108b68d0f1b
2020/12/18 16:09:13 ghcr.io/mattmoor/distroless/static@sha256:5136863f31d32feb3b5ddfd08488d468d4705136ed11ebee40754ac4d18eeae4: digest: sha256:5136863f31d32feb3b5ddfd08488d468d4705136ed11ebee40754ac4d18eeae4 size: 560
2020/12/18 16:09:14 pushed blob: sha256:10232a66a6f1317a0bc5b6837580a25a6f227b3ef00659b2d2b0591c7ffcb3ba
2020/12/18 16:09:15 pushed blob: sha256:a38f5317835ee5b4ec47cbb390be5ec9a8a36be17e6b57af1e2237efa8788e32
2020/12/18 16:09:16 ghcr.io/mattmoor/distroless/static@sha256:d26624514c5b73b8136ee92483ea6badddd39ff06c656937c8cfde02cad86f0e: digest: sha256:d26624514c5b73b8136ee92483ea6badddd39ff06c656937c8cfde02cad86f0e size: 560
2020/12/18 16:09:18 pushed blob: sha256:f0af31f63efa56cdd750d37f9147e9f767c89f90bf48c7c973aeb6ee8793ce9e
2020/12/18 16:09:18 pushed blob: sha256:3abcb4b29ce8ec459b83d79f690b9ef4e9321d6f9e41fe33b13deedd0f57e667
2020/12/18 16:09:19 ghcr.io/mattmoor/distroless/static@sha256:f09b18121ccdb4b4432e8f03b665dae98b3b2d05d718bda76599cb07388df937: digest: sha256:f09b18121ccdb4b4432e8f03b665dae98b3b2d05d718bda76599cb07388df937 size: 560
2020/12/18 16:09:21 pushed blob: sha256:85c026c4cda8cf7335cbedc17f35acc379cbc0d34932b10afb5ae9f809d099c6
2020/12/18 16:09:21 pushed blob: sha256:d4a999b6a16786faa99ed091614fd0abe053b74fb4b6eca160449371b06953d9
2020/12/18 16:09:22 ghcr.io/mattmoor/distroless/static@sha256:d3b968776b41f374a9cfd35c452ef9f24b8a5596f4d3e8c630f00d42f5eb07c3: digest: sha256:d3b968776b41f374a9cfd35c452ef9f24b8a5596f4d3e8c630f00d42f5eb07c3 size: 560
2020/12/18 16:09:22 ghcr.io/mattmoor/distroless/static:nonroot: digest: sha256:899f4ebaef692be4e0e13a052c6056ea501a7b911509f0cb2e152706ed1b7b2d size: 1165

Right now what I have is super naive, and is basically calling tarball.LayerFromOpener and relying on GGCR_EXPERIMENT_ESTARGZ=1 to trigger the estargz optimization.

mattmoor added a commit to mattmoor/go-containerregistry that referenced this issue Dec 19, 2020
This is a hidden command, which roundtrips a remote image to a target image through `tarball.LayerFromOpener(layer.Uncompressed)`.

Right now this does nothing to force estargz (still need `GGCR_EXPERIMENT_ESTARGZ=1`) or prioritize files (need `estargz.WithPrioritizedFiles(foo)`), but want to start the convo.

Fixes: google#878
@jonjohnsonjr
Collaborator

> but have zero interest in writing a compiler for a hyper-niche language.

Booo

mattmoor added a commit that referenced this issue Dec 22, 2020
* Start to flesh out crane optimize.

This is a hidden command, which roundtrips a remote image to a target image through `tarball.LayerFromOpener(layer.Uncompressed)`.

Right now this does nothing to force estargz (still need `GGCR_EXPERIMENT_ESTARGZ=1`) or prioritize files (need `estargz.WithPrioritizedFiles(foo)`), but want to start the convo.

Fixes: #878

* Add --prioritize flag to prioritize files

* Fix headers, drop history

* Drop unused variable

* Add explicit option for estargz

* Add a warning comment to crane.Optimize

3 participants