Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v1/layout.WriteIndex ignores platform #817

Closed
deitch opened this issue Nov 6, 2020 · 20 comments
Closed

v1/layout.WriteIndex ignores platform #817

deitch opened this issue Nov 6, 2020 · 20 comments

Comments

@deitch
Copy link
Collaborator

deitch commented Nov 6, 2020

We don't check the platform option when calling WriteIndex() see here:

func (l Path) WriteIndex(ii v1.ImageIndex) error {
	// Always just write oci-layout file, since it's small.
	if err := l.WriteFile("oci-layout", []byte(layoutFile), os.ModePerm); err != nil {
		return err
	}

	h, err := ii.Digest()
	if err != nil {
		return err
	}

	indexFile := filepath.Join("blobs", h.Algorithm, h.Hex)
	return l.writeIndexToFile(indexFile, ii)

}

This predates #815, as it was the same for writeIndex(). We do pass it to AppendIndex:

func (l Path) AppendIndex(ii v1.ImageIndex, options ...Option) error {
	if err := l.WriteIndex(ii); err != nil {
		return err
	}

where one Option is WithPlatform, but we don't actually pass it through to WriteIndex.

Any objection to a PR to fix it?

@jonjohnsonjr
Copy link
Collaborator

I'm misunderstanding something. What should WriteIndex do with the platform?

For AppendIndex, we just use that to populate the descriptor in index.json, but if we're not touching index.json, what behavior do you expect here?

@deitch
Copy link
Collaborator Author

deitch commented Nov 7, 2020

It is easier if I describe it in an example.

I pull down docker.io/library.alpine:3.11. The root manifest for that is sha256:9a839e63dad54c3a6d1834e29692c8492d93f90c59c978c1ed79109ea4fb9a54, so index.json will contain a manifest entry for that. It will pull down and store the following in blobs/sha256/:

  • sha256:39eda93d15866957feaee28f8fc5adb545276a64147445c64992ef69804dbf01 - the manifest for amd64
  • sha256:0ff8a9dffabb5ed8dcba4ee898f62683305b75b4086f433ee722db99138f4f53 - the manifest for armv6
  • sha256:19c4e520fa84832d6deab48cd911067e6d8b0a9fa73fc054c7b9031f1d89e4cf - the manifest for armv7
    etc.

It also will pull down the config for each of those platforms and all of the layers for all of those platforms.

When I pass in WithPlatform(), I didn't read it as "this is the platform you should put in the index.json" (or not only that), but rather, when getting the actual manifests and layers, get only the following platform.

@jonjohnsonjr
Copy link
Collaborator

Interesting. Let's say you pass in the amd64 platform.

For blobs/, do you expect it to contain:

Both sha256:9a839e63dad54c3a6d1834e29692c8492d93f90c59c978c1ed79109ea4fb9a54 and sha256:39eda93d15866957feaee28f8fc5adb545276a64147445c64992ef69804dbf01?

Just sha256:39eda93d15866957feaee28f8fc5adb545276a64147445c64992ef69804dbf01?

All manifests, but only blobs for sha256:39eda93d15866957feaee28f8fc5adb545276a64147445c64992ef69804dbf01?

I think what you're trying to do, when pulling from a registry, would be, roughly:

ref, _ := name.ParseReference("docker.io/library.alpine:3.11")
img, _ := remote.Image(ref, remote.WithPlatform(platform))
_ = path.WriteImage(img)

When I pass in WithPlatform(), I didn't read it as "this is the platform you should put in the index.json" (or not only that), but rather, when getting the actual manifests and layers, get only the following platform.

This feels a bit odd to me. Given the naming, I'd expect WriteIndex to always write an index. The point where you choose the platform would be when reading from the registry, but maybe dealing with both Image and ImageIndex is what you're trying to avoid?

I would agree that the package global nature of options make them awkward to use, but the godoc for WithPlatform:

WithPlatform sets the platform of the artifact descriptor.

Maybe we should update that to explicitly say "in index.json".

What you're asking for reminds me of the sparse/thin idea for layout, TODO here:

// Option is a functional option for Layout.
//
// TODO: We'll need to change this signature to support Sparse/Thin images.
// Or, alternatively, wrap it in a sparse.Image that returns an empty list for layers?
type Option func(*v1.Descriptor) error

Something like: "I want to you to write this artifact to this image layout, but I only want to write certain parts."

Depending on how many use cases we want a sparse option to support, we could come up with a signature that supports stuff like:

  • Only update index.json
  • Only write manifests to blobs/
  • Only write blobs for things that match this platform, annotation, etc (generically, some descriptor matcher).

I'm tempted to close this as WAI, and if we're interested in pursuing something like this, we could pick this conversation back up: #651 (comment)

@deitch
Copy link
Collaborator Author

deitch commented Nov 8, 2020

I would agree that the package global nature of options make them awkward to use, but the godoc for WithPlatform:
Maybe we should update that to explicitly say "in index.json".

Very good point. I missed that. Once we figure out the right flow, it probably would be a good idea to make this a little clearer, that it only affects what is written in index.json and does not change what blobs are pulled.

do you expect it to contain

I expect it always to contain the root index, and the manifest for whatever platform I am trying to pull. In the above example:

  • sha256:9a839e63dad54c3a6d1834e29692c8492d93f90c59c978c1ed79109ea4fb9a54 - index as originally returned; this is pointed at by index.json
  • sha256:39eda93d15866957feaee28f8fc5adb545276a64147445c64992ef69804dbf01 - manifest for my chosen platform, amd64
  • the config and any layers referenced by the amd64 manifest

I would always want to have the root index, so that the tag in index.json points at the root, and reflects exactly what is in a registry. This is good practice, but it becomes required if I do anything with content trust. Granted notary v1 is a bit of a mess, but notary v2 is on the way.

I also becomes required if I later decide to pull another platform, e.g. arm64. The root index is unchanged, so what does index.json point to? If we only had the manifest, then before it pointed to amd64 manifest, now arm64 and we lose amd64? Or both? The right answer IMO is just what the registry does: tag (in our case index.json) points to index, which is a blob; index points to the rest.

Bringing this back to the starting point (or I am trying to :-) ), the behaviour I want is that I can get the root index for a particular tag, and then pull down just those blobs that start in the matching platform in that index.

FYI, if there already is a way to do this, then this just becomes a documentation question. I don't think remote.Index or remote.Get would work, since that does the filtering in advance, and will not get me the actual root. In the above example, it would pull sha256:39eda93d15866957feaee28f8fc5adb545276a64147445c64992ef69804dbf01 as the root, pointing to it in index.json, and missing the real root index, sha256:9a839e63dad54c3a6d1834e29692c8492d93f90c59c978c1ed79109ea4fb9a54, entirely.

The correct flow does appear to be not filtering during the remote.Image(), which goes from root->Index->Manifest and return the Manifest, but rather to use remote.Get, and then filter during the actual Write.

As I think about it, I might be able to do it as follows:

  1. remote.Get
  2. write just the blob for the index retrieved above
  3. append to index.json just the blob for the index retrieved above
  4. resolve the correct manifest by platform by filtering on the index
  5. write (but not append) the image given by the above manifest

So the tools all are there, but it is messy.

@deitch
Copy link
Collaborator Author

deitch commented Nov 22, 2020

I am coming back at this, now that we have a fairly nice and clean match package in place.

My usual way of getting an index and its underlying images is something like this (largely informed by handholding from @jonjohnsonjr 😁):

                ref, err = name.ParseReference(image)
                desc, err = remote.Get(ref, options...)
                ii, err := desc.ImageIndex()
                err = p.AppendIndex(ii, opts...) 
                // OR
                err = WriteIndex(ii) 

legacy and v1 tarballs aren't an issue, since they only support single images.

I think the simplest way forward here is to create an additional layout.Option for layout.AppendIndex(), where we can use WithMatcher(match.Matcher). We would need to add that to WriteIndex() as well, but do it with variadic options (like AppendIndex() does, so that the signature is backwards-compatible and forward-growable.

I would then create some a utility matcher for the single most common use case, i.e. that I want the index and any index children, but when I come to an actual image, match it to a particular platform.

Like to get your feedback @jonjohnsonjr , and then we can get a PR in.

@jonjohnsonjr
Copy link
Collaborator

jonjohnsonjr commented Nov 24, 2020

legacy and v1 tarballs aren't an issue, since they only support single images.

This isn't actually true!

We would need to add that to WriteIndex() as well, but do it with variadic options (like AppendIndex() does, so that the signature is backwards-compatible and forward-growable.

I'm terrified of what this package will look like with multiple sets of variadic options that apply to different methods 😄

I would then create some a utility matcher for the single most common use case, i.e. that I want the index and any index children, but when I come to an actual image, match it to a particular platform.

I'd like to nail down the semantics of this option and make sure it satisfies a lot of use cases, is easy to understand, and easy to use. My intuition after half a bottle of wine is something like this:

p.AppendIndex(ii, layout.WithMatcher(func(desc v1.Descriptor) bool {
  if desc.MediaType.IsImage() {
    return desc.Platform.Equals(platformYouCareAbout)
  }

  return true
}))

But I don't think this is right. One implementation of AppendIndex might skip writing the manifest and blobs for any child images that don't match platformYouCareAbout. Another implementation might skip writing just the manifest but still include all the blobs.

I don't see a way to easily express the latter with this interface (beyond having a separate option to distinguish them... WithRecursiveMatcher for your use-case?).

Of course we can always add an option to do the very specific thing that you need, but I'd like to be as generic as we can get away with... I'd be interested in any ideas if you have them.

@jonjohnsonjr
Copy link
Collaborator

(FWIW, for your use case specifically, we could document that Write methods do <thing you expected> and just implement it that way, but then I would feel really bad for making you go on this wild goose chase around this project 😄)

@deitch
Copy link
Collaborator Author

deitch commented Nov 24, 2020

satisfies a lot of use cases, is easy to understand, and easy to use

Doesn't that violate CAP theorem? You can pick only 2 out of 3. 😂

@deitch
Copy link
Collaborator Author

deitch commented Nov 24, 2020

I'm terrified of what this package will look like with multiple sets of variadic options that apply to different methods

That is a very good point. it is a reason why I haven't tried to tackle it yet.

Actually, different options applied to different methods don't bother me too much, as each method can just use the ones it wants. The problem becomes, what do you do when they conflict. For example, is WithPlatform the option as to what platform I should write to index.json when appending an Image or Index? Or is it the specific platform I should download? And what if I want multiple?

But I don't think this is right. One implementation of AppendIndex might skip writing the manifest and blobs for any child images that don't match platformYouCareAbout. Another implementation might skip writing just the manifest but still include all the blobs.

I disagree with that statement. If I skip a manifest, I skip all children blobs. No matter how you do this, it should be a tree-walker or it is impossible to reason about. E.g. containerd will garbage-collect any blobs that do not have a parent, i.e. manifest/index. These all are trees, and a leave cannot exist without the branch it is on.

Granted, you could explicitly download a specific blob, but that is beyond the scope of this discussion.

Of course we can always add an option to do the very specific thing that you need, but I'd like to be as generic as we can get away with.

I agree. Although I think "filter by platform" already is generic enough to not be "very specific thing", but I would like a generic solution.

I am going to put some ideas in the next comment.

@deitch
Copy link
Collaborator Author

deitch commented Nov 24, 2020

I keep coming back to your example above, and I actually think it works.

p.AppendIndex(ii, layout.WithMatcher(func(desc v1.Descriptor) bool {
  if desc.MediaType.IsImage() {
    return desc.Platform.Equals(platformYouCareAbout)
  }

  return true
}))

If we assume that as soon as I skip a node, I skip all of its children (which is eminently reasonable), then this works perfectly.

If we really want to tackle the "skip a generation" case, I would suggest the following:

  • we don't have a requirement now, so avoid the complexity for now
  • recursive can be implemented using a match.Matcher as well. Just use closures with a tracker as to the depth, etc. I won't say it is trivial, but it is doable (we may want an example at some point, if that use case comes up), and so we don't have to change the API in the future

@jonjohnsonjr
Copy link
Collaborator

These all are trees, and a leave cannot exist without the branch it is on.

I run into something like this a lot where:

  1. I only want valid states to be representable in my API.
  2. I would like to test what happens in invalid states.

To achieve 2, now I have to reach around the API and write some really janky code. It would be great if I could reuse the nice tools and API to produce testdata for these "invalid" states. In this particular case it's frustrating because I agree with what you're saying 99% of the time, but the data structure doesn't explicitly forbid this kind of thing, so it's not really an "invalid" state.

If we assume that as soon as I skip a node, I skip all of its children (which is eminently reasonable), then this works perfectly.

I think I'm okay with this being a default assumption as long as it's clearly documented. If we come up with a use case where we only want to write blobs (or something like that), we can create a new, more oddly named option. I'm also okay with sacrificing the ergonomics of the 1% case in favor of the ergonomics of the 99% case.

@deitch
Copy link
Collaborator Author

deitch commented Nov 25, 2020

but the data structure doesn't explicitly forbid this kind of thing, so it's not really an "invalid" state.

Good point. Well, we do have WriteBlob(), and nothing preventing us from looking at a manifest and reading it and writing a blob, i.e. skipping the branch, just using it to find the leaf and writing the leaf. If we hit that more than a few times, we can write the matcher for it.

I think I'm okay with this being a default assumption as long as it's clearly documented. If we come up with a use case where we only want to write blobs (or something like that), we can create a new, more oddly named option. I'm also okay with sacrificing the ergonomics of the 1% case in favor of the ergonomics of the 99% case.

I can open a PR for this one, let us see where it goes.

@deitch
Copy link
Collaborator Author

deitch commented Nov 25, 2020

I am a little stuck. I don't know how to make this part work:

p.AppendIndex(ii, layout.WithMatcher(func(desc v1.Descriptor) bool {
  if desc.MediaType.IsImage() {
    return desc.Platform.Equals(platformYouCareAbout)
  }

  return true
}))

The signature for AppendIndex is:

func (l Path) AppendIndex(ii v1.ImageIndex, options ...Option) error

where each option is:

type Option func(*v1.Descriptor) error

In AppendIndex() we do:

	if err := l.WriteIndex(ii); err != nil {
		return err
	}
        // ...
	for _, opt := range options {
		if err := opt(&desc); err != nil {
			return err
		}
	}

Each option processes/modifies the final v1.Descriptor that is written to index.json.

How would I pass an option that does not do that, but instead needs to send a match.Matcher that can be extracted early on, so it can be passed to WriteIndex()?

@jonjohnsonjr
Copy link
Collaborator

where each option is:

Yeah we'll need to change the Option signature. It's possible that someone out there is constructing their own option here, but I think that's unlikely... so while this would be a breaking change in theory, I don't think it matters in reality.

@deitch
Copy link
Collaborator Author

deitch commented Nov 29, 2020

What would you change it into? I couldn't wrap my head around what the "new" layout.Option might look like.

@jonjohnsonjr
Copy link
Collaborator

Basically do what remote does: https://github.com/google/go-containerregistry/blob/master/pkg/v1/remote/options.go#L28-L38

For the existing options, we'd need to thunk-ify them: #847

You can then add whatever you want to that options struct and use it wherever appropriate.

@deitch
Copy link
Collaborator Author

deitch commented Nov 30, 2020

For the existing options, we'd need to thunk-ify them: #847

No, I wouldn't... because you already did it! 😆

I will wait for #847 to go in, then I will create on top of that.

@github-actions
Copy link

github-actions bot commented Mar 1, 2021

This issue is stale because it has been open for 90 days with no
activity. It will automatically close after 30 more days of
inactivity. Reopen the issue with /reopen. Mark the issue as
fresh by adding the comment /remove-lifecycle stale.

@jonjohnsonjr
Copy link
Collaborator

Merged that, BTW.

@github-actions
Copy link

github-actions bot commented Jun 1, 2021

This issue is stale because it has been open for 90 days with no
activity. It will automatically close after 30 more days of
inactivity. Keep fresh with the 'lifecycle/frozen' label.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants