Proposal: Attach Git information via image labels #1290
Comments
A couple of queries/points/nitpicks on implementation details - I think the high-level sounds reasonable 😄
|
Thanks @jedevc for your feedback. Let me respond inline.
I totally agree that annotations are preferable over labels for this kind of information (mostly because they aren't inherited). To me, moving to annotations opens up the following two questions really:
I've updated the POC code to use annotations at: cdupuis@109c305
I put this into buildx because it seemed like the only place where there is access to the
Actually there are quite a lot of use cases that depend on knowing where the Dockerfile is within the Git repository. We use this to send update and fix PRs, re-pinning PRs or just to enrich layer instructions with start and end line numbers. Of course we could separate that out if this seems misplaced here. Good point about leaking file path information. I like the idea of making sure that we only store relative (to the root of the context) paths. |
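As a rough illustration of the relative-path idea above, here is a minimal Python sketch (not the POC code, which lives in buildx). The function name `relative_dockerfile_path` and its parameters are hypothetical; the only point is that a path outside the context root is never recorded, so no absolute host path leaks into the image metadata.

```python
from pathlib import Path
from typing import Optional


def relative_dockerfile_path(context_root: str, dockerfile: str) -> Optional[str]:
    """Return the Dockerfile path relative to the context root.

    Returns None when the Dockerfile lies outside the root, so that an
    absolute host path is never stored. (Hypothetical helper, for illustration.)
    """
    root = Path(context_root).resolve()
    target = Path(dockerfile).resolve()
    try:
        # as_posix() keeps the stored value stable across platforms.
        return target.relative_to(root).as_posix()
    except ValueError:
        # target is not located under root
        return None
```

Whether an out-of-root Dockerfile should be skipped silently or reported is a separate design choice that this sketch does not settle.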
While implementation in buildkit frontend is somewhat more consistent it is much more tricky. We would need to transfer the Alternatively, we could add a completely new session API to buildkit for this query, but that would mean we don't have a backward compatible solution. Also, because the directory is not frozen, there isn't really any guarantee that this is the same directory that is being sent in parallel with the build context.
Currently in buildinfo, sources are captured automatically (with opt-out) but attributes require Iiuc, compared to your POC cdupuis@cb8253f, which uses an env variable for opt-in, you would like the real solution to be enabled by default? Some considerations for that:
@AkihiroSuda @cpuguy83 wdyt? Are you ok with the client setting these labels (or annotations) automatically based on We should still make it clear that users should prefer to build directly from a Git URL. It has many benefits, like BuildKit providing the actual validation of the Git data or tracking build cache by commit sha. There shouldn't be any case that works better when building from a local dir than when building from a Git URL. But I agree that we have a lot of users, and it is hard to expect all of them to modify their builds. |
s/OK/unaware/ 😅 Also, some users who do not use Go might not be OK with storing the value to the image by default. |
I'm thinking that ultimately we need to make supply chain attestations (like VCS provenance) and validation of them the default behaviour, much like what browser vendors and letsencrypt have done with TLS, if we want to make significant progress here. So for me, even if we start with an opt-in, there's still the question of when and how we make it an opt-out. |
Makes sense. I wonder if we might want to put the logic into the buildkit |
@jedevc, I'll take a look at that. Thanks.
@tonistiigi, That's a good point. Any pointers how to solve this?
👍🏽 |
This seems ok as an opt-in. |
Validation can still be done, at least in theory, but it will be subject to repo accessibility. Also, once provenance data is turned into a signed attestation and not just labels, there is no reason to just accept it as valid and leave validation as an optional (and possibly complex) step that one can undertake separately.
Does it use tree/object hashes or actual commits? Is it documented? If it is based on commit hashes only, I am not sure it is always desirable.
I think there are too many aspects and I cannot see a way of picking what is "better". We can look at repo access credentials to start with, but there is a dozen (or so) of other nuances of how people use git that will be hard to take care of in BuildKit. |
Yeah, I think opt-in makes sense for the early label-based experiment. Once it's implemented as a proper attestation, it should be an opt-out. |
I think it was promoted heavily and feedback I saw was positive. Clearly Go maintainers could have left it opt-in as well but that would have meant far fewer users could have benefited from it. Note that builds from git repository already store this info by default without any opt-in.
Not sure what exactly you have in mind here, but the buildkit client is very unopinionated. There is no such thing as a "main source directory" or smth., so I don't know what that would look like.
Nothing very good. The label sent from the client takes precedence over the one defined by the Dockerfile, which is not the behavior we want. With opt-in that wouldn't be a problem, of course. I guess if we go with annotations instead of labels then at least there is no issue with overwriting values, but it might still be weird in some extreme cases where a label and an annotation could point to different values.
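To make the precedence concern concrete, here is a small Python sketch of the behavior one would presumably want instead: auto-detected values only fill in keys the Dockerfile did not set. This is not how the client merges labels today (as described above, the client-sent label currently wins); `merge_labels` and its arguments are hypothetical names for illustration.

```python
from typing import Dict, Mapping


def merge_labels(
    dockerfile_labels: Mapping[str, str],
    detected_labels: Mapping[str, str],
) -> Dict[str, str]:
    """Merge labels so that explicit Dockerfile values are never overwritten.

    Hypothetical helper: detected (client-side) labels act as defaults only.
    """
    merged = dict(detected_labels)
    # Values written explicitly in the Dockerfile take precedence.
    merged.update(dockerfile_labels)
    return merged
```

With annotations the overwrite problem goes away, but the mismatch case (a label and an annotation disagreeing) would still need a rule like this one on the consumer side.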
We have merged the initial attestation PRs on BuildKit side that can be used for provenance, but I don't necessarily think they should be a blocker. These labels have been defined for a very long time, and even the annotation variants have been defined for years. All current tooling only understands labels/annotations.
It can't be done with regular git tools that don't check file contents. Then
You can not validate the source later, it will not be accessible from the result after the build has completed. Signing does not address the same threat vector as the builder assuring that the content of the sources matches their description.
It is desirable that you get build cache when reusing the same source. |
as per docker#1290 Signed-off-by: Christian Dupuis <cd@atomist.com>
I am saying that, in principle, once there is an attestation, it can be validated separately. If that had to be done inside buildkitd, it should be somehow possible, albeit not without some additional setup steps, especially around repo access creds. And the validation tools may need to be designed from scratch for this purpose.
I agree on that part, but my concern was about what happens the other way around. I think that a commit that doesn't actually change anything relevant to a particular build shouldn't invalidate the cache. |
I don't really understand this. Any attestation definitely can not be validated at all and just needs to be trusted. Eg. if you have a timestamp inside provenance attestation there is no way to not just trust that value. For the source, if there is a way to validate it at all then it needs to be included as full content or can be pulled by checksum.
Yes, this is how buildkit cache works. If the commit is different, then some steps need to run again, e.g. we at least need to clone the repo again and calculate checksums over the files that the build actually uses. If these steps determine that the changes were not relevant, then the cache will pick up even if the initial commits were different. If the commit can be guaranteed to be the same, then no clone or file checksum steps (or in some cases you would need to run the compiler again to prove that it still generates the same binary) are needed. |
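The idea described above (cache keyed on the content of the files a step actually uses, not on the commit) can be sketched in a few lines of Python. This is only an illustration of the principle under simplified assumptions, not BuildKit's actual checksum algorithm; `content_cache_key`, the in-memory `tree` mapping, and the file names are all made up for the example.

```python
import hashlib
from typing import Iterable, Mapping


def content_cache_key(tree: Mapping[str, bytes], used_paths: Iterable[str]) -> str:
    """Derive a cache key from only the files a build step reads.

    `tree` maps file paths to contents (a stand-in for a checked-out commit).
    Files that the step does not use have no influence on the key.
    """
    h = hashlib.sha256()
    for path in sorted(used_paths):
        h.update(path.encode("utf-8"))
        h.update(b"\0")  # separator so path and content cannot run together
        h.update(hashlib.sha256(tree[path]).digest())
    return h.hexdigest()


# Two "commits" that differ only in a file the build step never reads.
commit_a = {"main.go": b"package main\n", "README.md": b"v1\n"}
commit_b = {"main.go": b"package main\n", "README.md": b"v2\n"}
same_key = content_cache_key(commit_a, ["main.go"]) == content_cache_key(commit_b, ["main.go"])
```

Here `same_key` is true even though the two trees would have different commit hashes, which is the "cache will pick up" case; computing the key still requires fetching the files, which is the cost mentioned above.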
I guess it was a bit inaccurate of me to talk about validation; we are really talking about verification. What I am implying is that one should be able to interpret an attestation and use the information to reconstruct at least the key elements of the build process. I am not suggesting that deep verification tools exist today (I would be surprised if they did); I'm only referring to the idea of verification in principle. From https://slsa.dev/provenance/v0.2: "To trace software back to the source and define the moving parts in a complex supply chain, provenance needs to be there from the very beginning. It’s the verifiable information about software artifacts describing where, when and how something was produced."
There is some use to timestamps actually: if you have access to system logs, you can check whether the build occurred at the stated time. The source/materials are specified by URI and digests, which definitely makes them verifiable, albeit subject to durability and access. |
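To show what "specified by URI and digests" allows in principle, here is a minimal Python sketch that compares one material entry against a digest observed independently (for example, the commit SHA obtained by fetching the repository). The entry shape (`uri` plus a `digest` map keyed by algorithm) follows the SLSA v0.2 provenance format; the function name, the URI, and the digest value are made up for illustration, and this is nowhere near a full verification tool.

```python
import hmac
from typing import Any, Mapping


def material_matches(material: Mapping[str, Any], algorithm: str, observed_hex: str) -> bool:
    """Check whether an observed digest equals the one recorded for a material.

    Returns False if the material records no digest for `algorithm`.
    """
    expected = material.get("digest", {}).get(algorithm)
    if expected is None:
        return False
    # Hex digests are case-insensitive; compare in constant time.
    return hmac.compare_digest(expected.lower(), observed_hex.lower())


# Illustrative entry only; the URI and digest are not from a real build.
example_material = {
    "uri": "git+https://example.com/org/repo",
    "digest": {"sha1": "0123456789abcdef0123456789abcdef01234567"},
}
```

The check only works while the referenced source is still retrievable, which is the "durability and access" caveat.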
As the industry attempts to move security further to the left, it becomes increasingly important to link built and pushed container images to their origin, ideally via Git metadata. This enables numerous use cases for tools along the software supply chain. Here are just a few examples:
apt-get install
etc lines
We are aware that there is currently a lot of great work underway to securely and verifiably attach such provenance information to container images via signed attestations. Additionally, similar provenance data is recorded when using remote Git contexts with buildx/buildkit. This work is still very early.
Ideally we would have a very pragmatic solution that could work today without requiring users of
docker build
to every build.
Therefore, we'd like to propose that buildx starts to record the following pieces of provenance when being run with a context that has a
.git
directory:
From a privacy perspective we believe that storing Git commit SHAs shouldn't represent a concern given what we see other tools doing (e.g.
go build
storing very similar information without requiring opt-in) and images from private repositories can be pushed to private registries.
We propose to store this information in image labels following the naming conventions set out by https://specs.opencontainers.org/image-spec/annotations/, accepting the way label inheritance can complicate things. This convention is already widely adopted by vendors and projects:
Initially, we'd want to make storing the Git information by buildx opt-in via an environment variable switch.
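For readers who want a feel for the proposal, here is a minimal Python sketch of the flow (the POC itself is in buildx and is not reproduced here). Several things are assumptions: the environment variable name `EXAMPLE_GIT_LABELS` is hypothetical, and the choice of the two OCI keys `org.opencontainers.image.revision` and `org.opencontainers.image.source` is ours for illustration, since the exact list of recorded fields is not shown above.

```python
import os
import subprocess
from typing import Dict, Mapping, Optional

# Hypothetical opt-in switch; the real variable name is not given in this issue.
OPT_IN_VAR = "EXAMPLE_GIT_LABELS"


def read_git(context_dir: str, *args: str) -> Optional[str]:
    """Run a git command in context_dir and return trimmed stdout, or None on failure."""
    try:
        result = subprocess.run(
            ["git", "-C", context_dir, *args],
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git is missing, or context_dir is not a usable repository
        return None
    return result.stdout.strip() or None


def git_labels(context_dir: str, env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Collect Git metadata as OCI-style labels, only when the user opted in."""
    if env.get(OPT_IN_VAR) != "1":
        return {}
    labels: Dict[str, str] = {}
    revision = read_git(context_dir, "rev-parse", "HEAD")
    if revision:
        labels["org.opencontainers.image.revision"] = revision
    source = read_git(context_dir, "config", "--get", "remote.origin.url")
    if source:
        labels["org.opencontainers.image.source"] = source
    return labels
```

Calling `git_labels(".", {"EXAMPLE_GIT_LABELS": "1"})` inside a checkout would return whichever of the two labels could be determined; without the switch it returns an empty mapping and the build is unchanged.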
There's a POC with the suggested changes at: cdupuis@cb8253f
Before raising a pull request with the proposed additions, we wanted to raise this issue for community feedback.