name-assertion: Add a name-assertion type #445

Closed
wants to merge 1 commit into from


wking
Contributor

@wking wking commented Nov 4, 2016

This doesn't get into signature formats, assertion discovery, or signature-discovery. But we need to agree on the assertion format itself before those become interesting problems. We can continue to fill in the remaining pieces in follow-up work.

The assertion blob must be self-describing because the signature covers the blob itself, and not any descriptors referencing that blob. For example, a linking structure like:

{
  "blob": {descriptor referencing the assertion},
  "signature": {descriptor referencing the signature}
}

might be altered by an attacker to adjust the descriptor media types. Having self-describing assertions and signature blobs protects from such attacks.

The media-type header makes assertions self-describing in a way that can survive potential future transitions to non-JSON formats. If we are comfortable requiring JSON assertions forever, we can move the self-describing media type into the assertion object as mediaType.
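This is a minimal sketch of how a consumer might check such a header before parsing. It assumes, as in the example blobs later in this thread, that the first line holds the media type and the rest of the blob is a JSON object. The exact framing and the `split_assertion` helper are assumptions for illustration, not part of this PR, and `sha256:abc` is a placeholder digest.

```python
import json

# Media type taken from the examples in this thread; treat it as an assumption.
NAME_ASSERTION_V1 = "application/vnd.oci.name.assertion.v1"


def split_assertion(blob: bytes, expected_type: str) -> dict:
    """Check the media-type header of an assertion blob, then parse its JSON body.

    Assumed framing: a first line holding the media type, then a JSON object.
    """
    header, sep, body = blob.partition(b"\n")
    if not sep:
        raise ValueError("missing media-type header line")
    media_type = header.decode("utf-8").strip()
    # The type comes from the signed bytes, not from any outside descriptor.
    if media_type != expected_type:
        raise ValueError(f"unexpected media type {media_type!r}")
    return json.loads(body)


blob = (
    b"application/vnd.oci.name.assertion.v1\n"
    b'{"name": "foo-bar v1.0", "blob": {"size": 4096, "digest": "sha256:abc",'
    b' "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip"}}'
)
print(split_assertion(blob, NAME_ASSERTION_V1)["name"])  # foo-bar v1.0
```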

The media-type header is awkward enough that it's good to limit it to the small assertion blob. An alternative approach to applying assertions would be to add the asserted attribute (e.g. the name) to the signed object itself (e.g. as a manifest property), but having purely-JSON manifests (without the media-type header) makes them easier to manipulate with generic tools like jq.

Having the assertion in an independent object also clarifies the assertion being signed. With a name property on the manifest, a signature on the manifest is signing the entire DAG rooted at that manifest and asserting something vaguely good about that DAG. With a separate name assertion you can say “this is Nginx 1.10.1” without implying “I've audited all the source code and the build system for security flaws and found none” or other similarly high bars. You can of course define a separate security-audit assertion and apply that to the manifest as well.

A final benefit of stand-alone assertions is that you can apply them to any blob. The name assertion added here can be applied to manifest-lists, manifests, configurations, layers, etc., etc. Getting the same coverage via embedded properties would require per-type adjustments.

Related discussion in #22, #173, #176, #400, and containers/image#59.

@mtrmac
Contributor

mtrmac commented Nov 4, 2016

A final benefit of stand-alone assertions is that you can apply them to any blob. The name assertion added here can be applied to manifest-lists, manifests, configurations, layers, etc., etc.

I very strongly feel that, for image signing, this is undesirable and there shouldn’t be any user-visible “name assertion” concept or file format.

Yes, a “name assertion” may be a useful concept to use when thinking about some particular signature implementations, but ultimately we are not in the algebra-like business of boiling down concepts to their purest and most abstract forms; we are in the engineering business of designing a robust system of file formats.

One of the concerns about signature security is type/content/semantics confusion: What if an attacker is able to take a signature of a layer and convince a consumer that it is a signature of an image, or the other way around? What if an attacker is able to take a signature naming a URL and convince a consumer that it is a signature of a Docker reference, or vice versa?

Creating a generic “name assertion” format which is intentionally substitutable in both of these ways would create completely unnecessary opportunities for such confusion.

AFAICT we should be doing the exact opposite, and defining signature formats such that it is, as far as possible, entirely inconceivable to apply a signature to the wrong kind of object or the wrong kind of semantics. Preferably there should be only one kind of signature; and if there are several, they should be stored by the storage mechanism / distributed by the distribution mechanism in completely separate locations which are never mixed together, and they should use formats which are clearly not interoperable (if there were hypothetical image signatures and manifest signatures, we should ensure that a layer signature which contains a manifest digest, or a manifest signature which contains a layer ID, would always be reliably rejected).

From my POV, it is no more practical to talk about abstract “name assertions” instead of “image signatures” at the file format level than it is appropriate to talk about abstract “finite fields” instead of the need for particular side-channel-free “Z_n” big-endian bignum implementations at the RSA implementation level.

Sure, these concepts can be very useful when thinking about the design and implementation, and perhaps as an internal helper inside the implementation (though relying on a strong type system separation between different kinds of signatures which never meet in the code seems more robust). But these algebra-like concepts should really not leak into the user-visible file formats, especially not as a substitutable layer of abstraction.
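One way to get the “strong type system separation” described above is sketched below with hypothetical claim types. `ManifestClaim`, `LayerClaim`, and `accept_manifest_claim` are illustrative names, not from any spec. Python only enforces the separation at run time; a statically typed implementation would reject the mix-up at compile time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ManifestClaim:
    """Hypothetical signed payload binding an image reference to a manifest digest."""
    docker_reference: str
    manifest_digest: str


@dataclass(frozen=True)
class LayerClaim:
    """Hypothetical layer payload; deliberately unrelated to ManifestClaim."""
    layer_digest: str


def accept_manifest_claim(claim: object, digest: str) -> ManifestClaim:
    # Refuse any other claim type, even one that carries a matching digest.
    if not isinstance(claim, ManifestClaim):
        raise TypeError(f"expected ManifestClaim, got {type(claim).__name__}")
    if claim.manifest_digest != digest:
        raise ValueError("manifest digest mismatch")
    return claim
```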

@wking
Contributor Author

wking commented Nov 4, 2016

On Fri, Nov 04, 2016 at 02:06:14PM -0700, Miloslav Trmač wrote:

One of the concerns about signature security is
type/content/semantics confusion: What if an attacker is able to
take a signature of a layer and convince a consumer that it is a
signature of an image, or the other way around? What if an attacker
is able to take a signature naming a URL and convince a consumer
that it is a signature of a Docker reference, or vice versa?

Are you having display concerns here? Clearly, if you've signed a
name assertion your signature will only apply to blobs matching the
blob descriptor. Assuming you've picked a sane hashing algorithm for
that descriptor's digest, I don't see how you could accidentally
verify a different blob as the target of the assertion. Of course,
this doesn't cover how these assertions are displayed to end-users (if
they ever are). Is that end-user experience what you're worried
about?
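The blob-matching check referred to here can be sketched as follows, assuming a descriptor carrying `size` and a `sha256:`-prefixed `digest`. `blob_matches` is a hypothetical helper, and other digest algorithms are deliberately refused rather than guessed at.

```python
import hashlib


def blob_matches(descriptor: dict, blob: bytes) -> bool:
    """Return True if blob has the descriptor's size and sha256 digest."""
    if descriptor.get("size") != len(blob):
        return False
    algorithm, _, expected_hex = descriptor.get("digest", "").partition(":")
    if algorithm != "sha256":
        # Only sha256 is handled here; unknown algorithms are refused.
        return False
    return hashlib.sha256(blob).hexdigest() == expected_hex
```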

Preferably there should be only one kind of signature…

I've tried to motivate naming each existing OCI type in my topic post.
Does only one of those motivations make sense to you? If so, which
one? More discussion on this point in #400.

… and if there are several, they should be stored by the storage
mechanism / distributed by the distribution mechanism in completely
separate locations which are never mixed together, and they should
use formats which are clearly not interoperable

Inventing separate formats and distribution mechanisms for multiple
instances of the same idea (where the only difference is the target of
the assertion) does not sound useful to me. Maybe I'm
misunderstanding your proposal? Can you sketch this out in more
detail or point me at existing documentation for your preferred
approach?

(if there were hypothetical image signatures and manifest
signatures, we should ensure that a layer signature which contains a
manifest digest, or a manifest signature which contains a layer ID,
would always be reliably rejected).

I'm having difficulty parsing this. Are you saying someone signs:

  application/vnd.oci.name.assertion.v1
  {
    "name": "foo-bar v1.0",
    "blob": {
      "size": 4096,
      "digest": "sha256:abc…",
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip"
    }
  }

to name a layer, and then an attacker provides the original signature,
the original name assertion, and a manifest whose digest happens to be
"sha256:abc…" and size happens to be 4096 to execute a collision
attack on the referenced blob?

Or are you saying they forge their own name assertion like:

  application/vnd.oci.name.assertion.v1
  {
    "name": "foo-bar v1.0",
    "blob": {
      "size": 1234,
      "digest": "sha256:def…",
      "mediaType": "application/vnd.oci.image.manifest.v1+json"
    }
  }

and provide the original signature and the forged name assertion
(which the signature happens to validate)?

Both of those seem like far-fetched hash-collision attacks to me. If
you're concerned about them, you should pick stronger hashes to make
yourself comfortable again. If there are no hashes strong enough to
make you comfortable, you should be absolutely terrified about signing
and Merkle DAGs in general and we should throw this whole thing out
and go and live in a bunker under a big mountain ;).

@mtrmac
Contributor

mtrmac commented Nov 4, 2016

One of the concerns about signature security is
type/content/semantics confusion: What if an attacker is able to
take a signature of a layer and convince a consumer that it is a
signature of an image, or the other way around? What if an attacker
is able to take a signature naming a URL and convince a consumer
that it is a signature of a Docker reference, or vice versa?

Preferably there should be only one kind of signature…

I've tried to motivate naming each existing OCI type in my topic post.
Does only one of those motivations make sense to you? If so, which
one? More discussion on this point in #400.

In short, signing individual layers has dubious value to me (especially: try forming a 5-minute elevator pitch which explains to the user what exactly a layer signature does or does not mean), and we shouldn’t be asking the user to make any decision between signing configs/arch-specific manifest/manifest digests; it is for us to design and choose the best one, or the best combination.

(We are not creating Legos from which anyone can individually build their own special cryptosystem snowflake; we are creating an interoperability specification, and we need every consumer of the spec to interpret the signatures consistently. Options/variants make systems less interoperable, and more confusing to the user. (Also, for creating an interoperability spec, a fair degree of the semantics of the signatures is an essential component of the interoperability spec, for the same reason.))

… and if there are several, they should be stored by the storage
mechanism / distributed by the distribution mechanism in completely
separate locations which are never mixed together, and they should
use formats which are clearly not interoperable

Inventing separate formats and distribution mechanisms for multiple
instances of the same idea (where the only difference is the target of
the assertion) does not sound useful to me.

Really I’m betting on “there should be only one”; but, if nothing else, using entirely specific field names, like critical.image.docker-manifest-digest in containers/image#59 requires a pretty high amount of effort for the consumer to be confused about what the field means.

(if there were hypothetical image signatures and manifest
signatures, we should ensure that a layer signature which contains a
manifest digest, or a manifest signature which contains a layer ID,
would always be reliably rejected).

I'm having difficulty parsing this.

The base problematic scenario is:

  • An attacker uploads a “blob” which happens to be a valid image manifest (because, why not, a blob is a blob, and it is not the storage system’s role to look inside or question it, especially in a pure CAS / Merkle world).
  • The attacker $somehow convinces a layer signing system that this blob is a layer (perhaps by building a manifest which refers to this blob’s digest) and, by submitting this layer to the signing system, obtains a layer signature.
  • The attacker then uploads this blob somewhere else, claiming that it is a manifest, and having the signature to prove it. Consumers might notice that the MIME type does not match… or not; it is easy to imagine an implementation which forgets.

Of course the $somehow is the first failure point, but if there really are distinct layer and manifest signatures, they pretty much must have distinct semantics, and this attack allows escalating an unauthorized layer signature into an unauthorized manifest signature.
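The scenario above only works if a consumer skips the media-type comparison. A hypothetical check that a verified name assertion actually covers the way a blob is about to be used might look like this; `assertion_covers` is an illustrative name, and the media-type strings are the draft values used in this thread.

```python
MANIFEST_V1 = "application/vnd.oci.image.manifest.v1+json"
LAYER_V1 = "application/vnd.oci.image.layer.v1.tar+gzip"


def assertion_covers(assertion: dict, digest: str, expected_media_type: str) -> bool:
    """Hypothetical check that a verified assertion covers this use of a blob.

    Both the digest and the media type the consumer is about to act on must
    match what the signer asserted; a matching digest alone is not enough.
    """
    target = assertion.get("blob", {})
    return (
        target.get("digest") == digest
        and target.get("mediaType") == expected_media_type
    )
```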

@mtrmac
Contributor

mtrmac commented Nov 4, 2016

Consumers might notice that the MIME type does not match… or not; it is easy to imagine an implementation which forgets.

(And this is much worse for the name, which has no semantics defined inside the JSON at all.)

@wking
Contributor Author

wking commented Nov 4, 2016

On Fri, Nov 04, 2016 at 02:53:00PM -0700, Miloslav Trmač wrote:

In short, signing individual layers has dubious value to me…

I think the “I'm compiling an image from single-package layers that
have been signed by their authors” is actually pretty cool ;). I
agree that it's not going to be something image consumers will care
about, but it might be something image authors will care about.

… we shouldn’t be asking the user to make any decision between
signing configs/arch-specific manifest/manifest digests; it is for
us to design and choose the best one, or the best combination.

I disagree. I think it's good to provide the tools in one layer, and
make them as generic as we can without jumping through hoops (and the
name assertions I'm floating here seem generic without hoop jumping).
Then you can apply the policy (e.g. “we only respect manifest-list
signatures”) at a higher level, because there are trade-offs to each
choice (manifest-list vs. manifest vs. config), and I don't think a
single choice is going to fit all consumers.
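Here is a sketch of what such a higher-level policy could look like, assuming a consumer-side configuration of trusted signers and accepted target media types. `NamingPolicy` is hypothetical, and the manifest-list media type is the draft-era string, used only as an example value.

```python
from dataclasses import dataclass

# Draft-era media types, used here only as example values.
MANIFEST_LIST_V1 = "application/vnd.oci.image.manifest.list.v1+json"
CONFIG_V1 = "application/vnd.oci.image.config.v1+json"


@dataclass(frozen=True)
class NamingPolicy:
    """Hypothetical consumer-side policy applied on top of generic name assertions."""
    trusted_signers: frozenset
    accepted_targets: frozenset

    def allows(self, signer: str, assertion: dict) -> bool:
        # Accept only trusted signers naming a target type this consumer respects.
        target_type = assertion.get("blob", {}).get("mediaType")
        return signer in self.trusted_signers and target_type in self.accepted_targets


# "We only respect manifest-list signatures."
manifest_lists_only = NamingPolicy(
    trusted_signers=frozenset({"alice"}),
    accepted_targets=frozenset({MANIFEST_LIST_V1}),
)
```

The assertion format stays generic; only the policy object differs between consumers.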

(We are not creating Legos from which anyone can individually build
their own special cryptosystem snowflake; we are creating an
interoperability specification,…

I agree that diverging crypto policies will make the whole ecosystem
less interoperable. If Alice only names configurations and Bob only
accepts images with named manifest-lists, Bob is not going to be using
any of Alice's images. But I think that's fine. If it ends up being
a friction point, Alice and Bob can get together and… hash it out :p.

… we need every consumer of the spec to interpret the signatures
consistently.

Absolutely, and I think that the current semantics are pretty clear.
A name assertion applies a name to a blob. Image publishers name
whatever they want. Image consumers start off with a
location-addressable name and get a descriptor pointing into CAS.
They walk the DAG as far as they feel comfortable:

“Down to the manifest and still no signed name assertion? I'm
getting out of here!”

Once they find a signed name assertion they verify it:

“Ah, Alice signed this. She knows what she's doing. Let's open the
blob she signed. A name assertion :). Looks like she thought the
next step in the DAG was a manifest named ‘nginx-1.10.1’. Good,
that matches the ref I asked the ref engine to resolve for me.
Should be smooth sailing from here.”

or maybe:

“… Looks like she thought the next step in the DAG was a manifest
named ‘rootkit-33.3’. I asked the ref engine to resolve
‘nginx-1.10.1’! I'm getting out of here!”

or maybe:

“Charlie? Who is Charlie? I'm getting out of here!”

etc. And namers get to decide how much re-signing they are willing to
take on:

“Ah, you've replaced my sha256:abc… layer with a sha256:123… layer
because it has the same diffID and is more popular in your CAS? No
problem! I'm pushing a name assertion for the new manifest now. My
consumers are lucky they have me.”

or:

“I'm never going to touch this project again. Better sign the
config so folks have the best chance of using it going forward.
Hopefully my consumers will be brave enough to drill down that far…”
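The consumer walkthrough above can be sketched as a short loop. This is a hypothetical model: `path` lists descriptors from the ref toward the leaves, each paired with an already-verified `(signer, name)` tuple or `None`, and `max_depth` encodes how far the consumer is willing to drill down. `find_name` is an illustrative name.

```python
def find_name(path, requested, trusted, max_depth):
    """Walk from the ref toward the leaves until a signed name assertion is found.

    path: list of (media_type, signed) pairs, where signed is an already
    verified (signer, name) tuple or None if that node has no assertion.
    """
    for depth, (media_type, signed) in enumerate(path):
        if depth >= max_depth:
            break  # "Down to the manifest and still no signed name assertion?"
        if signed is None:
            continue
        signer, name = signed
        if signer not in trusted:
            raise PermissionError(f"untrusted signer {signer!r}")  # "Who is Charlie?"
        if name != requested:
            raise ValueError(f"{media_type} is named {name!r}, not {requested!r}")
        return media_type
    raise LookupError("no signed name assertion within the allowed depth")
```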

Really I’m betting on “there should be only one”; but, if nothing
else, using entirely specific field names, like
critical.image.docker-manifest-digest in
containers/image#59 requires a pretty high
amount of effort for the consumer to be confused about what the
field means.

The media type header means you can't even unmarshal the assertion
with an off-the-shelf JSON parser. That seems like it's rubbing the
unpacker's nose in the type sufficiently ;).

  • The attacker then uploads this blob somewhere else, claiming that
    it is a manifest, and having the signature to prove it. Consumers
    might notice that the MIME type does not match… or not; it is
    easy to imagine an implementation which forgets.

Of course the $somehow is the first failure point, but if there really
are distinct layer and manifest signatures, they pretty much must
have distinct semantics, and this attack allows escalating an
unauthorized layer signature into an unauthorized manifest
signature.

Your argument here seems to be:

  1. We need distinct signatures for manifests and layers.
  2. The blob.mediaType field in the name assertion is not sufficient to
    distinguish between manifest and layer signatures, because
    implementers can be really sloppy.
  3. critical.image.docker-manifest-digest, on the other hand, is a
    completely acceptable way to distinguish between manifest and layer
    signatures.

I'm fine with (1), but don't buy (2). Respecting a descriptor's
mediaType is fundamental to walking the OCI DAG. Any sane DAG-walking
implementation is going to take it very seriously. And with a handful
of sane DAG-walking implementations in the ecosystem (hopefully ;),
why would an implementer choose a half-baked,
descriptor-mediaType-ignoring implementation for security-critical
assertion handling? And without (2), I see no reason to go to (3).

If we did go to (3), we'd be pulling in new, non-core descriptor-ish
handling that only applied to assertion handling. And that doesn't
seem more robust or understandable to me.

On Fri, Nov 04, 2016 at 02:54:52PM -0700, Miloslav Trmač wrote:

(And this is much worse for the name, which has no semantics
defined inside the JSON at all.)

The semantics are available via the assertion's media type → spec defining
that media type. So there's a potential hole where an attacker
convinces a dev that application/vnd.oci.name.assertion.v1 is really
defined by their malicious spec:

“It is a little odd that the ‘name’ value must always be
‘rootkit-33.3’. I wonder how this thing asserts a name anyway. Oh
well, at least we'll have a safe, compliant unpacking system!”

That doesn't seem very likely to me.


Signed-off-by: W. Trevor King <wking@tremily.us>
@mtrmac
Contributor

mtrmac commented Nov 7, 2016

In short, signing individual layers has dubious value to me…

I think the “I'm compiling an image from single-package layers that
have been signed by their authors” is actually pretty cool ;). I
agree that it's not going to be something image consumers will care
about, but it might be something image authors will care about.

Considering that the only way to refer to layers is by digests, which are already authenticating them, keeping the layer digests in a build system database, and keeping the layer digests + a set of public keys in the same build system database is not noticeably more secure. Compromising that database allows an attacker to substitute a layer either way. Unless you are proposing some kind of repo/name/tagging system for individual layers which would allow for reasonable end-to-end verification.

(We are not creating Legos from which anyone can individually build
their own special cryptosystem snowflake; we are creating an
interoperability specification,…

I agree that diverging crypto policies will make the whole ecosystem
less interoperable. If Alice only names configurations and Bob only
accepts images with named manifest-lists, Bob is not going to be using
any of Alice's images. But I think that's fine. If it ends up being
a friction point, Alice and Bob can get together and… hash it out :p.

There are already enough GPG and CMS and whatever signing implementations and enough data formats around that everyone has more than enough Lego bricks to build whatever they need.

OCI should be the place where the “hashing it out” happens so that the whole ecosystem is interoperable (who else would do it? why not OCI?). If the result of the signing effort is not an interoperable specification and we end up requiring every pair of users to “hash it out”, then we have really achieved nothing of value to me.

… we need every consumer of the spec to interpret the signatures
consistently.

Absolutely, and I think that the current semantics are pretty clear.
A name assertion applies a name to a blob. Image publishers name
whatever they want.

What is a name? It’s just a string. “Applying a string to a blob” has close to zero semantics.

“Ah, Alice signed this. She knows what she's doing. Let's open the
blob she signed. A name assertion :). Looks like she thought the
next step in the DAG was a manifest named ‘nginx-1.10.1’. Good,
that matches the ref I asked the ref engine to resolve for me.
Should be smooth sailing from here.”

What about content of any other files which have already been processed? What if the ref was using a :latest tag and the signed name does not have a tag? What if one of them uses a :500 port number and the other :0500 port number? What if the host name in the original ref was implied, and the default hostname is different between the consumer’s and Alice’s implementations? “It’s just a string” is completely insufficient.
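To make the port and implied-hostname cases concrete, here is one possible normalization. The rules (host detection loosely modeled on the Docker reference convention, integer port comparison, an implied `latest` tag) and the `normalize_ref` helper are assumptions for illustration. Two consumers that pick different `default_host` values still disagree, which is the underlying concern.

```python
def normalize_ref(ref: str, default_host: str) -> tuple:
    """Normalize 'host[:port]/repo[:tag]' into a comparable tuple.

    Hypothetical rules: the first path component counts as a host only if it
    contains '.' or ':' or is 'localhost' (otherwise default_host is implied),
    the port is compared as an integer, and a missing tag becomes 'latest'.
    Digest references ('@sha256:...') are not handled.
    """
    first, sep, rest = ref.partition("/")
    if not sep or ("." not in first and ":" not in first and first != "localhost"):
        first, rest = default_host, ref
    host, _, port = first.partition(":")
    repo, _, tag = rest.partition(":")
    return (host.lower(), int(port) if port else None, repo, tag or "latest")
```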

Really I’m betting on “there should be only one”; but, if nothing
else, using entirely specific field names, like
critical.image.docker-manifest-digest in
containers/image#59 requires a pretty high
amount of effort for the consumer to be confused about what the
field means.

The media type header means you can't even unmarshal the assertion
with an off-the-shelf JSON parser. That seems like it's rubbing the
unpacker's nose in the type sufficiently ;).

No, that only assures the client that it is processing an assertion; not an assertion for the expected kind of object and name.

  • The attacker then uploads this blob somewhere else, claiming that
    it is a manifest, and having the signature to prove it. Consumers
    might notice that the MIME type does not match… or not; it is
    easy to imagine an implementation which forgets.

Of course the $somehow is the first failure point, but if there really
are distinct layer and manifest signatures, they pretty much must
have distinct semantics, and this attack allows escalating an
unauthorized layer signature into an unauthorized manifest
signature.

Your argument here seems to be:

  1. We need distinct signatures for manifests and layers.

No, I don’t think we do need them; but name assertions as an abstraction only make sense if there are distinct kinds of signatures, so the attack scenario assumes that.

  2. The blob.mediaType field in the name assertion is not sufficient to
    distinguish between manifest and layer signatures, because
    implementers can be really sloppy.
  3. critical.image.docker-manifest-digest, on the other hand, is a
    completely acceptable way to distinguish between manifest and layer
    signatures.

I'm fine with (1), but don't buy (2). Respecting a descriptor's
mediaType is fundamental to walking the OCI DAG. Any sane DAG-walking
implementation is going to take it very seriously.

No, so far it has been very easy to ignore it; I know that manifest lists should contain manifests, and I know that manifests refer to blobs. Why would I even look at the manifest MIME type?

And with a handful
of sane DAG-walking implementations in the ecosystem (hopefully ;),
why would an implementer choose a half-baked,
descriptor-mediaType-ignoring implementation for security-critical
assertion handling?

That’s really backwards; we want to minimize the amount of code which needs to be paranoid. The signature verification should happen as early in the process as possible, with as minimal handling of the contents of the signed data as possible. A fully generic “sane” DAG-walking implementation should not be even allowed to touch the data until it has been cryptographically verified.
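Here is a minimal sketch of that ordering: authenticate the raw bytes first, parse second. HMAC-SHA256 from the standard library stands in for a real public-key signature (such as GPG) purely so the example runs. `verify_then_parse` is a hypothetical name.

```python
import hashlib
import hmac
import json


def verify_then_parse(blob: bytes, signature: bytes, key: bytes) -> dict:
    """Authenticate the raw bytes, and only then hand them to the JSON parser.

    HMAC-SHA256 is a stand-in for a real public-key signature so that this
    sketch runs with the standard library alone.
    """
    expected = hmac.new(key, blob, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, signature):
        # Nothing in the blob has been interpreted at this point.
        raise ValueError("signature verification failed")
    return json.loads(blob)
```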

And without (2), I see no reason to go to (3).

If we did go to (3), we'd be pulling in new, non-core descriptor-ish
handling that only applied to assertion handling.

(You’ve suggested elsewhere that signatures should reuse the descriptor type; I disagree. The code verifying signatures needs to be minimal and paranoid, and the design needs to be tightly controlled, which is a completely different set of design constraints to the rest of the distribution problem space. The signature verification code really does not want to implicitly add an urls field and the associated semantics just because the rest of the distribution system has found it useful and added it into the descriptor type. And once we start talking about “descriptor-without-urls” we are better off not reusing the type and implementation at all.)
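Here is a sketch of a signature-side reference type with a closed field set, assuming the fields `mediaType`, `digest`, and `size`. Unlike a general-purpose descriptor, it refuses extensions such as `urls` instead of silently inheriting their semantics. `parse_signed_ref` is hypothetical.

```python
import json

# Hypothetical closed field set for references inside signed data.
SIGNED_REF_FIELDS = frozenset({"mediaType", "digest", "size"})


def parse_signed_ref(text: str) -> dict:
    """Parse a reference from signed data, rejecting unknown or missing fields."""
    ref = json.loads(text)
    if not isinstance(ref, dict):
        raise ValueError("reference must be a JSON object")
    unknown = set(ref) - SIGNED_REF_FIELDS
    missing = SIGNED_REF_FIELDS - set(ref)
    if unknown or missing:
        raise ValueError(f"unknown fields {sorted(unknown)}, missing {sorted(missing)}")
    return ref
```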

@wking
Contributor Author

wking commented Nov 7, 2016

On Mon, Nov 07, 2016 at 08:14:27AM -0800, Miloslav Trmač wrote:

In short, signing individual layers has dubious value to me…

I think the “I'm compiling an image from single-package layers
that have been signed by their authors” is actually pretty cool
;). I agree that it's not going to be something image consumers
will care about, but it might be something image authors will
care about.

Considering that the only way to refer to layers is by digests,
which are already authenticating them…

Having a digest referencing a layer does not necessarily mean that
either:

a. The layer is named, or
b. The name is trusted.

On the other hand, if you follow a ref to a signed name assertion, and
the blob in that name assertion references the layer (which is one of
the use cases I'm proposing here), you do have a trusted name for
that layer.

… keeping the layer digests in a build system database, and keeping
the layer digests + a set of public keys in the same build system
database is not noticeably more secure. Compromising that database
allows an attacker to substitute a layer either way.

Adding public keys to CAS does not affect security (and I'm not
advocating it in this PR anyway ;). “Someone gained access to my CAS
and stuffed it” is a situation we want to anticipate and protect
against. The way you protect against it is by keeping your trust
out of CAS (e.g. your web of trust in ~/.gnupg/trustdb.gpg or your
X.509 CAs in /etc/ssl/certs/). Then if someone stuffs an untrusted
layer or name assertion or sig or public key into your CAS, you'll
know it's not trusted.

Unless you are proposing some kind of repo/name/tagging system for
individual layers which would allow for reasonable end-to-end
verification.

This PR is currently fuzzy about how signatures are stored or
associated with name assertions (those are higher levels I'd like to
address later). With just this PR, you could have DAGs like:

  • naming a layer:
    (name-assertion)
    `-- (layer)
  • naming a manifest-list:
    (name-assertion)
    `-- (manifest-list)
        |-- (manifest 1)
        |   |-- (layer 1.1)
        |   |-- (layer 1.2)
        |   `-- (config 1)
        `-- (manifest 2)
            |-- (layer 2.1)
            |-- (layer 2.2)
            `-- (config 2)
  • etc., etc.

We might set up in-DAG naming (e.g. manifest-list → name assertion →
manifest) in the future, but we'd have to figure out signature
discovery first (would they be in-DAG-too?), so that's out of scope
for this PR.

There are already enough GPG and CMS and whatever signing
implementations and enough data formats around that everyone has
more than enough Lego bricks to build whatever they need.

GnuPG allows you to sign things, but doesn't care about what you're
signing. Name assertions give us a “what you're signing” that
addresses an image-spec use case (“how do I verify that the image I'm
unpacking is the one I asked for?” #22). If folks feel like there is
ever only going to be one thing we assert (names) and only one type of
blob we make that assertion on (manifests?), then yeah,
name-assertions as a separate blob are overly generic. Just add
“name” to the manifest schema and specify that all signatures on
manifests only cover the name assertion (and not, for example, a
security-audit assertion, or a passed-QA assertion, or whatever).

However, while a number of folks have called for “what should we tell
people to sign?” (e.g. 1 and #400), I have yet to hear anyone making
a case for why the other signing cases I present here (manifest lists,
configs, and layers) are not worth supporting.

… we need every consumer of the spec to interpret the signatures
consistently.

Absolutely, and I think that the current semantics are pretty
clear. A name assertion applies a name to a blob. Image
publishers name whatever they want.

What is a name?

A deep question. That which we call a rose… ;).

It’s just a string. “Applying a string to a blob” has close to zero
semantics.

It's an identity document, and there are lots of other examples of
this sort of thing being useful 2. If you have different language
you prefer, I'm happy to update my descriptions.

“Ah, Alice signed this. She knows what she's doing. Let's open the
blob she signed. A name assertion :). Looks like she thought the
next step in the DAG was a manifest named ‘nginx-1.10.1’. Good,
that matches the ref I asked the ref engine to resolve for me.
Should be smooth sailing from here.”

What about content of any other files which have already been
processed?

They are obviously not covered by the assertion.

What if the ref was using a :latest tag and the signed name does
not have a tag?

Policy call, but personally I'd bail.

What if one of them uses a :500 port number and the other :0500
port number?

I don't see a point to putting port numbers in the names. Where you
get a ref doesn't matter. The caller asks for nginx-1.10.1 and points
you at the ref-engine at localhost:500. You get the ref, walk the
DAG, find the name-assertion (signed by Alice, who you trust), and
Alice names the manifest nginx-1.10.1, so you're good.
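To make the separation between "where the ref engine lives" and "what the image is named" concrete, here is a small sketch in which the engine address travels alongside the requested name but never takes part in the name check. The `Request` type and `name_ok` function are hypothetical illustrations.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Request:
    engine: str  # where to look up the ref, e.g. "localhost:500"
    name: str    # what the image should be, e.g. "nginx-1.10.1"

def name_ok(request: Request, asserted_name: str) -> bool:
    # Compare only the name. The engine address is a transport detail,
    # so spellings like "localhost:500" vs "localhost:0500" never matter.
    return request.name == asserted_name

for engine in ("localhost:500", "localhost:0500", "mirror.example.com"):
    print(engine, name_ok(Request(engine, "nginx-1.10.1"), "nginx-1.10.1"))
```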

What if the host name in the original ref was implied…

I think you're still conflating “where the ref engine lives” and “what
the image is named”.

The media type header means you can't even unmarshal the assertion
with an off-the-shelf JSON parser. That seems like it's rubbing
the unpacker's nose in the type sufficiently ;).

No, that only assures the client that it is processing an
assertion; not an assertion for the expected kind of object and
name.

After parsing the media type header you know you're processing an
application/vnd.oci.name.assertion.v1. I still don't see how that is
any more or less clear than your
critical.image.docker-manifest-digest.
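The exact header syntax proposed in the PR isn't reproduced in this thread, so the sketch below assumes a hypothetical framing of `<media type>\n<JSON body>`. It shows the two points made above: an off-the-shelf JSON parser rejects the blob outright, and once the header is split off the consumer knows it has an `application/vnd.oci.name.assertion.v1`.

```python
import json
from typing import Tuple

NAME_ASSERTION_V1 = "application/vnd.oci.name.assertion.v1"

def split_header(blob: bytes) -> Tuple[str, bytes]:
    # Assumed framing: the first line is the media type, the rest is JSON.
    header, sep, body = blob.partition(b"\n")
    if not sep:
        raise ValueError("missing media-type header")
    return header.decode("ascii").strip(), body

def parse_name_assertion(blob: bytes) -> dict:
    media_type, body = split_header(blob)
    if media_type != NAME_ASSERTION_V1:
        raise ValueError(f"unexpected media type {media_type!r}")
    return json.loads(body)

blob = NAME_ASSERTION_V1.encode("ascii") + b'\n{"name": "nginx-1.10.1"}'
print(parse_name_assertion(blob)["name"])  # nginx-1.10.1
try:
    json.loads(blob)  # a plain JSON parser chokes on the header line
except json.JSONDecodeError as err:
    print("plain JSON parse failed:", err.msg)
```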

Your argument here seems to be:

  1. We need distinct signatures for manifests and layers.

No, I don’t think we do need them; but name assertions as an
abstraction only make sense if there are distinct kinds of
signatures, so the attack scenario assumes that.

A separate name-assertion blob assumes either of:

a. There are multiple types of things you'd like to name (this is what
I've tried to motivate with my ‘## Use’ subsections).
b. There are multiple types of things you'd like to assert (e.g. name
assertions, or a security-audit assertion, or a passed-QA
assertion).

  2. The blob.mediaType field in the name assertion is not sufficient to
    distinguish between manifest and layer signatures, because
    implementers can be really sloppy.
  3. critical.image.docker-manifest-digest, on the other hand, is a
    completely acceptable way to distinguish between manifest and layer
    signatures.

I'm fine with (1), but don't buy (2). Respecting a descriptor's
mediaType is fundamental to walking the OCI DAG. Any sane
DAG-walking implementation is going to take it very seriously.

No, so far it has been very easy to ignore it; I know that manifest
lists should contain manifests, and I know that manifests refer to
blobs. Why would I even look at the manifest MIME type?

Manifests refer to layers. What are the semantics of those layers?
Do the referenced layers support .wh.*-style whiteouts (#24)?
Manifests also refer to configs. Is the referenced config a
Docker-like config or a runtime-spec-like config (#451)? Assuming you
know the target type without looking at the descriptor's mediaType
seems foolishly risky to me.
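Here is a minimal sketch of a DAG walker that dispatches on the descriptor's mediaType instead of assuming a type from the blob's position. The media-type strings are the OCI image-spec ones; the handler functions are hypothetical stubs that a real walker would replace with parsing and recursion.

```python
from typing import Callable, Dict

# Hypothetical handler stubs; a real walker would parse and recurse.
def walk_manifest(data: bytes) -> str:
    return "manifest"

def walk_config(data: bytes) -> str:
    return "config"

def walk_layer(data: bytes) -> str:
    return "layer"

# Media-type strings as defined by the OCI image-spec.
HANDLERS: Dict[str, Callable[[bytes], str]] = {
    "application/vnd.oci.image.manifest.v1+json": walk_manifest,
    "application/vnd.oci.image.config.v1+json": walk_config,
    "application/vnd.oci.image.layer.v1.tar": walk_layer,
}

def dispatch(descriptor: dict, data: bytes) -> str:
    # Trust the descriptor's mediaType, not where the blob sits in the DAG.
    media_type = descriptor["mediaType"]
    handler = HANDLERS.get(media_type)
    if handler is None:
        # e.g. a Docker-style config: refuse instead of guessing.
        raise ValueError(f"unsupported media type {media_type!r}")
    return handler(data)

print(dispatch({"mediaType": "application/vnd.oci.image.config.v1+json"}, b"{}"))
try:
    dispatch({"mediaType": "application/vnd.docker.container.image.v1+json"}, b"{}")
except ValueError as err:
    print(err)
```

The design choice is to fail closed: a config the walker doesn't recognize is rejected rather than parsed as if it were whatever the walker expected at that position.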

That’s really backwards; we want to minimize the amount of code
which needs to be paranoid.

I'm not saying “respecting a blob's mediaType is paranoid”, I'm saying
“everybody that is dealing with blobs should be respecting the blob's
media type”.

The signature verification should happen as early in the process as
possible, with as minimal handling of the contents of the signed
data as possible. A fully generic “sane” DAG-walking implementation
should not be even allowed to touch the data until it has been
cryptographically verified.

I don't think that restrictive policy is going to be universal, so I'd
rather not require it as the one true path. But I'm fine
supporting:

    ref name
    `-- (signature/assertion link)
        |-- name assertion
        |   `-- manifest list…
        `-- signature

You still have to somehow get from the ref to the name assertion and
the signature on that assertion. The way you get from the ref name to the
signature needs to be paranoid, to avoid eating a bomb. Once you have
the signature, you can verify the assertion as “signed by someone I
trust not to bomb me” before you parse the assertion. Once it's
signed by someone you trust, you can parse the assertion without much
risk. Once you've parsed the assertion, you can make a policy call
for whether you trust the signer enough to make that particular
assertion and whether you want to continue.
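The ordering above (paranoid fetch, verify the opaque bytes, parse, then policy) might look like this. This is a sketch under stated assumptions: HMAC-SHA256 with per-signer shared keys stands in for a real public-key signature (which would come from GnuPG, OpenSSL, or similar), and the size cap, `TRUSTED_KEYS`, `TRUSTED_FOR`, and the `kind` field are all hypothetical.

```python
import hashlib
import hmac
import json

MAX_ASSERTION_BYTES = 64 * 1024  # hypothetical cap, to avoid eating a bomb

# Stand-in for real signer keys: HMAC secrets instead of public keys.
TRUSTED_KEYS = {"alice": b"alice-secret-key"}
# Policy: which kinds of assertion each signer is trusted to make.
TRUSTED_FOR = {"alice": {"name"}}

def verify_then_parse(signer: str, assertion: bytes, signature: bytes) -> dict:
    # 1. Paranoid handling of untrusted bytes: bound the size, parse nothing.
    if len(assertion) > MAX_ASSERTION_BYTES:
        raise ValueError("assertion too large")
    key = TRUSTED_KEYS.get(signer)
    if key is None:
        raise ValueError(f"unknown signer {signer!r}")
    # 2. Verify the signature over the opaque byte stream.
    expected = hmac.new(key, assertion, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, signature):
        raise ValueError("bad signature")
    # 3. Only now parse the assertion.
    parsed = json.loads(assertion)
    if not isinstance(parsed, dict):
        raise ValueError("assertion is not a JSON object")
    # 4. Policy: is this signer trusted for this kind of assertion?
    if parsed.get("kind") not in TRUSTED_FOR.get(signer, set()):
        raise ValueError("signer not trusted for this assertion")
    return parsed

body = b'{"kind": "name", "name": "nginx-1.10.1"}'
sig = hmac.new(b"alice-secret-key", body, hashlib.sha256).digest()
print(verify_then_parse("alice", body, sig)["name"])  # nginx-1.10.1
```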

I don't see the content of the assertion blob as a large bomb risk in
this approach, so it's unclear how this impacts a decision between
media type headers in the assertion vs. JSON keys like
critical.image.docker-manifest-digest.

And without (2), I see no reason to go to (3).

If we did go to (3), we'd be pulling in new, non-core descriptor-ish
handling that only applied to assertion handling.

(You’ve suggested elsewhere that signatures should reuse the
descriptor type…

I think assertions should reuse the descriptor type to reference the
blob to which the assertion applies. Signature formats are already
established, so I don't see them using descriptors. The media type
header is a way to patch the assertion's media type into the assertion
because the signature itself doesn't include that information.

The code verifying signatures needs to be minimal and paranoid…

The code verifying the signatures should have already been written (in
GnuPG, or OpenSSL, or whatever). All that will happen on the opaque
assertion byte-stream before anybody attempts to parse the assertion.

The signature verification code really does not want to implicitly
add an urls field and the associated semantics just because the
rest of the distribution system has found it useful and added it
into the descriptor type.

I'm fine having ‘urls’ in the assertion (which is parsed after
signature verification). Why would you care about where you can get
the blob being asserted? How does its presence or absence affect the
security? Wherever you get it (from CAS or a non-CAS URL), you should
be checking that blob against the digest/size that the signed name
assertion claimed.
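Checking a fetched blob against the signed descriptor is the same regardless of where the bytes came from. A minimal sketch, assuming only `sha256:` digests are supported (the function name `verify_blob` is illustrative):

```python
import hashlib

def verify_blob(data: bytes, digest: str, size: int) -> None:
    # The source (CAS or a non-CAS URL) is irrelevant; only the bytes matter.
    if len(data) != size:
        raise ValueError(f"size mismatch: got {len(data)}, want {size}")
    algorithm, _, expected_hex = digest.partition(":")
    if algorithm != "sha256":
        raise ValueError(f"unsupported digest algorithm {algorithm!r}")
    if hashlib.sha256(data).hexdigest() != expected_hex:
        raise ValueError("digest mismatch")

data = b"hello"
verify_blob(data, "sha256:" + hashlib.sha256(data).hexdigest(), len(data))
print("blob matches the asserted descriptor")
```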

@stevvooe
Contributor

This PR doesn't incorporate a hint of understanding of the issues identified in #400. I'm closing until that conversation concludes with real consensus from maintainers.
