Spanning Layer Protocol Stack Headers #20
I like all of this and think we should use it. It allows the base protocol to carry any amount of complex and layered header information for thin layers above it, which is awesome. CESR goodness! However, it doesn't address the discrepancy of our visions at its core, because it requires that rich information be opaque to the base protocol. I think most stuff should be opaque, and I can give numerous examples of headers that are specific to particular subsets of trust tasks -- and therefore optional and opaque. But I think a small set of sender intentions cannot be opaque in anything that calls itself the trust spanning layer, because it forces into "undefined" status several vital issues that should NEVER be undefined if the goal is trusted communication. If, on the other hand, the goal of the TSP is just authentic data, that's a whole different ballgame. This question about the goal is the reason I started #22.
This is a concrete example, offered as a notional proposal, of how to provide an interoperable stack or tree of protocols that depend on the spanning layer. It is based on the KERI/ACDC protocol stack.
Spanning Layer Protocol Stack Headers
Thin Protocols in a Protocol Tree
A spanning layer protocol has a unique topology. It is not so much a stack as a tree of thin layers, where a given layer may have two or more protocol branches above it. The spanning layer is special: it is the inflection point in the stack. An inflection point in this context is an information encapsulation boundary that lies between the application layers above the spanning layer and the support layers below it. I am using the narrow definition of application from the Beck hourglass paper, where anything in a layer above a given layer is an application with respect to that given layer (not just end-user applications), and anything below a given layer is support with respect to that given layer.
We can model this topology more accurately as a tree where the spanning layer is the trunk, the supporting layers are the roots, and the application layers are the branches. The path along any application branch forms an independent stack of protocols. That stack is largely independent of information in sibling branches. This enables a thinner protocol at each layer in an application branch's protocol stack (many thin layers). This branching protocol topology is antithetical to the OSI thick-layer (end-to-end) protocol architecture and can be jarring to someone unfamiliar with spanning-layer-based protocol design.
Each protocol/layer in a branch is best modeled as a functional layer that provides a set of features, not necessarily just a rigid data structure or header. The features are largely additive going up any one of the application protocol branches and largely subtractive going down any one of the support protocol roots. This means that for layers at or above the spanning layer, the data needed for any layer's function (feature set) should be included in that layer, plus potentially information from layers below it in the same branch, but not from below the spanning layer and not from some other branch. Layers above the spanning layer can see down the stack along their own branch, but layers below a given application layer cannot see up the stack along that branch, and certainly not along any other branch. More specifically, the information needed to support features unique to a given application layer at or above the spanning layer should, by and large, be included in that layer's information additively on top of the information below it in its own branch.
Layers below the spanning layer (i.e., the support of the spanning layer) should be opaque to layers at or above the spanning layer. This enables layers above the spanning layer to ignore the discovery, routing, and transport that occurs within the supporting protocol layers. The configuration of these supporting layers happens out-of-band relative to the spanning layer and above. This is why the spanning layer is an inflection point. The support wants to be as broad as possible to hasten adoption but without encumbering the protocols/applications that sit above the spanning layer with the need to manage in-band the complexities of the supporting protocols.
Payload and Header
Typically, information specific to a layer is included in a layer-specific header, while the layer-specific payload is opaque to the function of that layer. The payload encapsulates information that is conveyed, nested, or tunneled through the given layer by the layers above it; to the given layer, all of that information simply appears as its payload.
Historically (circa IP), data in a layered protocol like IP consisted of C structs that provided self-framing headers. One of the fields in the C struct was the length of the header plus payload. Another field that may be included in a header is the type of the layer above, as a hint to a parser; alternatively, the type may be included in a well-known location in the header of the layer above. Stacking layers meant nesting a layer-specific self-framing header: upper layers nest inside lower layers. This allows an upper layer to tunnel through a lower layer. The header and payload of the next layer up is the payload of the next layer down, and this extends recursively up a given branch of the layer tree. By branching with tunneling down each branch, each independent protocol was enabled to be thinner than if multiple independent protocols had been merged into a single OSI-like heavy layer.
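For illustration only, here is a minimal Python sketch of this nesting pattern, assuming a hypothetical 3-byte self-framing header (a 1-byte type code for the nested layer plus a 2-byte total length); real IP, UDP, and TCP headers are of course richer than this.

```python
import struct

HDR = "!BH"                    # hypothetical layout: 1-byte nested-layer type, 2-byte total length
HDR_LEN = struct.calcsize(HDR)

def frame(nested_type: int, payload: bytes) -> bytes:
    """Prepend a self-framing header; the header plus payload of the layer
    above becomes the opaque payload of the layer below."""
    return struct.pack(HDR, nested_type, HDR_LEN + len(payload)) + payload

def unframe(packet: bytes) -> tuple[int, bytes]:
    """Read the header, then slice out the nested payload without inspecting it."""
    nested_type, total = struct.unpack(HDR, packet[:HDR_LEN])
    return nested_type, packet[HDR_LEN:total]

# An application message tunnels recursively through two lower layers:
app = b"hello"
transport = frame(nested_type=7, payload=app)       # hypothetical code 7 = app protocol
network = frame(nested_type=2, payload=transport)   # hypothetical code 2 = transport protocol
assert unframe(unframe(network)[1])[1] == app
```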
Agile Cryptographic Primitives
What is unique about a Trust Protocol stack is that cryptographic primitives may need to appear not merely in payloads and attachments but in headers, i.e., everywhere. Therefore an agile crypto-primitive encoding mechanism becomes an extremely important part of the protocol architecture. To elaborate, cryptographic primitives are difficult to encode because they need to be sufficiently cryptographically agile. Sufficient crypto agility needs to be baked into a given version of the protocol. In other words, crypto agility should not be dependent on protocol versioning because when a newly discovered weakness or exploit or attack arises with respect to a given cryptographic algorithm, primitives encoded using that algorithm can't wait for a new version of the protocol to be created in order to mitigate that weakness/exploit/attack via agility.
So unlike any other protocol stack, a trust protocol needs in-version crypto agility, which in turn places special demands on the cryptographic encoding in order to support that agility. For example, NIST has selected finalists for post-quantum-safe digital signing algorithms. These have been implemented in the Open Quantum Safe library and will achieve certified production status soon. The current version of a given protocol could support one or more of these post-quantum-safe signature algorithms so that users of that protocol can immediately switch to post-quantum-safe signatures whenever a quantum vulnerability is suspected, without requiring the protocol version itself to be upgraded.
There are two popular ways to support in-version crypto agility. The most common is to make each cryptographic primitive a data structure. The second is to use an inline encoded primitive that appears as a single string.
Primitive as Data Structure
When using the data structure approach, the data structure includes in one of its fields the raw value of the primitive: either the binary value itself or a text encoding (such as Base64) of that binary. The other fields may include information about the type of algorithm and any algorithmic parameters. An example would be ECDSA over secp256k1, where ECDSA identifies the signature scheme, sec refers to the Standards for Efficient Cryptography naming, p indicates a curve over a prime field, 256 is the size in bits of that field (and hence of the keys), k indicates a Koblitz curve, and 1 is the variant's sequence number. Several other parameters are needed to completely define the ECDSA algorithm, but given a table of ECDSA variants, secp256k1 is enough to identify a given variant.
The main drawback of using a data structure to define a primitive is that it makes any use of that primitive extremely verbose, especially in layered protocol headers. The second drawback is that it may expose the user to more flexibility than should be allowed or supported by a given protocol. Too much flexibility can lead to misconfigured algorithms where a weak combination of parameters may be induced as an attack vector.
The best practice is to severely limit the exact cryptographic algorithms supported while including enough choices to provide sufficient crypto agility to mitigate weaknesses. This enables the algorithms to be defined in a table, each with a single label to reference all the parameters of that algorithm.
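As a rough sketch of what such a table might look like (the labels and parameter values here are illustrative, not a normative registry):

```python
# Hypothetical single-label algorithm table: one short label resolves every
# parameter of a vetted algorithm variant, and anything not in the table is
# rejected, which avoids weak ad hoc parameter combinations.
ALGORITHMS = {
    "Ed25519":   {"kind": "signature", "curve": "edwards25519", "keylen": 32, "siglen": 64},
    "secp256k1": {"kind": "signature", "curve": "secp256k1",    "keylen": 33, "siglen": 64},
    "Blake3":    {"kind": "digest",    "outlen": 32},
}

def params(label: str) -> dict:
    """Resolve a label to its full parameter set; unknown labels are an error."""
    return ALGORITHMS[label]
```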
Nonetheless, a table lookup for the algorithm type still requires a data structure to define a primitive because, at the very least, both the algorithm type and the raw value must be included in a structure in order to apply the primitive. For example, the JWS (JSON Web Signature) specification defines a signature primitive as a data structure. The structure has a header that is itself a JSON data structure that may use some combination of 11 defined fields. Not all fields need to be in any JWS header, and some field combinations are mutually exclusive, but the number of fields is usually more than one. The JWS structure is rigid because a properly constructed JWS includes not just the header but also the signed payload and the signature. The header may include the verification key for that signature. If we count the verification key as a primitive and the signature as a primitive, then a JWS actually provides two associated primitives in a complex structure (more complex than two strings). This makes a JWS inflexible and verbose when combined with other cryptographic primitives, such as a hash, a set of hashes, a set of signatures, a set of keys, or some combination thereof. This may become painful when using multiple cryptographic primitives together in a verifiable data structure, such as a hash-chained key event log (KERI proof of key state).
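To make the contrast concrete, here is a hedged sketch: a JWS-style structure (the alg and kid header parameter names are from the JWS spec; the values are made up) next to the same signature carried as a single self-describing text primitive (the prefix and value are illustrative, in the CESR style, not an actual code-table entry).

```python
import json

# Data-structure primitive: header + payload + signature as a JSON object.
jws_style = {
    "protected": {"alg": "EdDSA", "kid": "did:example:alice#key-1"},  # illustrative values
    "payload": "eyJoZWxsbyI6IndvcmxkIn0",
    "signature": "0uA3...example-signature-value...",
}

# Text-string primitive: the same signature as one prefixed, self-framing string.
cesr_style = "0B0uA3...example-signature-value..."

print(len(json.dumps(jws_style)), "characters of structure vs", len(cesr_style))
```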
Primitive as an Encoded Text String
This is the most compact and usable approach. A text string can be used in any text-based document or data format, such as JSON, XML, or plain text, as well as in text-based namespaces, such as URLs. The encoding needs to do several things: it needs to identify the crypto algorithm, it needs to include the raw primitive itself in text-encoded form, and it needs to define the length of the primitive so that any parser can delimit it. Another way of saying this is that the encoding format needs to be self-describing and self-framing. This means that in every case, a primitive may appear as a text string in some other data structure or document. This makes complex verifiable data structures about as readable as one can get, given that cryptographic primitives are going to be long strings of seemingly random characters. If the encoding is prefixed to the encoded raw value and that prefix is stable for any given primitive type, then the type of each primitive becomes discernible to anyone familiar with it. This enables text-based tooling and annotation to be leveraged for debugging.
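A minimal sketch of such prefix-based parsing, with a tiny code table that is only loosely modeled on CESR (treat the codes and lengths as illustrative):

```python
# Each stable prefix identifies the algorithm and fixes the total encoded
# length, so a parser can delimit the primitive with no surrounding structure.
CODES = {
    "D":  ("Ed25519 public key", 44),   # code + Base64 value, fixed total length
    "E":  ("Blake3-256 digest",  44),
    "0B": ("Ed25519 signature",  88),
}

def parse_primitive(stream: str) -> tuple[str, str, str]:
    """Return (type, primitive, remainder of stream) by reading only the prefix."""
    code = stream[:2] if stream[0] == "0" else stream[0]
    kind, length = CODES[code]
    return kind, stream[:length], stream[length:]
```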
Notwithstanding the usability and terseness advantages of a text-string encoded representation when compared to a data structure, especially a labeled field map data structure, string encodings of cryptographic primitives are more verbose than a binary representation. What if there were a way to convert en masse any text-encoded primitive, any concatenation of text-encoded primitives, or better yet, nested compositions of groups of text-encoded primitives to/from binary without violating primitive boundaries? Then we would get a more compact binary representation for over-the-wire transmission but readable text for development, debugging, test, archival, and embedding in text documents. That is what CESR as an encoding standard provides. Leveraging CESR greatly simplifies the expression and representation of trust protocol packets, including headers, payloads, and attachments.
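The rough idea behind that text/binary duality can be sketched as follows, assuming every text primitive (and every composed group) is padded to a 24-bit boundary, i.e., a multiple of four Base64 characters, so the whole concatenation converts cleanly; CESR's actual algorithm handles its code characters more carefully than this, so treat this only as a conceptual sketch.

```python
import base64

def text_to_binary(text_stream: str) -> bytes:
    """Convert a concatenation of 24-bit-aligned text primitives to compact binary."""
    return base64.urlsafe_b64decode(text_stream)

def binary_to_text(binary_stream: bytes) -> str:
    """Round-trip back to the readable text domain without disturbing boundaries."""
    return base64.urlsafe_b64encode(binary_stream).decode("ascii")
```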
CESR Encoding in Trust Spanning Layer Protocols
Because CESR can conveniently represent cryptographic primitives in either the text or the binary domain, not only are fixed data structure compositions supported in both text and binary, but so are other, more flexible serializations like JSON text, CBOR binary, or MGPK binary. CESR's grouping composition means that CESR can provide not merely self-framing primitives inside headers, payloads, and attachments, but self-framing groups of primitives as headers, payloads, and attachments themselves, or all three together as a fully explicated signed message.
CESR is further generalized to encode not just strictly cryptographic primitives such as keys and hashes, but also other standard data types useful in a cryptographic trust protocol, like numbers, dates, strings, lists, and lists of tuples (ordered maps).
Given such flexibility and expressive power, we can generalize header data as an "effective" header. The term effective header captures the set of information included in the associated protocol layer that is specific to the function of that layer.
Payloads should always be effectively opaque, even if the serialization exposes the information in the payload. Effectively opaque means the information in the payload is not used by the layer function other than to convey the payload. To elaborate, the layer functionality does not depend on accessing any information inside the payload. The payload of a given layer is an opaque set of data, usually in an atomically serializable form to support cryptographic operations on that payload, such as digests and signatures.
To summarize, with flexible serializations and composition, each layer above the spanning layer has its own layer-specific "effective" header and payload that is included inside its given serialization type as allowed by the encoding. Header information can be included in ways that are not strictly an atomic header structure.
IP Example
The important fields in the IP header for this discussion are as follows (some fields omitted for clarity):
The Version field provides the version of IP, such as 4 for IPv4. The HLen field provides the length of the header in quadlets (4-byte words). The TLen field provides the total length of the IP packet (header plus payload) in bytes. Together these make an IP header and packet self-framing.
The Protocol field is a code that indicates the type of protocol embedded in the payload. Standard protocols have a defined code in the IANA registry (https://www.iana.org/assignments/protocol-numbers/protocol-numbers.xhtml). About 144 of the 256 possible codes are already assigned. This means that the IP spanning layer splits into roughly 144 different protocol branches above it, although the most used ones are UDP, TCP, and ICMP. These protocols are highly independent of each other and were layered on top organically over time. This organic growth is in contrast to a much more rigid OSI design, which totally defines the feature set of each layer at the outset.
The Source Address field provides the IP address of the source of the packet. The Destination Address field provides the IP address of the destination of the packet. The destination address helps with routing and discovery. Because IP has no security mechanism, the source address is not very useful, because any intermediary can spoof it (i.e., replace it with a fake source address).
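For reference, a short Python sketch that pulls just these fields out of a raw IPv4 header (options, checksum validation, and the other fields are ignored here):

```python
import struct

def parse_ipv4_header(packet: bytes) -> dict:
    """Extract the fields discussed above from the first 20 bytes of an IPv4 packet."""
    ver_ihl, _tos, total_len = struct.unpack("!BBH", packet[:4])
    return {
        "version": ver_ihl >> 4,              # e.g., 4 for IPv4
        "header_len": (ver_ihl & 0x0F) * 4,   # HLen is given in quadlets (4-byte words)
        "total_len": total_len,               # TLen: header plus payload, in bytes
        "protocol": packet[9],                # IANA code, e.g., 6 = TCP, 17 = UDP, 1 = ICMP
        "source": ".".join(str(b) for b in packet[12:16]),
        "destination": ".".join(str(b) for b in packet[16:20]),
    }
```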
If we only show the two main branches UDP and TCP, the IP protocol tree at its trunk appears as follows:
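```
          UDP   TCP   (plus the other registered protocol branches)
            \    |    /
             \   |   /
                IP           <-- spanning layer (trunk)
             /   |   \
      Ethernet  Wi-Fi  ...   <-- supporting roots
```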
We can ignore the other 100-plus branches because their features, including those of UDP and TCP, are largely additive and mutually independent of each other. They all share the IP header, so that routing and discovery work in the support layers for every protocol stack above.
CESR/KERI/ACDC Example
The KERI protocol is an example of a trust spanning layer protocol. Unlike the IP layer, KERI is not simply a self-framing header; it is a function. The function is to manage the controlling keys of an identifier in such a way that a verifiable proof of key state can be provided to any verifier. Essentially, the proof of key state provides a mapping between an identifier and its current controlling key state. Management of this mapping requires several packet types. The identifier is a cryptonym (cryptographic pseudonym) called an AID (Autonomic Identifier). Autonomic means self-governing or self-managing. In this sense, autonomic refers to self-managing the controlling key state for the identifier. In KERI, a necessary feature of a self-managing identifier is that it is self-certifying: the identifier is derived from one or more public keys.
Because the KERI protocol is about supporting self-managing AIDs in general, there may be more than one occurrence of a KERI AID in any packet. Consequently, unlike IP, there is no fixed header for protocols that rely on KERI as a spanning layer. Compared to simple header-based layers like IP, KERI feels like a virtual protocol layer. Any packet with one or more AIDs can leverage the KERI protocol to provide proof of controlling key state. Usually the proof is needed to verify the signature or signatures of the controlling keys for the associated AIDs. The role a given AID and its associated signature play in a given supported protocol depends on that supported protocol, but all supported protocols prove key state for each AID using the KERI protocol.
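As a hedged illustration of what that virtual layer means in practice, the sketch below maps an AID to the key state proven by its key event log and looks up the controlling keys needed to verify attached signatures; the field names and truncated values are notional, loosely following KERI conventions rather than the normative spec.

```python
# Notional key-state resolution for AIDs found in a packet (values truncated
# and field names illustrative). In real KERI the key state is established by
# verifying the AID's key event log, not by consulting a static table.
key_state_by_aid = {
    "EaU6JR2nmwy...": {                  # an AID (CESR-encoded, truncated here)
        "sequence": 3,                    # latest establishment event number
        "keys": ["DnmwyZ-i0H3..."],       # current controlling public keys
        "threshold": "1",                 # signing threshold
    },
}

def controlling_keys(aid: str) -> list[str]:
    """Return the current controlling keys proven for this AID, so that the
    signatures attached to a packet can be verified against them."""
    return key_state_by_aid[aid]["keys"]
```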
Interleaved Streaming
The KERI protocol is designed so that all packets, from KERI and from any supported protocol, can be interleaved in a stream. This means that both datagram-based (such as UDP and HTTP) and stream-based (such as TCP) classes of IP protocols can support KERI and its supported protocols. This was intentional, to maximize the size of the support for KERI as a spanning layer. Serialization and parsing for generic streaming packet support is provided by the CESR protocol as described above. CESR is a dependency that enables interfacing the functional nature of KERI to all the supporting protocols, which expect a more conventional header-based nesting of information. This is a significant protocol innovation in its own right, not merely crypto agility via self-framing composable primitives.
Notional Packet Format
Standard Header Elements
Functionally, the following fields or field elements are supported by CESR for all message packets. The elements may not appear in a given serialization as a distinct field each, but functionally the information can be extracted as the same set of distinct elements, and that set is common to all packets. These elements can be considered the required standard header. Different packet types may have additional fields or elements.
The protocol element defines the standard protocol type as a four-character string drawn from the Base64 character subset of text characters. These will be governed by a registry to protect against name collisions. This field is analogous to the protocol field in the IP packet header. Currently, there are two defined protocol types, KERI and ACDC.
The version element provides a version for each protocol. Currently, the version is a two-character hex string with one character for the major version number and the other for the minor version number. This field is analogous to the version field in the IP packet header.
The serialization element provides the type of serialization encoding. It is a four-character string drawn from the Base64 character subset of text characters. There are only four serializations supported: JSON, CBOR, MGPK, and CESR (upcoming). A message packet may be encoded in any one of these four serializations. These include the most popular serializations in use today. This was done intentionally to broaden the support (including tooling) for the KERI spanning layer.
The length element provides the length in bytes (characters) for the total packet, including any header information. This enables a parser to extract by counting bytes a complete message from a stream without having to parse the fields or elements of the message. This makes the message self-framing so it may be pipelined.
The type element provides the message type as a three-character string drawn from the Base64 subset of characters. This namespaces the message types within a given protocol type. The message type may be used to define the field configuration of a given message, which is essential for fixed-field (unlabeled) messages in serializations like CESR that support them. The message type can by itself convey the intent of the sender and the purpose of the packet.
A particular problem that KERI faced in supporting multiple serialization types is how to enable self-framing in general so that a stream parser can pipeline messages. This is particularly difficult for block-delimited labeled field maps like JSON. It is also a problem for CBOR and MGPK, which are also block framed with prefixed block framing codes and counts instead of encapsulating block (start and end) characters. In any case, a stream parser must parse each block in JSON, CBOR, and MGPK element by element and nested block by nested block in order to determine the length of the serialization. This means that such serializations by default cannot be pipelined.
The way KERI fixes up these block-delimited or block-framed serializations to enable pipelining is that the protocol, version, serialization, and length elements are combined into a single string called the version string. The version string field must appear as the first field in the top-level block. A simple regex parser can then uniquely detect the version string value and extract each of its elements from any one of the block-delimited/framed serializations. This is not necessary for CESR, which uses pipelining-friendly block framing codes and counts. The main reason the protocol type and serialization type are each four characters long is to provide sufficient uniqueness so that the regex parser can unambiguously detect the version string. This is a trade-off that enables broader support at the cost of a handful of characters. CESR does not have to make this trade-off and is the most compact serialization for those applications where bandwidth performance is most important.
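A rough sketch of that detection, assuming the KERI 1.x style of version string (a four-character protocol, two hex version characters, a four-character serialization kind, six hex size characters, and a terminator); treat the exact layout and regex as illustrative rather than normative.

```python
import re

# Illustrative version-string pattern, e.g. b'KERI10JSON0000fb_' embedded near
# the start of a JSON/CBOR/MGPK message.
VEREX = re.compile(rb"(KERI|ACDC)([0-9a-f]{2})(JSON|CBOR|MGPK|CESR)([0-9a-f]{6})_")

def sniff(stream: bytes) -> tuple[str, str, str, int]:
    """Detect the version string and return (protocol, version, serialization, size)."""
    match = VEREX.search(stream[:64])   # version string must be the first field
    if not match:
        raise ValueError("no version string found")
    proto, vrsn, kind, size = match.groups()
    return proto.decode(), vrsn.decode(), kind.decode(), int(size, 16)

# Assuming the stream is positioned at a message boundary, the size lets the
# parser slice a whole message without parsing its fields:
#   message, rest = stream[:size], stream[size:]
```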
Other Message Fields
Typically, a message that is signed includes a digest field whose value is the SAID (Self-Addressing IDentifier) of the message (including the header elements). The SAID protocol defines how to compute and embed the SAID field value. The SAID is CESR encoded, so the digest algorithm has crypto agility.
The digest field enables hash chaining of messages in cryptographically verifiable data structures. It also enables verifiable message integrity in an interoperable and crypto-agile way.
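A rough sketch of the placeholder-then-digest idea behind SAID computation; this version uses SHA-256 and an invented one-character prefix purely for demonstration, whereas the SAID specification uses CESR derivation codes and algorithms such as Blake3-256, so it is non-normative.

```python
import base64
import hashlib
import json

def saidify(fields: dict) -> dict:
    """Fill the digest field "d" with a same-length placeholder, serialize,
    digest, and embed the encoded digest back into the message."""
    placeholder = "#" * 44                          # length of the final encoded SAID
    dummied = dict(fields, d=placeholder)
    raw = json.dumps(dummied, separators=(",", ":")).encode()
    digest = hashlib.sha256(raw).digest()
    said = "H" + base64.urlsafe_b64encode(digest).decode().rstrip("=")  # invented "H" prefix
    return dict(fields, d=said)
```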
If the message is to be signed, it should have an Identifier field that provides the AID of the signer. The AID is also CESR encoded so that it has crypto agility relative to its self-certifying derivation. This makes the packet minimally authenticatable to its source AID.
A packet may have multiple AIDs for other purposes including a destination when the protocol and packet type need it.
Besides the digest and AIDs, other fields depend on the function of a given message type for a given protocol type. These fields may include other CESR-encoded primitives like public keys, digests of other messages, or other data items, etc.
But this header structure should be sufficiently expressive that all protocols in the protocol tree that depend on the spanning layer protocol can be expressed in an interoperable manner, so that a given stream parser can parse interleaved message packets from all the message types of all the supported protocols in a single stream. This maximally broadens the support.
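Putting the pieces together, a pipelined parser for an interleaved stream can be sketched as below, reusing the illustrative sniff() helper from the version-string example; again this is a sketch of the idea, not the actual KERI/CESR parser.

```python
def parse_stream(stream: bytes):
    """Slice whole self-framing messages off an interleaved stream by size alone,
    then let downstream code dispatch on protocol and message type."""
    while stream:
        _protocol, _version, _kind, size = sniff(stream)
        message, stream = stream[:size], stream[size:]
        yield message
```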
Notional Extension for Other Features
Proposed additions to the CESR code tables would support variable-length encrypted primitives for both symmetric and asymmetric encryption algorithm types. This means that any message can embed encrypted content either as a field or fields, a list of encrypted values, a map of encrypted values, or as a field whose value is an encrypted block.
A proposed CESR addition, for example, would be a composed self-framing CESR group that includes a public encryption key followed by the variable-length encrypted primitive. This can be further composed into a group that effectively provides a list of such groups.
A similar structure could be provided that prefixes an identifier for a given shared symmetric encryption key to a variable-length symmetrically encrypted primitive, along with compositions of such groups.
Thus protocols that want to include confidential information via encrypted primitives can do so by leveraging only CESR.