diff --git a/CIP-ledger-evolution/README.md b/CIP-ledger-evolution/README.md new file mode 100644 index 000000000..9117cc55b --- /dev/null +++ b/CIP-ledger-evolution/README.md @@ -0,0 +1,302 @@ +--- +CIP: XXXX +Title: Cardano Ledger Evolution +Authors: Jared Corduan +Comments-Summary: No comments +Comments-URI: +Status: Active +Type: Process +Created: 2022-12-XX +License: CC-BY-4.0 +--- + +# Evolution of the Cardano ledger + +## Abstract + +This CIP proposes a process for proposing changes to the Cardano ledger. + +## Motivation + +The ledger is responsible for processing transactions and updating the shared state of the network. +It also processes block headers and handles the state transformation from one epoch to the next +(e.g. computing the staking rewards). +It is a very user-facing component; +adding new functionality to the Cardano network often involves adding features to the ledger. + +Most of the state maintained by the ledger relates to the +[Extended UTxO accounting model](https://iohk.io/en/research/library/papers/the-extended-utxo-model/) and +support for [Ouroboros](https://iohk.io/en/research/library/papers/ouroboros-a-provably-secure-proof-of-stake-blockchain-protocol/), +the proof-of-stake consensus mechanism used in Cardano. + +This CIP aims to give guidance for future CIPs related to the ledger, +making it a registered category of the CIP process[^1]. + +Many thanks Michael Peyton Jones to writing [CIP-35] and forging this path for us. +Some of the language here has been taken directly from it. + +## Background + +### Terminology + +Context for the terminology used in this document is given in [CIP-59]. + +### Specifications + +The ledger is specified as a state transition system using a +[small step operational semantics](STS.md). +We refer to this framework as the *Small Step Semantics for Cardano*, or the *STS* for short. +An understanding of the existing STS specifications for the +[existing ledger eras](https://github.com/input-output-hk/cardano-ledger#cardano-ledger) +is often required to fully understand the implications of changes to the ledger +(though an understanding of the Haskell implementation is a fair substitute). + +The STS framework leaves both cryptographic primitives and the serialization format abstract. +These specifications need to be complete enough to realize a full implementation of the ledger +given the cryptographic primitives and serialization format. +The cryptographic primitives are described as appendices to the STS specifications, +and the serialization format is given as a +[CDDL file](https://www.rfc-editor.org/rfc/rfc8610). +There SHOULD be one STS specification per ledger era +(the Allegra ledger era is currently combined with the Mary era). + +From the Byron to the Babbage ledger eras, the STS frameworks were written in $\LaTeX$. +Starting in the Conway ledger era, literate Agda will be used. +During the transition from $\LaTeX$ to literate Agda, we will take advantage +of the ability to substitute $\LaTeX$ in place of Agda code when needed for expedience. +With time, the Agda specification will gloriously blur the line between specification +and reference implementation, +just as the type system upon which Agda is built blurs the line between proof and program. + +### Ledger eras + +A ledger era is a collection of features added to the ledger which are introduced at a hard fork. +The existing ledger eras, with a very simplistic description is given below. + +|name|new features| +| --- | --- | +|Byron|initial UTxO ledger| +|Shelley|decentralized block production, stake delegation| +|Allegra|timelock scripts| +|Mary|multi-assets| +|Alonzo|Plutus scripts| +|Babbage|improved Plutus script contexts| +|Conway|constitutional committee replacement, SPO voting on hard forks| + + +### Ledger state + +The ledger state is the source of truth from which the network derives meaning. +For performance reasons, we require it to be the minimal state needed +to derive this value and validate blocks. + +There is currently one exception to the minimality, which will be deprecated in the days ahead, +namely the stake pool ranking information which is used by Daedalus. + +#### Ledger events + +Some provenance about the ledger calculations is provided by ledger events. +These events come with zero cost to a running node if not used and are not stored in the ledger state. + +### Soft forks and Hard forks + +We make the following definitions: + +**Hard fork** - A *hard fork* is a change to the protocol (not restricted to the ledger) +resulting in a single new (CBOR) block being valid. + +**Soft fork** - A *soft fork* is a change to the protocol (not restricted to the ledger) +resulting in fewer blocks being valid. +A good way to think about soft forks is the property that a node which only validates +(and does not produce blocks) does not need a software upgrade in order to continue following the block chain. +Soft forks do not seem to come up often in practice. + +### Serialization + +Transactions and blocks are serialized with +[CBOR](https://www.rfc-editor.org/rfc/rfc7049) +and specified with +[CDDL file](https://www.rfc-editor.org/rfc/rfc8610). + +The ledger state is also currently also serialized with CBOR, +but this may change in the future. + +### Plutus script context + +The ledger and Plutus scripts have a common interface, namely the script context. +See [CIP-35] for more information. +Note that CIPs relating to the script context are relevant to both the ledger and to the Plutus CIP categories. + +## Specification + +This proposal deals only with the types of change listed in +[Types of change](#types-of-change), all others are out of scope. + +### Changes that require a CIP + +This proposal recommends that some of the changes listed in "Types of change" (specified below) should: + +* Be proposed in a CIP. +* Go through additional process in addition to the usual CIP process. + +The additional process mostly takes the form of additional information that should be present in +the CIP before it moves to particular stages. As such, it is up to the CIP Editors to enforce this. + +The requirement to propose a change via a CIP is, as all CIPs are, advisory. +In exceptional circumstances or where swift action is required, we expect that changes may still +be made without following this process. In such circumstances, +a retrospective CIP SHOULD be made after the fact to record the changes and the rationale for them. + +Changes that require a CIP do not have to each be in an individual CIP, they can be included in batches or in other CIPs. +So, for example, a single CIP could propose multiple new related fields in the transaction. + +It is very likely the case that if a change does not require a soft or hard fork, +it does not require a CIP, except in the case of bugs. + +A "bug fix" is a change to behavior where: +- The implemented behavior does not match the specification; or +- The specified behavior is clearly wrong (in the judgment of relevant experts) + +#### Changes that *do not* require a CIP + +* performance improvements +* bug fixes +* ledger events +* changes to the protocol parameter values (this is a governance issue, see [CIP-1694]) + +### Types of change + +- [Ledger features](#ledger-features) + - [Ledger-script interface](#ledger-script-interface) +- [Ledger intra-era switch](#ledger-intra-era-switch) +- [Ledger protocol](#ledger-protocol) +- [Ledger eras](#ledger-eras-1) +- [Serialization](#serialization) + +#### Ledger features + +This is a very broad category. +Any conceptual group of changes to the ledger semantics, +involving a single idea or a group of closely related ideas +(such as adding, modifying, or removing existing features) +is a `ledger-feature`. +This category excludes bug fixes and changes to the [ledger protocol](#ledger-protocol). +Ledger features are assumed to be included in a future ledger era. + +To become active, the ledger feature must be included in an active [ledger era](#ledger-eras-1). + +##### Ledger-script interface + +Any conceptual group of changes to the script context is a `ledger-script-interface`. +This is a shared type with Plutus, so the process in [CIP-35] MUST also be followed. + +To become active, the interface change must be included in an active [ledger era](#ledger-eras-1). + +#### Ledger intra-era switch + +Sometimes a ledger change is so small and targeted that it can be implemented by +switching on the major protocol version (and not introducing a new ledger era). +These must be used *very* sparingly, as they can quickly lead to unintended complexity and +unclear semantics[^2]. +A ledger intra-era switch does not necessarily require a CIP; +sometimes it might be more appropriate to just include it in a [ledger era](#ledger-eras-1) CIP. + +When a hard fork involves only intra-era switching, a new ledger era can be avoided and we +refer to the hard fork as an intra-era hard fork. + +To become active, the ledger intra-era switch must be included in an active [ledger era](#ledger-eras-1). + +#### Ledger protocol + +The `ledger-protocol` category corresponds closely to block header validation. +It corresponds precisely to all the functionality in the Shelley ledger specification +not captured by the `BBODY` and `TICK` transitions. +This mostly involves handling the nonces for Ouroboros, the VRF checks, +the KES certificates, and block leadership checks. + +Historically, the ledger protocols have been specified alongside the ledger eras. +The `TPraos` ledger protocol was specified alongside Shelley, and +`Praos` was specified alongside Babbage. + +To become active, a ledger protocol change MUST be +specified with the STS framework and MUST be included in the software update for a soft/hard fork. +If the fork coincides with a ledger era, the specifications can be combined. + +#### Ledger eras + +A `ledger-era` is a collection of ledger-features, bug fixes, intra-era switches, and other minor changes +grouped together with a name and activated together at a fork. +In the event of interaction between the features, details may need to be provided. + +To become active, the ledger era MUST: +* specify the semantics with the [STS](STS.md) framework +* specify cryptographic details in an appendix to the STS specification +* specify the CBOR serialization schema with a CDDL file +* be included in the software update for a soft/hard fork. + +### Serialization changes + +See [Serialization](Serialization.md). + +### The Small Step Semantics Framework for Cardano + +See [STS](STS.md). + +### Implementing and releasing changes + +This CIP does not cover the process of implementing changes. +As usual, the CIP process covers the design phase, and it is up to the implementer to ensure that +their proposal is implemented, which may require additional work to meet the requirements of the +maintainers of the Cardano code repositories (testing, implementation quality, approach), and so on. + +Changes can be released after their CIPs have reached Active status. +Different changes will require different releases as described in "Types of release". +This CIP does not cover the process by which changes are actually incorporated into releases after having been implemented. +In particular, there is NO assumption that a feature which requires a particular release will be +included in the next such release, even after it has been implemented. + +### Ledger Core CIP registry + +Any CIP which proposes a type of change listed in "Types of change" MUST also add itself to this registry (in addition to the main registry). + +| # | Title | Type of change | Status | +|----|-------------------|-------------------------|--------| +| 31 | Reference inputs | Ledger-script interface | Draft | +| 32 | Inline datums | Ledger-script interface | Draft | +| 33 | Reference scripts | Ledger-script interface | Draft | + +## Rationale + +### Why have a public process for changes? + +Cardano is continuing to move towards decentralized governance within the Voltaire phase of development. +Historically, key development and implementation decisions have been made by the core development team. +This was important in the earliest stages of the platform’s evolution. +However, this becomes less so as the platform starts to mature and is neither sustainable nor desirable in the long term. + +Furthermore, while many changes to Cardano are obscure or not of interest to many community members, +there is a much larger community who have a keen interest in changes to Plutus Core: dApp developers. +Hence it is especially important to have a clear way for this community to be able to propose changes and see how they are progressing. + +### Why include a CIP registry? + +This is just to make it easy for those considering proposing a CIP following this process to see which CIPs have already been submitted. +An alternative would be a standard title for CIPs, or perhaps some kind of CIP metadata to indicate that it follows the process in this CIP. + +### Out of scope + +* Changes that cover _using_ the ledger, but not _changing_ the ledger. +* Changes to the node-to-client queries, which are used to get information from the ledger. +* Changes about how clients consume the ledger state, unless it involves changes to the ledger state. +* Unless something exceptional arises, + storing additional information in the ledger state that is not needed for block validation. + +[^1]: See [CIP-1](https://github.com/cardano-foundation/CIPs/blob/cip-cps-rework/CIP-0001/README.md#categories). +[^2]: On an early testnet for the Mary era, the network managed to enter the Mary era while remaining +on major protocol version 2 (the mainnet version for the Shelley era). +The network exhibited strange behavior due to this fact, which was perplexing to understand until +we realized the mismatch. We dubbed it the Mary Shelley's Frankenstein era. + +[CIP-35]: https://github.com/cardano-foundation/CIPs/tree/master/CIP-0035 +[CIP-59]: https://github.com/cardano-foundation/CIPs/tree/master/CIP-0059 +[CIP-1694]: https://github.com/cardano-foundation/CIPs/tree/master/CIP-1694 diff --git a/CIP-ledger-evolution/STS.md b/CIP-ledger-evolution/STS.md new file mode 100644 index 000000000..65cde79bc --- /dev/null +++ b/CIP-ledger-evolution/STS.md @@ -0,0 +1,195 @@ +# The Small Step Semantics Framework for Cardano + +This document is heavily based on +an [original version](https://github.com/input-output-hk/cardano-ledger/blob/master/docs/small-step-semantics/small-step-semantics.tex) +by [Nicholas Clarke](https://github.com/nc6). + +## Introduction + +In defining the Cardano ledger, we are concerned with the means to construct inductive datatypes +satisfying some validity conditions. For example, we wish to consider when a sequence of transactions +forms a valid ledger, or a sequence of blocks forms a valid chain. +This document describes the methods we use to describe such validity conditions and how +they result in the construction of valid states. + +## Transition Systems + +In describing a state transition system `L` we are principally concerned with six things: + +* **States** - The underlying datatype of our system. + +* **Transitions** - The means by which we might move from one (valid) state to another. + +* **Signals** - The means by which transitions are triggered. + +* **Predicates** - We do not prescribe a formal language with which to define predicates, +but rather leave this up to the description of specific systems. +Even so, a formal language may not been needed as long as the notation is clear and precise enough. +Typically, finite set theory is all that is needed, with notation for concepts such as +integers, sequences, powersets, functions, maps, comprehensions, conditionals, etc. +Most of the predicates will have a value of true or false, +but we will stretch the rules later when we define the composition of transition systems. + +* **Rules** - Judgments describing valid transitions. A rule has an antecedent +(sometimes called premise) and a consequent (sometimes called conclusion), such that if +the predicates in the antecedent hold, the consequent is assumed to be a valid state or transition. +The predicates in the antecedent may contain variables for the state, the signal, +and the environment (see below). + +* **Environment** - Sometimes we may implicitly rely on some additional information being present +in the system to evaluate the rules. An environment does not have to exist (or, equivalently, +we may have an empty environment), but crucially an environment is not modified by +transitions. + +**Definition (State transition system)** + +A state transition system `L` is given by a 5-tuple `(S, T, Σ, R, Γ)` +where: +* `S` is a set of (not necessarily valid) states. +* `Σ` is a set of signals. +* `Γ` is a set of environment values. +* `T` is a set of (not necessarily valid) transitions. We have that `T ⊆ 𝒫 (Γ × S × Σ × S)`. +* `R` is a set of derivation rules. + A derivation rule is given by two parts: + * the antecedent, a predicate allowing values from `S ∪ Σ ∪ Γ` + * the consequent, a transition `t ∈ T`. +* For every `s₀ ∈ S`, `σ ∈ S`, `γ ∈ Γ`, `r ∈ R`, if the antecedent of `r` + holds for `γ, s₀, σ`, then there is exactly one `s ∈ S` such that + `(γ, s₀, σ, s) ∈ T`. + +We refer to a 4-tuple `(γ, s₀, σ, s) ∈ T` as as **valid transition** if there is a rule `r ∈ R` +such that the antecedent of `r` holds for `γ, s₀, σ`. + +## Notation + +We depict a derivation rule `r ∈ R` as follows: +* There is a vertical line, with the antecedent above the line and the consequent below. +* The consequent is depicted as follows, from left to right: + * The environment (if non-empty) + * The turnstile symbol, `⊢` + * The initial state (denoted `s₀` above) + * A right arrow with signal above it + * The new state (denoted `s` above) + +### Notational conveniences + +#### Multiple predicates + +Rather than a single logical predicate in the antecedent, we allow multiple predicates. +All predicates defined in the antecedent must be true; that is, +the antecedent is treated as the conjunction of all predicates defined there. + +#### Let bindings + +The antecedent may contain variable definitions of the form `z := f (s)`, +where `z` is an additional binding to be introduced. +We refer to these as "let-bindings" in analogy with their use in programming languages. +Bindings introduced in the antecedent can be used in the consequent. + +### Notation Example 1 + +``` +P(A) z := f(s) B := g(A, E, z) + +----------------------------------- + + s +E ⊢ A ⟶ B +``` + +The example antecedent is made up of three items: +* a predicate `P(A)` +* a let binding `z := f(s)` +* a let binding `B := g(A, E, z)` + +The environment is `E`. +The initial state is `A`. +The signal is `s`. +The new state is `B`. + +Note that in this example, +`(E, A, s, B)` is a valid transition exactly when `P(A)` is true. + +## Composition of Transition Systems + +Part of the benefit of this approach is that it allows us to define systems which themselves rely +on other state transition systems. +To do this, we allow for an additional construction in the antecedent. + +The antecedent may contain a transition, which we call a sub-transition. +If the transition is valid, we treat it like a true predicate. +Otherwise, if the transition is not valid, we treat it like a false predicate. +The final state of the sub-transition acts as a let binding for further computation. + +To distinguish transition systems, we write a name below the arrow in the consequent. + +### Notation Example 2 + +``` + z +P(A) z := f(s) ⊢ g(A) ⟶ B C := h(B) + S1 +------------------------------------------- + + s +E ⊢ A ⟶ C + S2 +``` + +In this example, `(E, A, s, C)` is a valid `S2`-transition exactly when both `P(A)` is true +and `(∅, g(A), z, B)` is a valid `S1`-transition. + +## Naïve UTxO Example + +Imagine a very naïve UTxO ledger where: +* a **coin** is a natural number +* a **transaction input** is a pair, consisting of a transaction ID and an index + (corresponding to the index of the output that created it) +* a **transition output** is a single coin +* a **fee** is a single coin +* a **transaction** consists of: + * a set of transaction inputs + * a sequence of transaction outputs, writing `idx ↦ out` if `out` has index `idx` in the sequence. + * a fee +* the **UTxO** is a map from transaction inputs to transaction outputs +* each transaction has a unique ID, given by a function `txid`. +* each transaction has the following accessor functions: `txins`, `txouts`, `txfee`. +* the function `outs` on transactions is defined by + `outs tx = { (txid tx, idx) ↦ out | idx ↦ out ∈ txouts tx }`. + +For any given transaction `tx`, the naive ledger could: +* check that the fee is at least as big as some predetermined, minimal value +* check that the transaction inputs are legitimate, i.e. they are in the domain of the UTxO +* check that `tx` preserves the number of coins in the system, + meaning that the sum of the coins associated with the transaction inputs in the UTxO + is equal to the sum of the transaction outputs plus the fee. + +In the case that the transaction meets all the above checks, +the ledger would then destroy the unspent transaction inputs, +create new unspent transaction inputs corresponding to the transaction outputs, +and augment the fee pot. + +We could implement this ledger with the following STS transition: + +``` +f := txfee tx minFee ≤ f + +txins tx ⊆ domain utxo + +balance (txins tx ◁ utxo) = balance (outs tx) + f + +--------------------------------------------------------- + +minFee ⊢ ⎛ utxo ⎞ tx ⎛ (txins txb ⋪ utxo) ∪ outs txb ⎞ + ⎝ fees ⎠ ⟶ ⎝ fees + txfee txb ⎠ +``` + +where: + * `balance` is the appropriate sum of coins + * `S ◁ M = { k ↦ v | k ↦ v ∈ M, k ∈ S }` + * `S ⋪ M = { k ↦ v | k ↦ v ∈ M, k ∉ S }` + +Note that: +* the state associated with this transition is the UtxO set and the fee pot. +* the environment associated with this transition is `minFee`. +* the signal associated with this transition is `tx`. diff --git a/CIP-ledger-evolution/Serialization.md b/CIP-ledger-evolution/Serialization.md new file mode 100644 index 000000000..80f6fdec7 --- /dev/null +++ b/CIP-ledger-evolution/Serialization.md @@ -0,0 +1,97 @@ +# Serialization in the ledger + +This document specifies a policy for the backwards compatibility of the serialization scheme of +Cardano transactions. + +## Motivation + +Transactions on Cardano are sent on the wire using CBOR and are specified with CDDL. +The first scheme was introduced with the Byron phase. +This scheme was changed dramatically with the introduction of the Shelley phase. +As of the time of the writing of this CIP, however, every new scheme has been backwards +compatible with the original scheme from the Shelley phase. +The intention is still to maintain backwards compatibility to the extent reasonable, +and to make explicit our policy for breaking backwards compatibility when deemed necessary. + +## Specification + +Problems with serialization fall into two categories: +* flaws in the implementation +* flaws is the CDDL specification + +Note that at the time of the writing of this CIP, there is only one implementation of the Cardano +node, and we do not yet need to consider inconsistencies between different implementations. + +The policy for maintaining backwards compatibility with the transaction serialization will be +as follows. + +### Serious Flaws + +A **serious flaw** in the serialization is an issue which could have a large and negative impact +on the network, and which requires a hard fork to fix. +These will almost always be problems with the serialization and not the specification. +It is up to human discretion to determine what constitutes a serious flaw, +mostly likely by the core developers. + +Backwards compatibility can be abandoned in the case of a serious flaw, +and **the fix will occur at the next available hard fork**. + +### Non-Serious Flaws + +A **non-serious flaw** in the serialization is an issue which is not safety critical, +but is problematic enough to merit breaking backwards compatibility and requires a +hard fork to fix. +This is again a human judgment. + +Backwards compatibility can be abandoned in the case of a non-serious flaw, +but there must be a deprecation cycle: +* A new format can be introduced at a hard fork, but the old format must be supported for at + least **six months**. After six months, the old format can be abandoned at the next possible + hard fork. + +#### Example + +A good example of a non-serious flaw is the CDDL specification of the transaction output in the +Alonzo ledger era: + +``` +alonzo_transaction_output = [ address, amount : value, ? datum_hash : $hash32 ] +``` + +There is nothing inherently wrong with this scheme, but it caused a problem in the Babbage ledger +era with the addition of inline datums and script references. +In particular, there were two new optional fields, and there was mutual exclusivity. +In order to maintain backwards compatibility, Babbage introduced this scheme: + +``` +transaction_output = alonzo_transaction_output / babbage_transaction_output + +babbage_transaction_output = + { 0 : address + , 1 : value + , ? 2 : [ 0, $hash32 // 1, data ] + , ? 3 : script_ref + } +``` + +In other words, a new format was created, but the legacy format was still supported. +The new format, `babbage_transaction_output`, was introduced 2022-09-22 with the Vasil hard fork, +The old format, `alonzo_transaction_output`, can be retired after 2023-03-22. + +### Summary + +* We should strive to maintain backwards compatibility. +* Serious flaws can be fixed immediately (at the next hard fork), and can break backwards + compatibility. +* Non-Serious flaws can be fixed (at the next hard fork), but the old format + must be supported for at least six months with support ending at the next hard fork event after + the six months have passed. + +## Rationale + +It seems clear that security issues merit breaking backwards compatibility and should be fixed +as soon as possible. +The six month compatibility window for non-serious flaws is mostly +arbitrary, but we need to allow enough time for people to migrate. +It would be great to have more explicit definitions for "serious" and "non-serious" flaws, +but this seems very difficult.